Managing HP Serviceguard A.11.20.10 for Linux, December 2012

For more information, including requirements and recommendations, see the MEMBER_TIMEOUT discussion under “Cluster Configuration Parameters” (page 86).

8.8.5 System Administration Errors

There are a number of errors you can make when configuring Serviceguard that will not show up when you start the cluster. Your cluster can be running, with everything appearing to be fine, until a hardware or software failure occurs and control of your packages is not transferred to another node as you would have expected.

These problems are caused specifically by errors in the cluster configuration file and package configuration scripts. Examples include:

• Volume groups not defined on adoptive node.

• Mount point does not exist on adoptive node.

• Network errors on adoptive node (configuration errors).

• User information not correct on adoptive node.
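
Several of these errors can be caught before a failover by running a few checks on each adoptive node. The following is a minimal sketch, not a Serviceguard tool: the function names and the example arguments (vg01, /mnt/pkg1, appuser) are hypothetical, and it assumes the LVM vgdisplay command and the standard id command are available on the node. Network configuration errors are not covered.

```shell
#!/bin/sh
# Illustrative pre-failover checks for an adoptive node.
# All function names here are hypothetical; they are not Serviceguard commands.

# Succeeds if the volume group is defined on this node (assumes LVM tools).
check_volume_group() {
    vgdisplay "$1" >/dev/null 2>&1
}

# Succeeds if the mount point directory exists on this node.
check_mount_point() {
    [ -d "$1" ]
}

# Succeeds if the user account is known on this node.
check_user() {
    id "$1" >/dev/null 2>&1
}

# Run all three checks, print one line per problem found,
# and return non-zero if any check failed.
preflight() {
    vg=$1; mnt=$2; user=$3; rc=0
    check_volume_group "$vg" || { echo "volume group $vg not defined"; rc=1; }
    check_mount_point "$mnt" || { echo "mount point $mnt does not exist"; rc=1; }
    check_user "$user"       || { echo "user $user not found"; rc=1; }
    return "$rc"
}
```

For example, running preflight vg01 /mnt/pkg1 appuser on each adoptive node prints nothing and returns zero only if all three resources are present there.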

You can use the following commands to check the status of your disks:

• df - to see if the file systems in your package’s volume group are mounted.

• vgdisplay -v - to see if all volumes are present.

• strings /etc/lvmconf/*.conf - to ensure that the configuration is correct.

• fdisk -l /dev/sdx - to display the partition table of a disk.
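
The first two checks can be combined in a small script. This is a sketch under stated assumptions: the function names, the volume group vg01, and the mount point /mnt/pkg1 are hypothetical, and the mount check relies on the portable output format of df -P, where the mount point is the last field and contains no spaces.

```shell
#!/bin/sh
# Illustrative helpers; the names are hypothetical, not Serviceguard commands.

# Reads "df -P" output on stdin and succeeds if a file system is
# mounted at exactly the mount point given as $1.
mounted_in_df() {
    # NR > 1 skips the header line; $NF is the "Mounted on" column.
    awk -v mp="$1" 'NR > 1 && $NF == mp { found = 1 } END { exit !found }'
}

# Report whether the mount point is in use, then list the volume
# group's logical and physical volumes (assumes LVM tools; needs root).
check_package_disks() {
    vg=$1; mnt=$2
    if df -P | mounted_in_df "$mnt"; then
        echo "$mnt is mounted"
    else
        echo "$mnt is NOT mounted"
    fi
    vgdisplay -v "$vg"
}
```

For example, check_package_disks vg01 /mnt/pkg1 reports the mount state and then prints the vgdisplay listing for manual inspection.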

8.8.5.1 Package Control Script Hangs or Failures

When a RUN_SCRIPT_TIMEOUT or HALT_SCRIPT_TIMEOUT value is set and the control script hangs long enough to exceed it, Serviceguard kills the script and marks the package “Halted.” Similarly, when a package control script fails, Serviceguard kills the script and marks the package “Halted.” In both cases, the following also take place:

• Control of the package will not be transferred.

• The run or halt instructions may not run to completion.

• Global switching will be disabled.

• The current node will be disabled from running the package.

Because the control script is terminated, such a failure may leave some of the package’s resources active. Specifically:

• Volume groups may be left active.

• File systems may still be mounted.

• IP addresses may still be installed.

• Services may still be running.
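
One way to see which package IP addresses are still installed is to search the ifconfig output for each address. The sketch below is illustrative only: the function names and the example addresses are hypothetical, and it assumes a grep that supports the -w (whole-word) option, as GNU grep does.

```shell
#!/bin/sh
# Illustrative helpers; the names are hypothetical, not Serviceguard commands.

# Reads ifconfig-style output on stdin and succeeds if the IPv4
# address given as $1 appears in it.
ip_installed() {
    # -F: treat the address as a fixed string, so "." is literal.
    # -w: whole-word match, so 192.168.1.1 does not match 192.168.1.10.
    grep -Fqw -- "$1"
}

# $1 is captured ifconfig output; the remaining arguments are the
# package IP addresses to look for. Prints one line per address found.
report_leftover_ips() {
    ifconfig_output=$1
    shift
    for addr in "$@"; do
        if printf '%s\n' "$ifconfig_output" | ip_installed "$addr"; then
            echo "still installed: $addr"
        fi
    done
}
```

For example, report_leftover_ips "$(ifconfig)" 192.168.1.10 192.168.1.11 lists whichever of the two addresses is still configured on the node. Note that the search covers every address in the output, including broadcast addresses.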

In this situation, Serviceguard will not restart the package without manual intervention. You must clean up manually before restarting the package. Use the following steps as guidelines:

1. Perform application-specific cleanup. Any application-specific actions the control script might have taken should be undone to ensure that the package starts successfully on an alternate node. This might include shutting down application processes, removing lock files, and removing temporary files.

2. Ensure that package IP addresses are removed from the system. This step is accomplished via the cmmodnet(1m) command. First, determine which package IP addresses are installed by inspecting the output of the ifconfig command. If any of the IP addresses
