Managing HP Serviceguard A.11.20.10 for Linux, December 2012

For more information, including requirements and recommendations, see the MEMBER_TIMEOUT
discussion under “Cluster Configuration Parameters” (page 86).
8.8.5 System Administration Errors
You can make a number of errors when configuring Serviceguard that will not show up when you
start the cluster. Your cluster can be running, and everything can appear to be fine, until a
hardware or software failure occurs and control of your packages is not transferred to another
node as you expected.
These errors are caused specifically by mistakes in the cluster configuration file and the package
configuration scripts. Examples include:
Volume groups not defined on adoptive node.
Mount point does not exist on adoptive node.
Network errors on adoptive node (configuration errors).
User information not correct on adoptive node.
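Several of these errors can be caught before a failover ever happens by checking each adoptive node in advance. The following is a minimal sketch, not a Serviceguard tool: the functions check_mount_points, check_volume_groups, and check_users are hypothetical, and the mount points, volume group names, and user names you pass to them are assumptions you would take from your own package configuration.

```shell
#!/bin/sh
# Minimal pre-flight sketch for an adoptive node (not part of Serviceguard).
# All function names are hypothetical; pass in values from your own package
# configuration.

# check_mount_points DIR...: report mount point directories that do not exist.
check_mount_points() {
    rc=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "missing mount point: $dir"
            rc=1
        fi
    done
    return "$rc"
}

# check_volume_groups VG...: report volume groups that LVM on this node cannot see.
check_volume_groups() {
    rc=0
    for vg in "$@"; do
        if ! vgdisplay "$vg" >/dev/null 2>&1; then
            echo "volume group not found: $vg"
            rc=1
        fi
    done
    return "$rc"
}

# check_users USER...: report user accounts that are not defined on this node.
check_users() {
    rc=0
    for user in "$@"; do
        if ! id "$user" >/dev/null 2>&1; then
            echo "missing user: $user"
            rc=1
        fi
    done
    return "$rc"
}

# Example (placeholder names):
#   check_mount_points /mnt/pkg1 && check_volume_groups vg01 && check_users pkg1admin
```

Each function prints one line per problem and returns non-zero if it found any, so the checks can be chained or run from a scheduled job on every adoptive node.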
You can use the following commands to check the status of your disks:
df - to see if the file systems in your package’s volume group are mounted.
vgdisplay -v - to see if all logical volumes are present.
strings /etc/lvmconf/*.conf - to ensure that the configuration is correct.
fdisk -l /dev/sdx - to display information about a disk.
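Reading df output by eye works interactively, but a check that returns an exit status is easier to script. The sketch below is an assumption-laden alternative, not a Serviceguard tool: it reads /proc/mounts directly, and the names is_mounted and check_package_mounts are hypothetical.

```shell
#!/bin/sh
# Script-friendly alternative to reading df output by eye (a sketch, not a
# Serviceguard tool). Function names are hypothetical.

# is_mounted DIR [MOUNTS_FILE]: succeed if DIR is a mount point listed in
# MOUNTS_FILE (default /proc/mounts). DIR must match the listed path exactly
# (no trailing slash). Assumes DIR contains no spaces, because /proc/mounts
# encodes a space as \040.
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# check_package_mounts DIR...: print the state of each package file system and
# return 1 if any of them is not mounted.
check_package_mounts() {
    rc=0
    for dir in "$@"; do
        if is_mounted "$dir"; then
            echo "mounted:     $dir"
        else
            echo "NOT mounted: $dir"
            rc=1
        fi
    done
    return "$rc"
}

# Example (placeholder mount points):
#   check_package_mounts /mnt/pkg1 /mnt/pkg1_logs
```

The optional second argument to is_mounted exists only so the function can be exercised against a saved copy of a mounts table; in normal use it defaults to /proc/mounts.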
8.8.5.1 Package Control Script Hangs or Failures
If a RUN_SCRIPT_TIMEOUT or HALT_SCRIPT_TIMEOUT value is set and the control script hangs long
enough to exceed that timeout, Serviceguard kills the script and marks the package “Halted.”
Similarly, when a package control script fails, Serviceguard kills the script and marks the package
“Halted.” In both cases, the following also occur:
Control of the package will not be transferred.
The run or halt instructions may not run to completion.
Global switching will be disabled.
The current node will be disabled from running the package.
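The timeout handling described above can be pictured with a small analogy. The sketch below is not how Serviceguard is implemented: it assumes GNU coreutils timeout(1), which kills a command that runs too long and then exits with status 124, and it uses a hypothetical run_with_timeout function to map that outcome, or any other script failure, to the “Halted” state.

```shell
#!/bin/sh
# Analogy only: mimics "kill the script on timeout or failure, mark Halted"
# using GNU coreutils timeout(1). run_with_timeout is hypothetical.

# run_with_timeout SECONDS COMMAND [ARGS...]
# Prints the resulting package state and returns 0 only if COMMAND succeeded
# within SECONDS.
run_with_timeout() {
    secs=$1
    shift
    if timeout "$secs" "$@"; then
        rc=0
    else
        rc=$?
    fi
    if [ "$rc" -eq 0 ]; then
        echo "Running"
        return 0
    elif [ "$rc" -eq 124 ]; then
        # timeout(1) exits with 124 when it had to kill a hung command.
        echo "Halted (timeout exceeded)"
    else
        echo "Halted (script failed, exit $rc)"
    fi
    return 1
}

# Example: a script that hangs past a 2-second limit is reported as Halted.
#   run_with_timeout 2 sleep 10
```

As in the behavior described above, a hung script and a failing script end in the same state; only the reason differs.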
Because the control script is terminated, some of the package’s resources may be left active after
such a failure. Specifically:
Volume groups may be left active.
File systems may still be mounted.
IP addresses may still be installed.
Services may still be running.
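To find which package IP addresses are still installed after such a failure, you can filter the interface listing for the package’s addresses. This is a minimal sketch under stated assumptions: because ifconfig output differs between net-tools versions, it parses the one-line-per-address output of "ip -o -4 addr show" from iproute2 instead, it handles IPv4 only, and the function name leftover_ips and the example addresses are hypothetical.

```shell
#!/bin/sh
# Sketch (not a Serviceguard tool): list package IPv4 addresses that are still
# installed after a failed control script. Reads the one-line-per-address
# output of "ip -o -4 addr show" (iproute2) on stdin.

# leftover_ips ADDR...: print "ADDR on IFACE" for each given address found.
leftover_ips() {
    awk -v want="$*" '
        BEGIN {
            n = split(want, list, " ")
            for (i = 1; i <= n; i++) pkg[list[i]] = 1
        }
        # Fields: index, interface, family, address/prefix, ...
        $3 == "inet" {
            split($4, addr, "/")
            if (addr[1] in pkg) print addr[1] " on " $2
        }
    '
}

# Example (hypothetical package addresses):
#   ip -o -4 addr show | leftover_ips 192.168.10.50 192.168.10.51
```

This only reports leftover addresses; removing them is still done with cmmodnet(1m), as described in the cleanup steps.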
In this situation, Serviceguard will not restart the package without manual intervention. You
must clean up manually before restarting the package. Use the following steps as guidelines:
1. Perform application-specific cleanup. Undo any application-specific actions the control script
might have taken, so that the package can start successfully on an alternate node. This might
include shutting down application processes, removing lock files, and removing temporary
files.
2. Ensure that package IP addresses are removed from the system. Use the cmmodnet(1m)
command to do this. First, determine which package IP addresses are installed by inspecting
the output of the ifconfig command. If any of the IP addresses