Managing HP Serviceguard for Linux, Tenth Edition, September 2012

These errors are caused specifically by mistakes in the cluster configuration file and package configuration scripts. Examples include:

• Volume groups not defined on adoptive node.

• Mount point does not exist on adoptive node.

• Network errors on adoptive node (configuration errors).

• User information not correct on adoptive node.
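As an illustration of checking for one of these errors, the following minimal sketch reports any mount point directories that are missing on the node where it is run. The function name missing_mount_points and the example paths are hypothetical and are not part of Serviceguard; substitute the mount points defined for your package.

```shell
#!/bin/sh
# Print each directory from the argument list that does not exist on this node.
# Intended to be run on the adoptive node with the package's mount points.
missing_mount_points() {
    for dir in "$@"; do
        [ -d "$dir" ] || printf '%s\n' "$dir"
    done
}

# Example call (hypothetical paths):
# missing_mount_points /mnt/pkg1/data /mnt/pkg1/logs
```

If the function prints nothing, every listed directory exists on that node.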

You can use the following commands to check the status of your disks:

• df - to see if the file systems on your package’s volume group are mounted.

• vgdisplay -v - to see if all volumes are present.

• strings /etc/lvmconf/*.conf - to ensure that the configuration is correct.

• fdisk -l /dev/sdx - to display information about a disk.
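The mount check can also be scripted. The sketch below tests whether a given mount point appears in the kernel mount table. It assumes a Linux node where /proc/mounts is available; the function name is_mounted and the example path are hypothetical. Mount points that contain spaces are escaped in /proc/mounts and are not handled here.

```shell
#!/bin/sh
# Succeed (exit status 0) if the mount point in $1 is listed in the mount table.
# $2 optionally names the mount table file; it defaults to /proc/mounts.
is_mounted() {
    mount_point=$1
    table=${2:-/proc/mounts}
    # The second field of each mount table line is the mount point.
    awk -v mp="$mount_point" '$2 == mp { found = 1 } END { exit !found }' "$table"
}

# Example call (hypothetical path):
# is_mounted /mnt/pkg1 && echo "/mnt/pkg1 is mounted"
```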

Package Control Script Hangs or Failures

When a RUN_SCRIPT_TIMEOUT or HALT_SCRIPT_TIMEOUT value is set and the control script hangs long enough to exceed the timeout, Serviceguard kills the script and marks the package “Halted.” Similarly, when a package control script fails, Serviceguard kills the script and marks the package “Halted.” In both cases, the following also take place:

• Control of the package will not be transferred.

• The run or halt instructions may not run to completion.

• Global switching will be disabled.

• The current node will be disabled from running the package.

Because the control script is terminated after such a failure, some of the package’s resources may be left active. Specifically:

• Volume groups may be left active.

• File systems may still be mounted.

• IP addresses may still be installed.

• Services may still be running.
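To see whether a package IP address is among the resources left behind, you can filter the output of ip -o -4 addr show. The following is a minimal sketch, assuming the iproute2 ip command is installed on the node; the function name ip_is_configured and the address shown are hypothetical.

```shell
#!/bin/sh
# Read the output of `ip -o -4 addr show` on standard input and succeed
# (exit status 0) if the IPv4 address given in $1 is configured on any interface.
ip_is_configured() {
    awk -v want="$1" '
        # In one-line output, field 3 is "inet" and field 4 is ADDRESS/PREFIX.
        $3 == "inet" { split($4, parts, "/"); if (parts[1] == want) found = 1 }
        END { exit !found }'
}

# Example call on a live node (hypothetical package address):
# ip -o -4 addr show | ip_is_configured 192.0.2.10 && echo "address still configured"
```

A similar check for file systems can use df on each of the package’s mount points.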

In this kind of situation, Serviceguard will not restart the package without manual

intervention. You must clean up manually before restarting the package. Use the following

steps as guidelines:

1. Perform application-specific cleanup. Any application-specific actions the control script might have taken should be undone to ensure successfully starting the package
