Managing HP Serviceguard for Linux, Seventh Edition, July 2007

Troubleshooting Your Cluster
Solving Problems
Package Movement Errors
These errors are similar to the system administration errors, except that
they are caused specifically by errors in the package control script. The
best way to prevent these errors is to test your package control script
before putting your high availability application online.
Adding a “set -x” statement as the second line of your control script
makes the shell print each command as it executes, which shows you where
your script may be failing.
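As an illustration only, the following minimal sketch shows the effect of “set -x” on a script. The functions mount_filesystems and start_application are hypothetical stand-ins, not functions from an actual Serviceguard control script; one of them is made to fail deliberately so that the trace shows the failing command.

```shell
#!/bin/bash
set -x   # second line: trace each command to stderr, prefixed with "+"

# Hypothetical stand-ins for steps in a package control script.
mount_filesystems() { true; }
start_application() { false; }   # simulates a step that fails

mount_filesystems

# Capture the exit status of the failing step instead of aborting.
rc=0
start_application || rc=$?
echo "start_application exit status: $rc"
```

In the trace, the last command printed before the non-zero status is the one to investigate. Remove the “set -x” line once the problem is found, since the trace makes the script output considerably longer.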
Node and Network Failures
These failures cause Serviceguard to transfer control of a package to
another node. This is the normal action of Serviceguard, but you must be
able to recognize when a transfer has taken place and decide whether to
leave the cluster in its current condition or restore it to its original
condition.
Possible node failures can be caused by the following conditions:
• Reboot
• Kernel Oops
• Hangs
• Power failures
You can use the following commands to check the status of your network
and subnets:
• ifconfig - to display LAN status and to check whether the package IP
  is stacked on the LAN card.
• arp -a - to check the ARP tables.
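The two commands above can be run together from a small wrapper, sketched here under some assumptions: the helper name run_check is hypothetical, and the script only reports when a tool is not installed (on some newer distributions ifconfig and arp are not present by default). It does not interpret the output for you.

```shell
#!/bin/bash
# run_check is a hypothetical helper: it runs a command if the tool
# is installed, and reports a missing tool instead of failing silently.
run_check() {
    tool=$1
    if command -v "$tool" >/dev/null 2>&1; then
        echo "== $* =="
        "$@"
    else
        echo "$tool: not found on this system"
        return 127
    fi
}

# Display LAN status; look for the package IP on the LAN card.
run_check ifconfig || echo "ifconfig check did not complete"

# Display the ARP tables.
run_check arp -a || echo "arp check did not complete"
```

Running this on each node and comparing the output makes it easier to see which node currently holds the package IP address.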
Because your cluster is unique, there are no cookbook solutions to all
possible problems. However, if you apply these checks and commands and
work your way through the log files, you should be able to identify and
solve most problems.