Managing HP Serviceguard for Linux, Tenth Edition, September 2012

Package Movement Errors.
Node and Network Failures.
Quorum Server Messages.
Name Resolution Problems
Many Serviceguard commands, including cmviewcl, depend on name resolution services
to look up the addresses of cluster nodes. When name services are not available (for
example, if a name server is down), Serviceguard commands may hang, or may return
a network-related error message. If this happens, use the host command on each cluster
node to see whether name resolution is correct. For example:
host ftsys9
ftsys9.cup.hp.com has address 15.13.172.229
If the output of this command does not include the correct IP address of the node, then
check your name resolution services further.
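The same check can be repeated across all nodes with a small script. The following is a minimal sketch, not part of Serviceguard: the function names resolve_ipv4 and check_node are hypothetical, and the script only asks the local resolver (through Python's standard socket module) whether a node name maps to the address you expect. Like the Serviceguard commands themselves, the lookup may block for a while if name services are unavailable.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the local resolver gives for hostname.

    An empty list means the name could not be resolved at all.
    """
    try:
        infos = socket.getaddrinfo(hostname, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # for AF_INET, sockaddr is (address, port).
    return sorted({info[4][0] for info in infos})


def check_node(hostname, expected_ip, resolver=resolve_ipv4):
    """Return (ok, addresses); ok is True only if expected_ip was resolved.

    The resolver argument is a hypothetical hook so the check can be
    exercised without a live name service.
    """
    addresses = resolver(hostname)
    return expected_ip in addresses, addresses
```

Run on each cluster node, a call such as check_node("ftsys9", "15.13.172.229") returns a false first element when the node's name does not resolve to the expected address, which is the condition that calls for checking name resolution services further.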
Networking and Security Configuration Errors
In many cases, a symptom such as Permission denied... or Connection
refused... is the result of an error in the networking or security configuration. Most
such problems can be resolved by correcting the entries in /etc/hosts. See
“Configuring Name Resolution” (page 159) for more information.
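When reviewing /etc/hosts on several nodes, it can help to compare the file against the addresses you expect. The sketch below is illustrative only and assumes the standard hosts-file layout (an address followed by one or more names, with # starting a comment). The function names, and the node ftsys10 with its address in the usage note, are hypothetical.

```python
def parse_hosts(text):
    """Map every host name and alias in hosts-file text to its address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and padding
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Keep the first entry seen for a name (a choice made for this
            # sketch; duplicates are worth investigating in any case).
            table.setdefault(name, address)
    return table


def missing_or_wrong(text, expected):
    """Return {name: address_found_or_None} for entries that do not match.

    expected maps node names to the addresses they should have.
    """
    table = parse_hosts(text)
    return {
        name: table.get(name)
        for name, ip in expected.items()
        if table.get(name) != ip
    }
```

For example, given a file that contains only the line "15.13.172.229 ftsys9.cup.hp.com ftsys9", checking for ftsys9 at 15.13.172.229 and a hypothetical ftsys10 at 15.13.172.230 reports ftsys10 as missing.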
Cluster Re-formations Caused by Temporary Conditions
You may see Serviceguard error messages, such as the following, which indicate that a
node is having problems:
Member node_name seems unhealthy, not receiving heartbeats from
it.
This may indicate a serious problem, such as a node failure, whose underlying cause is
probably a too-aggressive setting for the MEMBER_TIMEOUT parameter; see the next
section, “Cluster Re-formations Caused by MEMBER_TIMEOUT Being Set too Low”.
Alternatively, it may be a transitory problem, such as excessive network traffic or system load.
What to do: If you find that cluster nodes are failing because of temporary network or
system-load problems (which in turn cause heartbeat messages to be delayed in the network
or during processing), you should solve the networking or load problem if you can.
Failing that, you can increase the value of MEMBER_TIMEOUT, as described in the next
section.
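To judge whether the heartbeat message points to one troubled node or to a cluster-wide condition, you can count how often each node is named. The following is a rough sketch under stated assumptions: the pattern is built only from the message text quoted above, the function name count_unhealthy is hypothetical, and the location of the log (for example /var/log/messages on many Linux systems) is an assumption to verify on your own nodes.

```python
import re
from collections import Counter

# Built from the message text quoted in this section. The prefix that
# syslog adds (timestamp, host name, process tag) is deliberately not
# assumed, so the pattern searches anywhere in the line.
UNHEALTHY = re.compile(
    r"Member (\S+) seems unhealthy, not receiving heartbeats from\s+it"
)


def count_unhealthy(log_text):
    """Count 'seems unhealthy' reports per node name in the given log text."""
    return Counter(UNHEALTHY.findall(log_text))
```

Pass the function the contents of the system log read as text. A count concentrated on one node suggests a problem with that node or its links, while counts spread across many nodes are more consistent with network congestion, system load, or a MEMBER_TIMEOUT value that is too low.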
Cluster Re-formations Caused by MEMBER_TIMEOUT Being Set too Low
If you have set the MEMBER_TIMEOUT parameter too low, the cluster daemon, cmcld,
will write warnings to syslog that indicate the problem. There are three in particular
that you should watch for: