Managing HP Serviceguard for Linux, Tenth Edition, September 2012

group vg02 and the Package2 IP address) as quickly as possible, SystemB halts

(system reset).

NOTE: If AUTOSTART_CMCLD in /etc/rc.config.d/cmcluster ($SGAUTOSTART)

is set to zero, the node will not attempt to join the cluster when it comes back up.
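
As an illustration of the NOTE above, the following minimal sketch reads the text of a cmcluster-style file and reports what AUTOSTART_CMCLD is set to. The function name autostart_setting is hypothetical, and the sketch assumes the file uses plain shell-style NAME=value assignments with 1 meaning that autostart is enabled; it is not part of Serviceguard.

```python
import re
from typing import Optional


def autostart_setting(config_text: str) -> Optional[bool]:
    """Return False if AUTOSTART_CMCLD is 0, True if it is 1, else None.

    Hypothetical helper: assumes shell-style NAME=value lines and that
    the last assignment in the file wins.
    """
    value = None
    for line in config_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop trailing comments
        match = re.fullmatch(r'AUTOSTART_CMCLD="?(\d+)"?', line)
        if match:
            value = match.group(1)
    if value == "0":
        return False  # node will not try to join the cluster at boot
    if value == "1":
        return True
    return None  # not set, or set to something this sketch does not model
```

A result of False corresponds to the case described in the NOTE: after the reset, the node stays out of the cluster until it is started manually.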

For more information on cluster failover, see the white paper Optimizing Failover Time

in a Serviceguard Environment (version A.11.19 and later) at

http://www.docs.hp.com -> High Availability -> Serviceguard -> White

Papers. For troubleshooting information, see “Cluster Re-formations Caused by

MEMBER_TIMEOUT Being Set too Low” (page 300).

Responses to Hardware Failures

If a serious system problem occurs, such as a system panic or physical disruption of the

SPU's circuits, Serviceguard recognizes a node failure and transfers the packages currently

running on that node to an adoptive node elsewhere in the cluster. (System multi-node

and multi-node packages do not fail over.)
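
The transfer described above can be pictured as a small planning function. The sketch below is a simplified model, not Serviceguard code: the Package class and plan_failover function are hypothetical, each package is assumed to carry an ordered list of configured nodes (primary first), and the adoptive node is assumed to be the first listed node that is still up. Other placement policies are not modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Package:
    name: str
    package_type: str   # "failover", "multi_node" or "system_multi_node"
    nodes: List[str]    # configured nodes, primary first (assumption)
    running_on: str     # simplification: one node per package


def plan_failover(packages: List[Package], failed_node: str,
                  up_nodes: Set[str]) -> Dict[str, Optional[str]]:
    """Map each affected failover package to an adoptive node, or None."""
    plan: Dict[str, Optional[str]] = {}
    for pkg in packages:
        # Only failover packages that were running on the failed node move;
        # multi-node and system multi-node packages do not fail over.
        if pkg.running_on != failed_node or pkg.package_type != "failover":
            continue
        candidates = [n for n in pkg.nodes
                      if n != failed_node and n in up_nodes]
        plan[pkg.name] = candidates[0] if candidates else None
    return plan
```

A value of None in the result means that no configured node is available, so the package cannot be restarted anywhere.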

The new location for each package is determined by that package's configuration file,

which lists primary and alternate nodes for the package. Transfer of a package to another

node does not transfer the program counter. Processes in a transferred package will

restart from the beginning. In order for an application to be expeditiously restarted after

a failure, it must be “crash-tolerant”; that is, all processes in the package must be written

so that they can detect such a restart. This is the same application design required for

restart after a normal system crash.
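
One common way to make a process crash-tolerant in this sense is to checkpoint its progress and check for the checkpoint at startup. The following is a minimal sketch under stated assumptions: the function names and the JSON checkpoint layout are hypothetical, and the checkpoint path is assumed to live on storage that moves with the package, so that the adoptive node can read it.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the saved state, or None on a clean first start."""
    try:
        with open(path) as f:
            return json.load(f)
    except (FileNotFoundError, json.JSONDecodeError):
        return None


def save_checkpoint(path, state):
    """Write state atomically so a crash never leaves a partial file."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # atomic rename over the old checkpoint


def run(path, total_items):
    """Process items, resuming after the last checkpoint if one exists.

    Returns True if this run detected that it is a restart.
    """
    state = load_checkpoint(path) or {"next_item": 0}
    restarted = state["next_item"] > 0
    for i in range(state["next_item"], total_items):
        # ... process item i here (application-specific) ...
        save_checkpoint(path, {"next_item": i + 1})
    return restarted
```

Because the process always begins at the top of run, the checkpoint is the only thing that distinguishes a restart from a first start, which matches the point above that the program counter is not transferred.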

In the event of a LAN interface failure, bonding provides a backup path for IP messages.

If a heartbeat LAN interface fails and no redundant heartbeat is configured, the node

fails with a reboot. If a monitored data LAN interface fails, the node fails with a reboot

only if node_fail_fast_enabled (described further under “Configuring a Package:

Next Steps” (page 154)) is set to yes for the package. Otherwise any packages using

that LAN interface will be halted and moved to another node if possible (unless the LAN

recovers immediately; see “When a Service or Subnet Fails, or a Dependency is Not

Met” (page 64)).
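
The LAN failure rules in this paragraph can be summarized as a decision function. This is only a sketch of the behavior as described here: the function name, parameter names, and return strings are hypothetical, and it does not model bonding, which supplies a backup path before any of these responses would apply.

```python
def lan_failure_response(is_heartbeat: bool,
                         redundant_heartbeat: bool,
                         is_monitored_data: bool,
                         node_fail_fast_enabled: bool,
                         recovers_immediately: bool = False) -> str:
    """Return the response to a failed LAN interface, as described above."""
    # A failed heartbeat LAN with no redundant heartbeat reboots the node.
    if is_heartbeat and not redundant_heartbeat:
        return "node_reboot"
    if is_monitored_data:
        # node_fail_fast_enabled is a per-package setting (yes/no).
        if node_fail_fast_enabled:
            return "node_reboot"
        if recovers_immediately:
            return "continue"
        # Packages using the interface are halted and moved if possible.
        return "halt_and_move_packages"
    return "continue"
```

For example, a monitored data LAN failure on a node whose package has node_fail_fast_enabled set to no yields "halt_and_move_packages" rather than a reboot.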

Disk monitoring provides additional protection. You can configure packages to be

dependent on the health of disks, so that when a disk monitor reports a problem, the

package can fail over to another node. See “Creating a Disk Monitor Configuration”

(page 231).

Serviceguard does not respond directly to power failures, although a loss of power to

an individual cluster component may appear to Serviceguard like the failure of that

component, and will result in the appropriate switching behavior. Power protection is

provided by HP-supported uninterruptible power supplies (UPS).

88 Understanding Serviceguard Software Components