Managing HP Serviceguard for Linux, Seventh Edition, July 2007

Understanding Serviceguard Software Components
Responses to Failures
Chapter 3
For more information on cluster failover, see the white paper
“Optimizing Failover Time in a Serviceguard Environment” at
http://www.docs.hp.com -> High Availability -> Serviceguard -> White Papers.
Responses to Hardware Failures
If a serious system problem occurs, such as a system panic or physical
disruption of the SPU's circuits, Serviceguard recognizes a node failure
and transfers the packages currently running on that node to an
adoptive node elsewhere in the cluster. (System multi-node and
multi-node packages do not fail over.)
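The failover rule above can be sketched in Python. This is a minimal, hypothetical model, not Serviceguard code. It assumes that each failover package carries an ordered node list (primary first, then alternates, as in the package configuration file) and that the cluster knows which nodes are still up. The names `Package` and `choose_adoptive_node` and the package-type constants are invented for illustration. The sketch models only a simple “first surviving node in configured order” rule.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Package types from the text. Only failover packages move to an adoptive
# node; system multi-node and multi-node packages do not fail over.
FAILOVER = "failover"
MULTI_NODE = "multi_node"
SYSTEM_MULTI_NODE = "system_multi_node"


@dataclass
class Package:
    name: str
    package_type: str
    # Candidate nodes in configured order: primary first, then alternates.
    node_list: list[str] = field(default_factory=list)
    current_node: str | None = None


def choose_adoptive_node(pkg: Package, failed_node: str,
                         up_nodes: set[str]) -> str | None:
    """Return the node pkg should move to when failed_node fails, or None."""
    if pkg.package_type != FAILOVER:
        return None  # multi-node and system multi-node packages stay put
    if pkg.current_node != failed_node:
        return None  # the package was not running on the failed node
    for node in pkg.node_list:
        if node != failed_node and node in up_nodes:
            return node  # first surviving node in configured order
    return None  # no eligible adoptive node is up
```

Consider a failover package configured on node1, node2 and node3 and running on node1. If node1 fails while node2 and node3 are up, the package moves to node2. A multi-node package in the same situation gets `None`.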
The new location for each package is determined by that package's
configuration file, which lists primary and alternate nodes for the
package. Transfer of a package to another node does not transfer the
program counter, so processes in a transferred package restart from
the beginning. For an application to be restarted quickly after a
failure, it must be “crash-tolerant”; that is, all processes in the
package must be written so that they can detect such a restart. This is
the same application design required for restart after a normal system
crash.
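One common way to make a process “crash-tolerant” is a marker file. The process creates the file at startup and removes it on clean shutdown, so finding it already present at startup means the previous run ended abruptly. The sketch below assumes that `state_dir` lives on storage that moves with the package, so the check works on whichever node restarts it. The file name `app.running`, the `*.tmp` cleanup in `recover()`, and all function names are hypothetical placeholders for application-specific recovery logic.

```python
import os


def _marker(state_dir: str) -> str:
    # Hypothetical marker file; its presence at startup means the previous
    # run did not shut down cleanly (panic, node failure, or failover).
    return os.path.join(state_dir, "app.running")


def recover(state_dir: str) -> None:
    """Example recovery step: discard half-written *.tmp files."""
    for name in os.listdir(state_dir):
        if name.endswith(".tmp"):
            os.remove(os.path.join(state_dir, name))


def start(state_dir: str) -> bool:
    """Start the application; return True if an unclean restart was detected."""
    marker = _marker(state_dir)
    restarted_after_crash = os.path.exists(marker)
    if restarted_after_crash:
        recover(state_dir)
    # Create the marker before doing real work, so a crash from here on
    # is detected by the next start() on whichever node runs the package.
    with open(marker, "w") as f:
        f.write(str(os.getpid()))
    return restarted_after_crash


def stop(state_dir: str) -> None:
    """Clean shutdown: remove the marker so the next start is treated as normal."""
    os.remove(_marker(state_dir))
```

This design keeps the check independent of which node the package starts on. The only requirement is that the marker and the application's data sit on the same storage that fails over with the package.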
In the event of a LAN interface failure, bonding provides a backup path
for IP messages. If a heartbeat LAN interface fails and no redundant
heartbeat is configured, the node fails with a reboot. If a monitored data
LAN interface fails, the node fails with a reboot only if
node_fail_fast_enabled (described further under “Package
Configuration File Parameters” starting on page 127) is set to yes for the
package.
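The two reboot rules above can be expressed as a small decision function. This sketch is illustrative only. `LanRole`, `read_package_params`, `node_fail_fast`, and `node_reboots` are hypothetical names. The parser assumes a simple `name value` line format with `#` comments, and a missing `node_fail_fast_enabled` is treated as `no` by assumption. Bonding is not modeled; the function applies only after an interface is considered to have failed.

```python
from __future__ import annotations

from enum import Enum


class LanRole(Enum):
    HEARTBEAT = "heartbeat"            # carries cluster heartbeat messages
    MONITORED_DATA = "monitored_data"  # data LAN monitored for a package


def read_package_params(text: str) -> dict[str, str]:
    """Parse 'name value' lines, ignoring blank lines and '#' comments.

    Assumed format for illustration; repeated names keep the last value.
    """
    params: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        params[parts[0].lower()] = parts[1].strip() if len(parts) > 1 else ""
    return params


def node_fail_fast(params: dict[str, str]) -> bool:
    # Treat a missing parameter as "no" (an assumption of this sketch).
    return params.get("node_fail_fast_enabled", "no").strip().lower() == "yes"


def node_reboots(role: LanRole, redundant_heartbeat: bool,
                 fail_fast: bool) -> bool:
    """Apply the two reboot rules from the text to a failed LAN interface."""
    if role is LanRole.HEARTBEAT:
        # Heartbeat loss reboots the node only if no redundant heartbeat exists.
        return not redundant_heartbeat
    # Monitored data LAN loss reboots the node only if fail-fast is enabled.
    return fail_fast
```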
Disk monitoring provides additional protection. You can configure
packages to be dependent on the health of disks, so that when a disk
monitor reports a problem, the package can fail over to another node. See
“Creating a Disk Monitor Configuration” on page 228.
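The package-to-disk dependency described above can be sketched as follows. This is not Serviceguard's disk monitor or its configuration syntax; for those, see the section referenced above. `DiskDependency`, `packages_to_fail_over`, `poll_once`, the `read_status` callback, the `"up"` status string, and device names such as `/dev/sdb` are all hypothetical. Each poll collects the disks that do not report `"up"` and passes every package that depends on one of them to an `on_failover` callback.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DiskDependency:
    package: str
    disks: frozenset[str]  # hypothetical: devices this package depends on


def packages_to_fail_over(deps: list[DiskDependency],
                          failed_disks: set[str]) -> list[str]:
    """Return, in input order, packages that depend on any failed disk."""
    return [d.package for d in deps if d.disks & failed_disks]


def poll_once(deps: list[DiskDependency],
              read_status: Callable[[str], str | None],
              on_failover: Callable[[str], None]) -> set[str]:
    """Check every dependent disk once and request failover where needed."""
    failed = {disk for d in deps for disk in d.disks
              if read_status(disk) != "up"}
    for pkg in packages_to_fail_over(deps, failed):
        on_failover(pkg)  # hypothetical hook that moves the package
    return failed
```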
Serviceguard does not respond directly to power failures, although a loss
of power to an individual cluster component may appear to Serviceguard
like the failure of that component, and will result in the appropriate
switching behavior. Power protection is provided by HP-supported
uninterruptible power supplies (UPS).