Managing HP Serviceguard A.11.20.10 for Linux, December 2012

If service_fail_fast_enabled (page 178) is set to yes in the package configuration
file, Serviceguard will reboot the node if there is a failure of that specific service.
If node_fail_fast_enabled (page 171) is set to yes in the package configuration file,
and the package fails, Serviceguard will halt (reboot) the node on which the package is
running.
For more information, see “Package Configuration Planning” (page 100) and Chapter 6 (page 163).
3.8.4 Responses to Package and Generic Resources Failures
In a package that is configured with a generic resource and is running, failure of a resource prompts
the Serviceguard Package Manager to take appropriate action based on the style of the package.
For failover packages, the package is halted on the node where the resource failure occurred and
started on an available alternative node. For multi-node packages, failure of a generic resource
causes the package to be halted only on the node where the failure occurred.
For simple resources, when a resource fails, the monitoring script must set the status of the
resource to 'down' using the cmsetresource command.
For extended resources, the monitoring script uses the cmsetresource command to set the value
it has fetched for the resource.
The Serviceguard Package Manager evaluates this value against the
generic_resource_up_criteria set for a resource in the packages where it is configured.
If the value that is set (current_value) does not satisfy the generic_resource_up_criteria,
then the generic resource is marked as 'down' on that node.
NOTE: If a simple resource is down on a particular node, it is down on that node for all the
packages using it. An extended resource, in contrast, may be up on a node for one package and
down for another package, because its status depends on the generic_resource_up_criteria
set in each package.
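The per-package evaluation described in the note can be sketched in Python. This is a simplified model for illustration only, not Serviceguard code: the UpCriteria class, the package names pkg1 and pkg2, and the single-operator form of the criteria are all assumptions. The actual syntax of generic_resource_up_criteria is defined with the package parameters.

```python
import operator
from dataclasses import dataclass

# Hypothetical, simplified stand-in for generic_resource_up_criteria:
# one comparison operator and one integer threshold.
_OPS = {
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
    "==": operator.eq, "!=": operator.ne,
}


@dataclass(frozen=True)
class UpCriteria:
    op: str
    threshold: int

    def satisfied_by(self, current_value: int) -> bool:
        return _OPS[self.op](current_value, self.threshold)


def extended_resource_status(current_value, criteria_by_package):
    """Evaluate one value set on a node against each package's own criteria."""
    return {
        pkg: "up" if crit.satisfied_by(current_value) else "down"
        for pkg, crit in criteria_by_package.items()
    }


# Same resource, same node, same current_value; the two (hypothetical)
# packages differ only in their up criteria.
criteria = {"pkg1": UpCriteria(">=", 500), "pkg2": UpCriteria(">=", 1000)}
statuses = extended_resource_status(750, criteria)
print(statuses)  # {'pkg1': 'up', 'pkg2': 'down'}
```

A simple resource has no criteria to evaluate, so in this model it would carry a single up or down status per node, shared by every package that uses it.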
Additionally, in a running package configured with a generic resource:
Any failure of a generic resource of evaluation type "before_package_start" configured in a
package will not disable the node switching for the package.
Any failure of a generic resource of evaluation type "during_package_start" configured in a
package will disable the node switching for the package.
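The responses described in this section can be summarized as a small decision model. The Python sketch below is illustrative only: the function names, the "failover" and "multi_node" labels, and the rule of picking the first other available node are assumptions. In a real cluster the target node is chosen according to the package's configured failover behavior.

```python
from typing import List


def plan_response(package_type: str, failed_node: str,
                  available_nodes: List[str]) -> dict:
    """Return where the package is halted and, for failover packages, restarted."""
    plan = {"halt_on": failed_node, "start_on": None}
    if package_type == "failover":
        # Assumption: take the first other available node. The real choice
        # follows the package's configured failover behavior.
        alternates = [n for n in available_nodes if n != failed_node]
        plan["start_on"] = alternates[0] if alternates else None
    elif package_type != "multi_node":
        raise ValueError(f"unsupported package type: {package_type}")
    # Multi-node packages are halted only on the failed node and keep
    # running on the other nodes, so nothing is started elsewhere.
    return plan


def disables_node_switching(evaluation_type: str) -> bool:
    """Map the generic resource evaluation type to the effect of a failure."""
    if evaluation_type == "during_package_start":
        return True
    if evaluation_type == "before_package_start":
        return False
    raise ValueError(f"unknown evaluation type: {evaluation_type}")


print(plan_response("failover", "node1", ["node1", "node2"]))
print(plan_response("multi_node", "node1", ["node1", "node2"]))
print(disables_node_switching("during_package_start"))
```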
“Choosing Switching and Failover Behavior” (page 103) provides advice on choosing appropriate
failover behavior.
See “Parameters for Configuring Generic Resources” (page 103).
3.8.4.1 Service Restarts
You can allow a service to restart locally following a failure. To do this, you indicate a number of
restarts for each service in the package control script. When a service starts, the variable
service_restart is set in the service’s environment. The service, as it executes, can examine
this variable to see whether it has been restarted after a failure, and if so, it can take appropriate
action such as cleanup.
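As a rough illustration, a service written in Python might check the variable as follows. The text above says only that service_restart is set in the service's environment. The interpretation used here (unset, empty, or "0" means a first start, and anything else means a restart) is an assumption made for this sketch and should be checked against the package documentation.

```python
import os
from typing import Mapping


def was_restarted(env: Mapping[str, str] = os.environ) -> bool:
    """Return True if the environment suggests a restart after a failure."""
    value = env.get("service_restart", "").strip()
    # Assumption: unset, empty, or "0" means this is the first start.
    return value not in ("", "0")


def startup_message(env: Mapping[str, str] = os.environ) -> str:
    if was_restarted(env):
        # A real service would remove stale lock files, temporary data,
        # and so on before resuming work.
        return "restarted after a failure: cleaning up"
    return "first start"


if __name__ == "__main__":
    print(startup_message())
```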
3.8.4.2 Network Communication Failure
An important element in the cluster is the health of the network itself. As it continuously monitors
the cluster, each node listens for heartbeat messages from the other nodes confirming that all nodes
are able to communicate with each other. If a node does not hear these messages within the
configured amount of time, a node timeout occurs, resulting in a cluster re-formation and later, if
there are still no heartbeat messages received, a reboot. See “What Happens when a Node Times
Out” (page 72).
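The timeout check itself can be pictured with a short Python sketch. It is a conceptual model only, not how the cluster daemon is implemented: the function name, the node names, and the timeout value are hypothetical.

```python
from typing import Dict, List


def silent_peers(last_heartbeat: Dict[str, float], now: float,
                 timeout_seconds: float) -> List[str]:
    """Return the peers whose last heartbeat is older than the timeout."""
    return sorted(
        node for node, heard_at in last_heartbeat.items()
        if now - heard_at > timeout_seconds
    )


# Times are in seconds on an arbitrary clock; values are made up.
last_heartbeat = {"node2": 100.0, "node3": 86.0}
timed_out = silent_peers(last_heartbeat, now=101.0, timeout_seconds=14.0)
print(timed_out)  # ['node3']
```

In this model, a non-empty result corresponds to the node timeout that leads to a cluster re-formation.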