Managing HP Serviceguard A.11.20.20 for Linux, March 2014

NOTE: The failover estimates provided here apply to the
Serviceguard component of failover; that is, the package is
expected to be up and running on the adoptive node in this
time, but the application that the package runs may take
more time to start.
For most clusters that use a lock LUN, a minimum
MEMBER_TIMEOUT of 14 seconds is appropriate.
For most clusters that use a MEMBER_TIMEOUT value lower
than 14 seconds, a quorum server is more appropriate than
a lock LUN. The cluster will fail if the time it takes to acquire
the disk lock exceeds 0.2 times the MEMBER_TIMEOUT. This
means that if you use a disk-based quorum device (lock
LUN), you must be certain that the nodes in the cluster, the
connection to the disk, and the disk itself can respond
quickly enough to perform 10 disk writes within 0.2 times
the MEMBER_TIMEOUT.
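
The lock-acquisition budget described above can be worked out with a little arithmetic. The following is a minimal sketch, not part of Serviceguard: the function names are hypothetical, and only the 0.2 factor and the count of 10 disk writes come from the text. It assumes MEMBER_TIMEOUT is expressed in microseconds and that the 10 writes share the budget evenly, which is a simplification.

```python
# Illustrative arithmetic only; these helpers are hypothetical and are not
# part of any Serviceguard tool or API.

LOCK_FRACTION_NUM = 2    # 0.2 written as 2/10 so the arithmetic stays in integers
LOCK_FRACTION_DEN = 10
LOCK_WRITES = 10         # number of disk writes named in the text


def lock_budget_us(member_timeout_us: int) -> int:
    """Time allowed to acquire the disk lock: 0.2 x MEMBER_TIMEOUT (microseconds)."""
    return member_timeout_us * LOCK_FRACTION_NUM // LOCK_FRACTION_DEN


def per_write_budget_us(member_timeout_us: int) -> int:
    """Average time per write, assuming the 10 writes split the budget evenly."""
    return lock_budget_us(member_timeout_us) // LOCK_WRITES


def lock_lun_fast_enough(member_timeout_us: int, measured_10_writes_us: int) -> bool:
    """True if a measured time for 10 writes does not exceed the lock budget."""
    return measured_10_writes_us <= lock_budget_us(member_timeout_us)


if __name__ == "__main__":
    timeout_us = 14_000_000  # 14 seconds
    print(lock_budget_us(timeout_us))       # total budget in microseconds
    print(per_write_budget_us(timeout_us))  # average budget per write
```

With a MEMBER_TIMEOUT of 14 seconds, this gives a budget of 2.8 seconds for all 10 writes, or about 0.28 seconds per write on average. How you measure the actual write time on your hardware is outside the scope of this sketch.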
Keep the following guidelines in mind when deciding how
to set the value.
Guidelines: You need to decide whether it's more important
for your installation to have fewer (but slower) cluster
re-formations, or faster (but possibly more frequent)
re-formations:
• To ensure the fastest cluster re-formations, use the
  minimum value applicable to your cluster. But keep in
  mind that this setting will lead to a cluster re-formation,
  and to the node being removed from the cluster and
  rebooted, if a system hang or network load spike
  prevents the node from sending a heartbeat signal
  within the MEMBER_TIMEOUT value. More than one
  node could be affected if, for example, a network event
  such as a broadcast storm caused kernel interrupts to
  be turned off on some or all nodes while the packets
  were being processed, preventing the nodes from
  sending and processing heartbeat messages.
  See “Cluster Re-formations Caused by
  MEMBER_TIMEOUT Being Set too Low” (page 262) for
  troubleshooting information.
• For fewer re-formations, use a setting in the range of
  10 to 25 seconds (10,000,000 to 25,000,000
  microseconds), keeping in mind that a value larger than
  the default will lead to slower re-formations than the
  default. A value in this range is appropriate for most
  installations.
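
The guidelines above can be summarized as a small planning aid. This is a sketch under stated assumptions: the function and constant names are hypothetical, the thresholds (14 seconds for lock LUN clusters, 10 to 25 seconds for fewer re-formations) are taken from the text, and the wording of the notes is illustrative. It does not read or validate a real cluster configuration.

```python
# Hypothetical planning helper; not a Serviceguard tool. Thresholds come from
# the guidelines in the text, and values are in microseconds.

US_PER_SECOND = 1_000_000
LOCK_LUN_MIN_US = 14 * US_PER_SECOND
FEWER_REFORM_RANGE_US = (10 * US_PER_SECOND, 25 * US_PER_SECOND)


def seconds_to_us(seconds: float) -> int:
    """Convert seconds to microseconds."""
    return round(seconds * US_PER_SECOND)


def review_member_timeout(member_timeout_us: int, uses_lock_lun: bool) -> list[str]:
    """Return notes comparing a candidate value with the guidelines."""
    notes = []
    if uses_lock_lun and member_timeout_us < LOCK_LUN_MIN_US:
        notes.append("below 14 s with a lock LUN: a quorum server "
                     "is usually more appropriate")
    low, high = FEWER_REFORM_RANGE_US
    if member_timeout_us < low:
        notes.append("below 10 s: faster but possibly more frequent re-formations")
    elif member_timeout_us <= high:
        notes.append("within the 10 to 25 s range suggested for fewer re-formations")
    else:
        notes.append("above 25 s: outside the range discussed in the guidelines")
    return notes


if __name__ == "__main__":
    for note in review_member_timeout(seconds_to_us(8), uses_lock_lun=True):
        print(note)
```

A candidate of 8 seconds on a lock LUN cluster produces two notes here, one about the quorum device and one about re-formation frequency. A value of 14 seconds produces only the note that it falls within the suggested range.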
See also “What Happens when a Node Times Out”
(page 75), “Cluster Daemon: cmcld” (page 34), and the
white paper Optimizing Failover Time in a Serviceguard
Environment (version A.11.19 and later) at
http://www.hp.com/go/linux-serviceguard-docs.
Can be changed while the cluster is running.
4.7 Cluster Configuration Planning 99