Managing HP Serviceguard A.11.20.20 for Linux, March 2014


NOTE: The failover estimates provided here apply to the Serviceguard component of failover; that is, the package is expected to be up and running on the adoptive node in this time, but the application that the package runs may take more time to start.

For most clusters that use a lock LUN, a minimum MEMBER_TIMEOUT of 14 seconds is appropriate.

For most clusters that use a MEMBER_TIMEOUT value lower than 14 seconds, a quorum server is more appropriate than a lock LUN. The cluster will fail if the time it takes to acquire the disk lock exceeds 0.2 times the MEMBER_TIMEOUT. This means that if you use a disk-based quorum device (lock LUN), you must be certain that the nodes in the cluster, the connection to the disk, and the disk itself can respond quickly enough to complete 10 disk writes within 0.2 times the MEMBER_TIMEOUT.
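The arithmetic behind the lock LUN requirement can be sketched as follows. This is only an illustration of the 0.2 times MEMBER_TIMEOUT rule stated above, not part of Serviceguard; the function names are hypothetical, and the per-write figure assumes the 10 writes happen one after another and take roughly equal time, which the text above does not state.

```python
# Illustrative only: these helpers are hypothetical and not part of Serviceguard.
# They restate the rule that the disk lock must be acquired (10 disk writes)
# within 0.2 times MEMBER_TIMEOUT. Values are in microseconds.

LOCK_WRITES = 10          # number of disk writes named in the rule
LOCK_BUDGET_DIVISOR = 5   # 0.2 expressed as 1/5 to stay in integer arithmetic


def lock_budget_us(member_timeout_us: int) -> int:
    """Return the time allowed for acquiring the disk lock, in microseconds."""
    if member_timeout_us <= 0:
        raise ValueError("MEMBER_TIMEOUT must be positive")
    return member_timeout_us // LOCK_BUDGET_DIVISOR


def per_write_budget_us(member_timeout_us: int) -> int:
    """Average time available per write, assuming sequential, equal writes."""
    return lock_budget_us(member_timeout_us) // LOCK_WRITES


def lock_lun_fits(member_timeout_us: int, measured_write_us: int) -> bool:
    """Check whether 10 writes of the measured duration fit in the budget."""
    return measured_write_us * LOCK_WRITES <= lock_budget_us(member_timeout_us)


if __name__ == "__main__":
    timeout = 14_000_000  # 14 seconds
    print(lock_budget_us(timeout))       # 2800000
    print(per_write_budget_us(timeout))  # 280000
```

With a MEMBER_TIMEOUT of 14 seconds, the lock must be acquired within 2.8 seconds, which under the assumption above leaves about 0.28 seconds per write on average.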

Keep the following guidelines in mind when deciding how to set the value.

Guidelines: You need to decide whether it is more important for your installation to have fewer (but slower) cluster re-formations, or faster (but possibly more frequent) re-formations:

• To ensure the fastest cluster re-formations, use the minimum value applicable to your cluster. But keep in mind that this setting will lead to a cluster re-formation, and to the node being removed from the cluster and rebooted, if a system hang or network load spike prevents the node from sending a heartbeat signal within the MEMBER_TIMEOUT value. More than one node could be affected if, for example, a network event such as a broadcast storm causes kernel interrupts to be turned off on some or all nodes while the packets are being processed, preventing the nodes from sending and processing heartbeat messages. See “Cluster Re-formations Caused by MEMBER_TIMEOUT Being Set too Low” (page 262) for troubleshooting information.

• For fewer re-formations, use a setting in the range of 10 to 25 seconds (10,000,000 to 25,000,000 microseconds), keeping in mind that a value larger than the default will lead to slower re-formations than the default. A value in this range is appropriate for most installations.
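Because the guideline above gives the range in seconds but also quotes it in microseconds, a small conversion and range check may help avoid a misplaced zero. The following is a minimal sketch; the function and constant names are hypothetical and are not Serviceguard commands or parameters.

```python
# Illustrative only: hypothetical helpers, not part of Serviceguard.
# They convert a timeout in seconds to microseconds and test it against
# the 10 to 25 second range suggested above for fewer re-formations.

US_PER_SECOND = 1_000_000
FEWER_REFORMATIONS_RANGE_S = (10, 25)  # range quoted in the guideline


def seconds_to_microseconds(seconds: float) -> int:
    """Convert a timeout in seconds to whole microseconds."""
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    return round(seconds * US_PER_SECOND)


def in_fewer_reformations_range(member_timeout_us: int) -> bool:
    """Return True if the value lies within 10,000,000 to 25,000,000 us."""
    low_s, high_s = FEWER_REFORMATIONS_RANGE_S
    return low_s * US_PER_SECOND <= member_timeout_us <= high_s * US_PER_SECOND


if __name__ == "__main__":
    for seconds in (5, 10, 14, 25, 30):
        value = seconds_to_microseconds(seconds)
        print(seconds, value, in_fewer_reformations_range(value))
```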

See also “What Happens when a Node Times Out” (page 75), “Cluster Daemon: cmcld” (page 34), and the white paper Optimizing Failover Time in a Serviceguard Environment (version A.11.19 and later) at http://www.hp.com/go/linux-serviceguard-docs.

Can be changed while the cluster is running.

4.7 Cluster Configuration Planning 99