Managing HP Serviceguard for Linux, Eighth Edition, March 2008

Planning and Documenting an HA Cluster
Cluster Configuration Planning
There are more complex cases that require you to make
a trade-off between fewer failovers and faster failovers.
For example, a network event such as a broadcast
storm may cause kernel interrupts to be turned off on
some or all nodes while the packets are being
processed, preventing the nodes from sending and
processing heartbeat messages. This in turn could
prevent the kernel’s safety timer from being reset,
causing the node to halt. (See “Cluster Daemon: cmcld”
on page 39 for more information about the safety
timer.)
Can be changed while the cluster is running.
AUTO_START_TIMEOUT
The amount of time a node waits before it stops trying
to join a cluster during automatic cluster startup. All
nodes wait this amount of time for other nodes to begin
startup before the cluster completes the operation. The
time should be selected based on the slowest boot time
in the cluster. Enter a value equal to the difference
between the boot time of the slowest-booting node and
that of the fastest-booting node, plus 600 seconds (ten
minutes). Default is 600,000,000 microseconds (600
seconds).
Can be changed while the cluster is running.
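The AUTO_START_TIMEOUT calculation described above can be
sketched in Python as follows. This is an illustration only:
the function name and the sample boot times are hypothetical,
and the sketch assumes boot times have been measured in seconds
while the parameter is entered in microseconds.

```python
MICROSECONDS_PER_SECOND = 1_000_000
MARGIN_SECONDS = 600  # the ten-minute margin named in the text


def auto_start_timeout_us(boot_times_seconds):
    """Return (slowest boot - fastest boot + 600 s) in microseconds.

    boot_times_seconds is a hypothetical list of measured boot
    times, one per cluster node, in seconds.
    """
    if not boot_times_seconds:
        raise ValueError("need at least one boot time")
    spread = max(boot_times_seconds) - min(boot_times_seconds)
    return round((spread + MARGIN_SECONDS) * MICROSECONDS_PER_SECOND)


# Hypothetical boot times, in seconds, for a three-node cluster.
boot_times = [180, 240, 420]
print("AUTO_START_TIMEOUT", auto_start_timeout_us(boot_times))
```

With these sample values the spread is 240 seconds, so the
result is 840 seconds expressed in microseconds. When every
node boots in the same time, the result equals the default.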
NETWORK_POLLING_INTERVAL
The interval at which the networks configured for
Serviceguard are checked. In the ASCII cluster
configuration file, this parameter is
NETWORK_POLLING_INTERVAL.
Default is 2,000,000 microseconds in the ASCII file.
Thus every 2 seconds, the network manager polls each
network interface to make sure it can still send and
receive information. Changing this value can affect
how quickly a network failure is detected.
The minimum value is 1,000,000 microseconds (1 second).
The maximum value recommended is 15,000,000
microseconds (15 seconds), and the maximum value
supported is 30,000,000 microseconds (30 seconds).
Can be changed while the cluster is running.
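The limits given for NETWORK_POLLING_INTERVAL can be checked
before a value is entered. The following Python sketch is one
way to do that. The function name and the returned strings are
hypothetical and are not part of Serviceguard. The sketch
assumes the value is given in microseconds, as in the ASCII
file.

```python
MIN_US = 1_000_000               # minimum value: 1 second
RECOMMENDED_MAX_US = 15_000_000  # maximum recommended: 15 seconds
SUPPORTED_MAX_US = 30_000_000    # maximum supported: 30 seconds


def check_polling_interval(value_us):
    """Classify a proposed NETWORK_POLLING_INTERVAL (microseconds)."""
    if value_us < MIN_US:
        return "below minimum"
    if value_us > SUPPORTED_MAX_US:
        return "above supported maximum"
    if value_us > RECOMMENDED_MAX_US:
        return "supported, but above recommended maximum"
    return "ok"


# The default of 2,000,000 microseconds falls inside all limits.
print(check_polling_interval(2_000_000))
```

A longer interval reduces polling activity but delays detection
of a network failure, which is why a value can be supported and
still exceed the recommended maximum.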