Managing HP Serviceguard for Linux, Sixth Edition, August 2006

Planning and Documenting an HA Cluster
Cluster Configuration Planning
A cluster should be designed to provide the quickest possible recovery
from failures. The actual time required to recover from a failure depends
on several factors:
• The length of the cluster heartbeat interval and node timeout. Each
  should be set as short as practical, but not shorter than 1000000
  microseconds (one second) and 2000000 microseconds (two seconds),
  respectively, because heartbeat messages can be lost in many
  configurations. The recommended value for the heartbeat interval is
  1000000 (one second), and the recommended value for the node timeout
  is in the range of 5 to 8 seconds (5000000 to 8000000).
• The design of the run and halt instructions in the package control
  script. They should be written for fast execution.
• The application and database recovery time. Applications and
  databases should be designed for the shortest possible recovery time.
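
The timing limits above can be checked mechanically before a configuration is applied. The following is a minimal sketch, not part of Serviceguard: the function name check_timing and the constant names are hypothetical, and the values are assumed to be expressed in microseconds, as the figures in the text imply.

```python
from typing import List

# Limits and recommendations taken from the text, assumed to be microseconds.
MIN_HEARTBEAT_INTERVAL_US = 1_000_000          # not shorter than one second
MIN_NODE_TIMEOUT_US = 2_000_000                # not shorter than two seconds
RECOMMENDED_HEARTBEAT_INTERVAL_US = 1_000_000  # one second
RECOMMENDED_NODE_TIMEOUT_RANGE_US = (5_000_000, 8_000_000)  # 5 to 8 seconds


def check_timing(heartbeat_interval_us: int, node_timeout_us: int) -> List[str]:
    """Return findings; an empty list means the values follow the guidance."""
    findings = []

    if heartbeat_interval_us < MIN_HEARTBEAT_INTERVAL_US:
        findings.append("error: heartbeat interval is below 1000000 (one second)")
    elif heartbeat_interval_us != RECOMMENDED_HEARTBEAT_INTERVAL_US:
        findings.append("note: heartbeat interval differs from the recommended 1000000")

    if node_timeout_us < MIN_NODE_TIMEOUT_US:
        findings.append("error: node timeout is below 2000000 (two seconds)")
    else:
        low, high = RECOMMENDED_NODE_TIMEOUT_RANGE_US
        if not low <= node_timeout_us <= high:
            findings.append("note: node timeout is outside the recommended 5000000 to 8000000")

    return findings


if __name__ == "__main__":
    # Hypothetical planning values: one-second heartbeat, six-second timeout.
    for finding in check_timing(1_000_000, 6_000_000) or ["ok: values follow the guidance"]:
        print(finding)
```

Values below the stated minimums are reported as errors, while values that are permitted but outside the recommended settings are reported only as notes, which mirrors the distinction the text draws between limits and recommendations.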
In addition, you must provide consistency across the cluster so that:
• User names are the same on all nodes.
• UIDs are the same on all nodes.
• GIDs are the same on all nodes.
• Applications in the system area are the same on all nodes.
• System time is consistent across the cluster.
• Files that could be used by more than one node, such as /usr or /opt
  files, are the same on all nodes.
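
The user name, UID, and GID requirements above lend themselves to an automated comparison. The sketch below is illustrative only and assumes that the contents of /etc/passwd (for UIDs) or /etc/group (for GIDs) have already been collected from each node as text. How the files are gathered is left open, and the function names parse_ids and find_mismatches, as well as the node names in the example, are hypothetical.

```python
from typing import Dict, List


def parse_ids(text: str) -> Dict[str, int]:
    """Map each name to its numeric ID (the third colon-separated field).

    This layout is shared by /etc/passwd (name:password:UID:...) and
    /etc/group (name:password:GID:...).
    """
    ids = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) >= 3 and fields[2].isdigit():
            ids[fields[0]] = int(fields[2])
    return ids


def find_mismatches(files_by_node: Dict[str, str]) -> List[str]:
    """Report names that are missing on some node or have differing IDs."""
    parsed = {node: parse_ids(text) for node, text in files_by_node.items()}
    all_names = sorted(set().union(*(ids.keys() for ids in parsed.values())))
    problems = []
    for name in all_names:
        # A name absent from a node shows up as None for that node.
        seen = {node: ids.get(name) for node, ids in parsed.items()}
        if len(set(seen.values())) > 1:
            details = ", ".join(f"{node}={value}" for node, value in sorted(seen.items()))
            problems.append(f"{name}: {details}")
    return problems


if __name__ == "__main__":
    # Hypothetical passwd contents from two nodes; the UID for "appuser" differs.
    node1 = "root:x:0:0:root:/root:/bin/bash\nappuser:x:501:501::/home/appuser:/bin/bash\n"
    node2 = "root:x:0:0:root:/root:/bin/bash\nappuser:x:502:501::/home/appuser:/bin/bash\n"
    for problem in find_mismatches({"node1": node1, "node2": node2}):
        print(problem)
```

Running the same comparison once on the passwd files and once on the group files covers user names, UIDs, and GIDs. It does not address system time or the contents of shared directories such as /usr or /opt.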