Managing HP Serviceguard for Linux, Eighth Edition, March 2008

Chapter 3: Understanding Serviceguard Software Components
How the Cluster Manager Works
The cluster manager initializes a cluster, monitors the health of the
cluster, recognizes node failure if it occurs, and regulates the
re-formation of the cluster when a node joins or leaves. The cluster
manager runs as a daemon process on each node. During cluster startup
and re-formation, one node is selected to act as the cluster
coordinator. Although all nodes perform some cluster management
functions, the cluster coordinator is the central point for inter-node
communication.
Configuration of the Cluster
The system administrator sets up the cluster configuration parameters
and performs an initial cluster startup; thereafter, in normal
operation, the cluster regulates itself without manual intervention.
Configuration parameters for the cluster include the cluster name and
nodes, networking parameters for the cluster heartbeat, cluster lock
information, and timing parameters (discussed in detail in the
“Planning” chapter). Cluster parameters are entered by editing the
cluster ASCII configuration file (details are given in Chapter 5). The
parameters you enter are used to build a binary configuration file,
which is propagated to all nodes in the cluster. This binary cluster
configuration file must be identical on all nodes in the cluster.
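The flow described above (edit an ASCII file, build a binary file from it, require every node to hold an identical copy) can be illustrated with a small Python sketch. This is not Serviceguard's actual file format or tooling: the flat "KEYWORD value" parsing, the JSON-based "binary" encoding, and the function names parse_ascii_config, build_binary_config, config_digest, and configs_consistent are all hypothetical. CLUSTER_NAME and NODE_NAME are used only as illustrative keywords; the real file also nests per-node network entries, which this simplified parser does not model. Distribution to the nodes is done by Serviceguard's own commands (Chapter 5) and is not shown.

```python
from __future__ import annotations

import hashlib
import json


def parse_ascii_config(text: str) -> dict[str, list[str]]:
    """Parse 'KEYWORD value' lines, skipping blank lines and '#' comments.

    Simplified and hypothetical: repeated keywords (e.g. one NODE_NAME
    per node) are collected in file order; nesting is ignored.
    """
    params: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        params.setdefault(key, []).append(value)
    return params


def build_binary_config(params: dict[str, list[str]]) -> bytes:
    # Hypothetical "binary" encoding: canonical JSON (sorted keys, no
    # extra whitespace) as UTF-8, so equal parameters give equal bytes.
    return json.dumps(params, sort_keys=True, separators=(",", ":")).encode("utf-8")


def config_digest(blob: bytes) -> str:
    # A fingerprint of the binary file, cheap to compare across nodes.
    return hashlib.sha256(blob).hexdigest()


def configs_consistent(node_blobs: dict[str, bytes]) -> bool:
    # True when every node holds a byte-identical binary configuration
    # (an empty mapping is trivially consistent).
    return len({config_digest(blob) for blob in node_blobs.values()}) <= 1


if __name__ == "__main__":
    sample = """# illustrative cluster configuration
CLUSTER_NAME demo_cluster
NODE_NAME node1
NODE_NAME node2
"""
    blob = build_binary_config(parse_ascii_config(sample))
    print(configs_consistent({"node1": blob, "node2": blob}))
```

Comparing digests rather than whole files is a design choice for the sketch: each node can report a short fingerprint, and any mismatch shows that some node is running with a different configuration.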
Heartbeat Messages
Central to the operation of the cluster manager is the sending and
receiving of heartbeat messages among the nodes in the cluster. Each
node in the cluster exchanges heartbeat messages with the cluster
coordinator over each TCP/IP network configured as a heartbeat device.
If a cluster node does not receive heartbeat messages from all other
cluster nodes within the prescribed time, a cluster re-formation is
initiated. At the end of the re-formation, if a new set of nodes forms a
cluster, that information is passed to the package coordinator
(described further below, under “How the Package Manager Works”).
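The timeout rule in the preceding paragraph can be sketched from the point of view of a single node. This is a minimal Python model, not Serviceguard's implementation: the class HeartbeatMonitor, its methods, the timeout value, and the node names are hypothetical. It assumes that a heartbeat arriving on any one configured heartbeat network counts as hearing from that peer, and it passes time in explicitly so the behavior is deterministic. The real timing values are set through the cluster timing parameters described in the “Planning” chapter.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class HeartbeatMonitor:
    """Tracks when each peer node was last heard from (hypothetical model).

    `timeout` stands in for the prescribed time; times are plain floats
    supplied by the caller rather than read from a clock.
    """

    peers: set[str]
    timeout: float
    last_seen: dict[str, float] = field(default_factory=dict)

    def start(self, now: float) -> None:
        # Treat every peer as freshly heard from when monitoring begins.
        for peer in self.peers:
            self.last_seen[peer] = now

    def receive(self, peer: str, now: float) -> None:
        # Record a heartbeat from `peer`; by assumption, any heartbeat
        # network will do. Heartbeats from unknown nodes are ignored.
        if peer in self.peers:
            self.last_seen[peer] = now

    def silent_peers(self, now: float) -> set[str]:
        # Peers whose last heartbeat is older than the timeout window.
        return {
            p for p in self.peers
            if now - self.last_seen.get(p, float("-inf")) > self.timeout
        }

    def needs_reformation(self, now: float) -> bool:
        # Missing heartbeats from any peer triggers cluster re-formation.
        return bool(self.silent_peers(now))


if __name__ == "__main__":
    monitor = HeartbeatMonitor(peers={"node2", "node3"}, timeout=2.0)
    monitor.start(now=0.0)
    monitor.receive("node2", now=1.5)
    monitor.receive("node3", now=1.5)
    monitor.receive("node2", now=3.0)
    print(monitor.silent_peers(now=4.0))
```

In this usage, node3 was last heard from at 1.5, so at time 4.0 it has been silent for 2.5, longer than the 2.0 timeout, and the monitor reports that a re-formation is needed. Deciding which nodes form the new cluster is outside the scope of the sketch.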
Failover packages running on nodes that are no longer in the new
cluster are transferred to their adoptive nodes. Note that if there is
a transitory loss of heartbeat, the cluster may re-form with the same