Managing HP Serviceguard for Linux Ninth Edition, April 2009

Heartbeat Messages

Central to the operation of the cluster manager is the sending and receiving of heartbeat messages among the nodes in the cluster. Each node in the cluster exchanges UDP heartbeat messages with every other node over each IP network configured as a heartbeat device.

If a cluster node does not receive heartbeat messages from all other cluster nodes within the prescribed time, a cluster re-formation is initiated; see “What Happens when a Node Times Out” (page 88). At the end of the re-formation, if a new set of nodes forms a cluster, that information is passed to the package coordinator (described later in this chapter, under “How the Package Manager Works” (page 49)). Failover packages that were running on nodes that are no longer in the new cluster are transferred to their adoptive nodes.
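
The timeout rule described above can be pictured with a short Python sketch. This is an illustration only, not Serviceguard's implementation: the function name timed_out_peers, the node names, the timestamps, and the timeout value (expressed here in seconds for readability) are all assumptions made for the example.

```python
def timed_out_peers(last_heard, now, member_timeout):
    """Return the peers whose most recent heartbeat is older than member_timeout.

    last_heard maps a peer node name to the time (in seconds) at which a
    heartbeat from that peer was last received. All names and values are
    hypothetical.
    """
    return sorted(
        peer
        for peer, heard_at in last_heard.items()
        if now - heard_at > member_timeout
    )


# Hypothetical view from node1: node2 was heard 1 second ago,
# node3 has been silent for 20 seconds.
last_heard = {"node2": 99.0, "node3": 80.0}
silent = timed_out_peers(last_heard, now=100.0, member_timeout=14.0)

# In this sketch, any silent peer means a re-formation would be initiated.
reformation_needed = bool(silent)
```

In this sketch a single silent peer is enough to set reformation_needed, which mirrors the rule that heartbeats must arrive from all other cluster nodes within the prescribed time.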

If heartbeat and data are sent over the same LAN subnet, data congestion may cause Serviceguard to miss heartbeats and initiate a cluster re-formation that would not otherwise have been needed. For this reason, HP recommends that you dedicate a LAN to the heartbeat and also configure heartbeat over the data network.

Each node sends its heartbeat messages at a rate that Serviceguard calculates from the value of the MEMBER_TIMEOUT parameter. You set this parameter in the cluster configuration file, which you create as part of cluster configuration.

IMPORTANT: When multiple heartbeats are configured, heartbeats are sent in parallel; Serviceguard must receive at least one heartbeat to establish the health of a node. HP recommends that you configure all subnets that interconnect cluster nodes as heartbeat networks; this increases protection against multiple faults at no additional cost.

Heartbeat IP addresses must be on the same subnet on each node, but it is possible to configure a cluster that spans subnets; see “Cross-Subnet Configurations” (page 32).
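
The rule that at least one heartbeat must be received when several heartbeat networks are configured can be sketched as follows. This is a simplified model under stated assumptions, not the cluster manager's actual logic: the function name node_is_healthy, the subnet addresses, and the timing values are hypothetical.

```python
def node_is_healthy(last_heard_by_network, now, member_timeout):
    """Report whether a peer counts as healthy in this simplified model.

    last_heard_by_network maps a heartbeat subnet to the time (in seconds)
    at which a heartbeat from the peer last arrived over that subnet.
    One recent heartbeat on any subnet is enough.
    """
    return any(
        now - heard_at <= member_timeout
        for heard_at in last_heard_by_network.values()
    )


# Hypothetical peer with two heartbeat subnets: the first has gone quiet,
# but the second still delivers heartbeats.
peer = {"192.168.1.0": 50.0, "10.0.0.0": 99.5}
healthy = node_is_healthy(peer, now=100.0, member_timeout=14.0)
```

Because the check passes as long as any one subnet is still delivering heartbeats, each additional heartbeat network in this model is one more fault the cluster can tolerate before a node is considered lost.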

See HEARTBEAT_IP, under “Cluster Configuration Parameters” (page 100), for more information about heartbeat requirements. For timeout requirements and recommendations, see the MEMBER_TIMEOUT parameter description in the same section. For troubleshooting information, see “Cluster Re-formations Caused by MEMBER_TIMEOUT Being Set too Low” (page 282). See also “Cluster Daemon: cmcld” (page 39).

Manual Startup of Entire Cluster

A manual startup forms a cluster out of all the nodes in the cluster configuration. Manual startup is normally done the first time you bring up the cluster, after cluster-wide maintenance or upgrade, or after reconfiguration.

Before startup, the same binary cluster configuration file must exist on all nodes in the cluster. The system administrator starts the cluster with the cmruncl command issued from one node. The cmruncl command can only be used when the cluster is not running, that is, when none of the nodes is running the cmcld daemon.
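
The two preconditions for a manual startup can be summarized in a small Python sketch. It does not call any Serviceguard command. The function name ready_for_manual_startup, the dictionary keys, and the checksum strings are hypothetical stand-ins for however you would gather this information from the nodes.

```python
def ready_for_manual_startup(nodes):
    """Check the two startup preconditions for a hypothetical node list.

    Each entry describes one node: a checksum of its binary cluster
    configuration file and whether the cmcld daemon is running there.
    """
    # Every node must hold the same binary cluster configuration file.
    same_config = len({n["config_checksum"] for n in nodes}) == 1
    # The cluster must be down: no node may be running cmcld.
    cluster_down = not any(n["cmcld_running"] for n in nodes)
    return same_config and cluster_down


# Hypothetical two-node cluster that is fully halted.
nodes = [
    {"name": "node1", "config_checksum": "abc123", "cmcld_running": False},
    {"name": "node2", "config_checksum": "abc123", "cmcld_running": False},
]
ready = ready_for_manual_startup(nodes)
```

If either check fails in this sketch, the conditions described above for issuing cmruncl are not met.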
