•  A node halts because of a package failure.
•  A node halts because of a service failure.
•  Heavy network traffic prevents the heartbeat signal from being received by the cluster.
•  The heartbeat network fails, and no other network is configured to carry the heartbeat.
Typically, re-formation results in a cluster with a different composition. The new cluster may contain
fewer or more nodes than the previous incarnation of the cluster.
3.2.6 Cluster Quorum to Prevent Split-Brain Syndrome
In general, the algorithm for cluster re-formation requires a cluster quorum of a strict majority (that
is, more than 50%) of the nodes previously running. If both halves (exactly 50%) of a previously
running cluster were allowed to re-form, there would be a split-brain situation in which two instances
of the same cluster were running. In a split-brain scenario, different incarnations of an application
could end up simultaneously accessing the same disks. One incarnation might well be initiating
recovery activity while the other is modifying the state of the disks. Serviceguard’s quorum
requirement is designed to prevent a split-brain situation.
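For example, if a five-node cluster splits into groups of three and two nodes, only the three-node
group (60% of the previous membership) satisfies the quorum requirement and can re-form; the
two-node group cannot.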
3.2.7 Cluster Lock
Although a cluster quorum of more than 50% is generally required, exactly 50% of the previously
running nodes may re-form as a new cluster provided that the other 50% of the previously running
nodes do not also re-form. This is guaranteed by the use of a tie-breaker to choose between the
two equal-sized node groups, allowing one group to form the cluster and forcing the other group
to shut down. This tie-breaker is known as a cluster lock. The cluster lock is implemented either by
means of a lock LUN or a quorum server. A cluster lock is required on two-node clusters.
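When a quorum server is used as the tie-breaker, it is identified in the cluster configuration file
by the QS_HOST parameter. The following excerpt is a minimal sketch; the host name is illustrative,
and the timing values (in microseconds) are examples rather than recommendations:

    # Quorum server acting as the cluster tie-breaker
    QS_HOST                 qs-host.example.com
    QS_POLLING_INTERVAL     300000000
    QS_TIMEOUT_EXTENSION    2000000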
The cluster lock is used as a tie-breaker only for situations in which a running cluster fails and, as
Serviceguard attempts to form a new cluster, the cluster is split into two sub-clusters of equal size.
Each sub-cluster will attempt to acquire the cluster lock. The sub-cluster which gets the cluster lock
will form the new cluster, preventing the possibility of two sub-clusters running at the same time. If
the two sub-clusters are of unequal size, the sub-cluster with greater than 50% of the nodes will
form the new cluster, and the cluster lock is not used.
If you have a two-node cluster, you are required to configure a cluster lock. If communications are
lost between these two nodes, the node that obtains the cluster lock will take over the cluster and
the other node will halt (system reset). Without a cluster lock, a failure of either node in the cluster
will cause the other node, and therefore the cluster, to halt. Note also that if the cluster lock fails
during an attempt to acquire it, the cluster will halt.
3.2.8 Use of a Lock LUN as the Cluster Lock
A lock LUN can be used in clusters of up to and including four nodes. The cluster lock LUN is a
special area of storage (a disk partition) that is shareable by all nodes in the cluster. When a
node obtains the cluster lock, this partition is marked so that other nodes will recognize the lock
as “taken.”
NOTE: The lock LUN is dedicated to use as the cluster lock. In addition, HP recommends that this
LUN comprise the entire disk; that is, the partition should take up the entire disk.
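Such a whole-disk partition can be created with parted, for example. This is a minimal sketch; the
device name /dev/sdd is hypothetical and must be replaced with the disk designated as the lock
LUN on your systems:

    # Create a partition table and a single partition spanning the
    # entire (hypothetical) disk that will serve as the lock LUN.
    parted -s /dev/sdd mklabel msdos
    parted -s /dev/sdd mkpart primary 0% 100%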
The complete path name of the lock LUN is identified in the cluster configuration file.
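The lock LUN path is specified for each node with the CLUSTER_LOCK_LUN parameter. The
following excerpt is a minimal sketch; the node and device names are illustrative:

    NODE_NAME node1
      CLUSTER_LOCK_LUN /dev/sdd1    # path to the lock partition as seen from node1
    NODE_NAME node2
      CLUSTER_LOCK_LUN /dev/sdd1    # the same partition as seen from node2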
The operation of the lock LUN is shown in Figure 7.