
Typically, re-formation results in a cluster with a different composition. The new cluster
may contain more or fewer nodes than the previous incarnation of the cluster.
Cluster Quorum to Prevent Split-Brain Syndrome
In general, the algorithm for cluster re-formation requires a cluster quorum of a strict
majority (that is, more than 50%) of the nodes previously running. If both halves (exactly
50%) of a previously running cluster were allowed to re-form, there would be a split-brain
situation in which two instances of the same cluster were running. In a split-brain scenario,
different incarnations of an application could end up simultaneously accessing the same
disks. One incarnation might well be initiating recovery activity while the other is
modifying the state of the disks. Serviceguard’s quorum requirement is designed to prevent
a split-brain situation.
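As a minimal sketch of the strict-majority arithmetic (illustrative Python, not
Serviceguard code), a group of surviving nodes has quorum only if it contains strictly
more than half of the nodes that were running in the previous incarnation of the cluster:

    def has_quorum(surviving_nodes: int, previous_nodes: int) -> bool:
        """Strict majority: strictly more than 50% of the previously
        running nodes must be present for the group to re-form."""
        return surviving_nodes * 2 > previous_nodes

    # A four-node cluster that splits 2/2: neither half has quorum on
    # its own, so a tie-breaker (the cluster lock, below) is needed.
    assert not has_quorum(2, 4)
    # A five-node cluster that splits 3/2: the three-node group re-forms.
    assert has_quorum(3, 5)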
Cluster Lock
Although a cluster quorum of more than 50% is generally required, exactly 50% of the
previously running nodes may re-form as a new cluster provided that the other 50% of
the previously running nodes do not also re-form. This is guaranteed by the use of a
tie-breaker to choose between the two equal-sized node groups, allowing one group to
form the cluster and forcing the other group to shut down. This tie-breaker is known as
a cluster lock. The cluster lock is implemented by means of either a lock LUN or a quorum
server. A cluster lock is required on two-node clusters.
The cluster lock is used as a tie-breaker only for situations in which a running cluster fails
and, as Serviceguard attempts to form a new cluster, the cluster is split into two sub-clusters
of equal size. Each sub-cluster will attempt to acquire the cluster lock. The sub-cluster
which gets the cluster lock will form the new cluster, preventing the possibility of two
sub-clusters running at the same time. If the two sub-clusters are of unequal size, the
sub-cluster with greater than 50% of the nodes will form the new cluster, and the cluster
lock is not used.
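The decision logic of the preceding two paragraphs can be summarized in a short sketch
(illustrative Python, not Serviceguard code; acquire_cluster_lock is a hypothetical
callable standing in for the lock LUN or quorum server tie-breaker):

    def reformation_outcome(subcluster_nodes: int, previous_nodes: int,
                            acquire_cluster_lock) -> str:
        """Decide whether this sub-cluster may form the new cluster."""
        if subcluster_nodes * 2 > previous_nodes:
            # Strict majority: form the cluster; the lock is not used.
            return "form cluster"
        if subcluster_nodes * 2 == previous_nodes:
            # Exactly half: contend for the cluster lock. Only one of
            # the two equal-sized sub-clusters can acquire it; the
            # loser shuts down.
            return "form cluster" if acquire_cluster_lock() else "halt"
        # A minority sub-cluster can never re-form.
        return "halt"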
If you have a two-node cluster, you are required to configure a cluster lock. If
communications are lost between these two nodes, the node that obtains the cluster lock
will take over the cluster and the other node will halt (system reset). Without a cluster
lock, a failure of either node in the cluster will cause the other node, and therefore the
cluster, to halt. Note also that if the cluster lock fails during an attempt to acquire it, the
cluster will halt.
Use of a Lock LUN as the Cluster Lock
A lock LUN can be used for clusters of up to and including four nodes. The cluster
lock LUN is a special piece of storage (a partition) that is shareable by all
nodes in the cluster. When a node obtains the cluster lock, this partition is marked so
that other nodes will recognize the lock as "taken."
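Conceptually, taking the lock is a test-and-set on a marker stored in the shared
partition. The following sketch is purely illustrative: the device path, the 16-byte
marker layout, and the node ID format are assumptions, and a plain read-then-write such
as this is not atomic across nodes, so a real cluster lock must rely on an atomic
storage primitive instead:

    import os

    LOCK_FREE = b"\x00" * 16   # hypothetical marker meaning "not taken"

    def try_take_lock_lun(device: str, node_id: bytes) -> bool:
        """Illustrative (non-atomic) test-and-set on a shared partition,
        e.g. device = "/dev/sdc1". Returns True if this node now holds
        the lock."""
        marker = node_id.ljust(16, b"\x00")
        fd = os.open(device, os.O_RDWR)
        try:
            current = os.pread(fd, 16, 0)
            if current not in (LOCK_FREE, marker):
                return False          # another node has marked the lock "taken"
            os.pwrite(fd, marker, 0)  # mark the lock as ours
            os.fsync(fd)
            return True
        finally:
            os.close(fd)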