After a failure that results in loss of communication between the nodes, active cluster nodes execute a
cluster re-formation algorithm that is used to determine the new cluster quorum. This new quorum, in
conjunction with the previous quorum, is used to determine which nodes remain in the new active
cluster.
The algorithm for cluster re-formation generally requires a cluster quorum of a strict majority: more
than 50% of the nodes that were previously running. However, exactly 50% of the previously running
nodes are allowed to re-form as a new cluster, provided there is a guarantee that the other 50% of
the previously running nodes do not also re-form. In these cases, some form of quorum arbitration or
tie-breaker is needed. For example, if there is a communication failure between the nodes in a two-
node cluster and each node is attempting to re-form the cluster, Serviceguard must only allow one
node to form the new cluster. This is accomplished by configuring a cluster lock or quorum service.
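To make the rule concrete, the short Python sketch below models the quorum decision just described. It is an illustration only, not the Serviceguard implementation; the function name and structure are invented for this example.

# Illustrative model of the cluster re-formation quorum rule (not the
# actual Serviceguard code; names are invented for this example).
def reformation_outcome(previous_members, surviving_members, holds_tie_breaker=False):
    share = surviving_members / previous_members
    if share > 0.5:                  # strict majority: re-form without arbitration
        return "re-form cluster"
    if share == 0.5:                 # exact tie: only the tie-breaker winner continues
        return "re-form cluster" if holds_tie_breaker else "halt"
    return "halt"                    # less than half: insufficient quorum

print(reformation_outcome(4, 3))                            # re-form cluster
print(reformation_outcome(4, 2, holds_tie_breaker=True))    # re-form cluster
print(reformation_outcome(4, 2, holds_tie_breaker=False))   # halt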
The important concept to note here is that if more than 50% of the nodes in the cluster fail at the same
time, the remaining nodes have insufficient quorum to form a new cluster and fail themselves. This is
irrespective of whether or not a cluster lock has been configured. It is for this reason that cluster
configuration must be carefully analyzed to prevent failure modes that are common among the cluster
nodes. One example of this concern is the power circuit considerations that are documented in HP
9000 Enterprise Servers Configuration Guide, Chapter 6 and in the Serviceguard for Linux Order
and Configuration Guide (for details contact your HP Sales Representative). Another area where it is
possible to have a greater than 50% node failure is in the use of partitioned systems within the cluster.
Configuration considerations for preventing this situation are described in the section “Partition
Interactions.”
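As a hypothetical worked example of this point, suppose three of the four nodes in a cluster lose power at the same moment; the numbers below are illustrative only.

# Hypothetical worst case: 3 of 4 nodes fail simultaneously (for example,
# because they share a power circuit).
previous_members, surviving_members = 4, 1
share = surviving_members / previous_members       # 0.25
# Less than 50% of the prior membership remains, so the surviving node
# halts even if a cluster lock is configured; the lock only breaks exact
# 50/50 ties, it cannot create quorum that does not exist.
print("halt" if share < 0.5 else "re-form cluster")    # halt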
Quorum arbitration
Should two equal-sized groups of nodes (exactly 50% of the cluster in each group) become separated
from each other, quorum arbitration allows one group to achieve quorum and form the cluster, while
the other group is denied quorum and cannot start a cluster. This prevents the possibility of split-brain
activity (two sub-clusters running at the same time). If the two sub-clusters are of unequal size, the sub-
cluster with greater than 50% of the previous quorum forms the new cluster and the cluster lock is not
used.
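The following sketch contrasts the two cases for a hypothetical four-node cluster; as above, it is illustrative pseudocode rather than behavior traced from the product.

previous = 4

# Equal 2/2 split: neither half holds a strict majority, so the cluster
# lock (or quorum service) must grant exactly one group the right to re-form.
for group_size in (2, 2):
    print(group_size, "of", previous, "needs tie-breaker:", group_size / previous == 0.5)

# Unequal 3/1 split: the larger group has more than 50% and re-forms
# without consulting the lock; the single remaining node halts.
for group_size in (3, 1):
    print(group_size, "of", previous, "re-forms on its own:", group_size / previous > 0.5)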
For obvious reasons, two-node clusters are required to configure some type of quorum arbitration: by
definition, failure of a node or loss of communication in a two-node cluster results in a 50%
partition. Because nodes are assumed to fail independently of each other (the independent failure
assumption), quorum arbitration is optional for cluster configurations with three or more nodes,
though it is highly recommended.
There are several techniques for providing quorum arbitration in Serviceguard clusters; a simplified
sketch of the tie-break mechanism follows this list:
• On HP-UX 11i v2 and HP-UX 11i v3, through a cluster lock disk, which must be accessed during the
  arbitration process. The cluster lock disk is a disk area located in a volume group that is shared by
  all nodes in the cluster. Each sub-cluster attempts to acquire the cluster lock. The sub-cluster that
  gets the cluster lock forms the new cluster, and the nodes that were unable to get the lock cease
  activity. A cluster lock disk can be used in Serviceguard clusters of up to four nodes.
• On Linux, HP-UX 11i v2, and HP-UX 11i v3, through a Lock LUN, which must be accessed during the
  arbitration process. The Lock LUN is a logical unit, usually a “disk” defined in a disk array that is
  shared by all nodes in the cluster. Each sub-cluster attempts to acquire the Lock LUN. The sub-cluster
  that gets the Lock LUN forms the new cluster, and the nodes that were unable to get the lock cease
  activity. A Lock LUN can be used in Linux Serviceguard clusters of up to four nodes.
• Through an arbitrator node, which provides tie-breaking when an entire site fails, as in a disaster
  scenario. An arbitrator node is a cluster member typically located in a separate data center. Its
  main function is to increase the Serviceguard cluster size so that an equal partition of nodes
  between production data centers is unlikely. This can be used in Serviceguard clusters running
  HP-UX or Linux.
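The exclusivity that all three techniques rely on can be pictured with the toy model below. A real cluster lock is a reserved disk area, a LUN, or an arbitrator node rather than an in-process lock; the Python lock here merely stands in for that single shared tie-breaker.

import threading

# Toy model of the tie-break: whichever sub-cluster reaches the single
# shared lock first re-forms the cluster; the other sub-cluster halts.
cluster_lock = threading.Lock()

def try_reform(sub_cluster):
    if cluster_lock.acquire(blocking=False):
        print(sub_cluster, "acquired the cluster lock and re-forms the cluster")
    else:
        print(sub_cluster, "could not get the lock; its nodes halt")

try_reform("sub-cluster A")     # wins the lock
try_reform("sub-cluster B")     # loses the lock

# Arbitrator-node arithmetic: with two 2-node production sites plus a
# third-site arbitrator (5 nodes total), losing an entire site leaves
# 3 of 5 nodes, which is already a strict majority, so no 50/50 tie occurs.
print(3 / 5 > 0.5)              # True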