Understanding and Designing Serviceguard Disaster Recovery Architectures

Figure 9 Two Data Centers and Third Location with Arbitrators
[Diagram labels: Data Center A, Data Center B, Third Location (Arbitrator 1, Arbitrator 2); Node 1 to Node 4; pkg A to pkg D; Highly Available Network links; Robust Data Replication]
Terms and Concepts
Arbitration
When a cluster is part of a disaster recovery solution with nodes located in more than one data
center, communication between the data centers can easily be lost unless redundant networking is
implemented with different routing for the redundant links. A network split can trigger a cluster
re-formation in which there are two sets of running nodes, each attempting to form a cluster. If
both sets are allowed to re-form the cluster, two instances of the same cluster run in two
locations. The same application might then start in two different places and make the data
inconsistent, which is a form of data corruption. The mechanism that prevents multiple clusters
from forming after a network split is called arbitration.
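The effect of arbitration can be illustrated with a minimal sketch. It is not how Serviceguard
implements arbitration; the Arbiter class, the surviving_partitions function, and the node names
are hypothetical, and the "first claimant wins" rule is an assumption made only for illustration.

```python
class Arbiter:
    """Hypothetical tie-breaker: lets exactly one partition form the cluster."""

    def __init__(self):
        self.winner = None

    def claim(self, partition):
        # Assumption: the first partition to ask wins; later claims are refused.
        if self.winner is None:
            self.winner = frozenset(partition)
        return self.winner == frozenset(partition)


def surviving_partitions(partitions, arbiter=None):
    """Return the partitions that would go on to run the cluster."""
    if arbiter is None:
        # No arbitration: every partition re-forms the cluster on its own.
        return [set(p) for p in partitions]
    return [set(p) for p in partitions if arbiter.claim(p)]


# A four-node cluster split evenly between two data centers.
split = [{"node1", "node2"}, {"node3", "node4"}]

print(len(surviving_partitions(split)))             # 2: two instances of the same cluster
print(len(surviving_partitions(split, Arbiter())))  # 1: only one partition re-forms
```

Without an arbiter, both halves of the split continue as separate clusters; with one, the second
claim is refused and only a single instance of the cluster remains.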
Cluster Quorum
When a cluster initially forms, all nodes must be available to form the cluster (a 100% quorum
requirement). After that, quorum is dynamic and is recomputed after each node failure. For
instance, if you start with an eight-node cluster and two nodes fail, six nodes remain, which is a
75% quorum (6 of 8). The cluster size is then reset to six nodes. If two more nodes fail, four
nodes remain, which is a 67% quorum (4 of 6). Each time a cluster re-forms, it must have more than
50% quorum. When cluster quorum is exactly 50%, an arbitration mechanism is required to act as a
tie-breaker.
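The quorum arithmetic above can be checked with a short sketch. The function names
quorum_fraction and can_reform, and the wins_arbitration flag, are hypothetical and chosen for
illustration; this models only the percentage rule described here, not Serviceguard's actual
cluster re-formation protocol.

```python
from fractions import Fraction


def quorum_fraction(remaining, previous_size):
    """Fraction of the previous cluster membership that is still running."""
    return Fraction(remaining, previous_size)


def can_reform(remaining, previous_size, wins_arbitration=False):
    """Apply the rule: more than 50% re-forms; exactly 50% needs a tie-breaker."""
    quorum = quorum_fraction(remaining, previous_size)
    if quorum > Fraction(1, 2):
        return True
    if quorum == Fraction(1, 2):
        # Exactly half: the outcome depends on the arbitration mechanism.
        return wins_arbitration
    return False


# Eight-node cluster loses two nodes: 6 of 8 remain.
print(float(quorum_fraction(6, 8)))            # 0.75
# Cluster size is reset to six; two more nodes fail: 4 of 6 remain.
print(round(float(quorum_fraction(4, 6)), 2))  # 0.67
# An even split of a four-node cluster needs arbitration.
print(can_reform(2, 4))                         # False
print(can_reform(2, 4, wins_arbitration=True))  # True
```

Exact fractions are used instead of floating-point percentages so that the comparison against
exactly 50% is not affected by rounding.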