Understanding and Designing Serviceguard Disaster Recovery Architectures

Arbitrator Nodes
Arbitrator nodes are one of the arbitration mechanisms available in Serviceguard. A network split
in a four-node cluster can result in two equal-sized partitions, but in a five-node cluster it cannot.
The fifth node acts as the arbitrator simply because it makes the number of nodes in the cluster
odd. This kind of arbitration is especially useful when the cluster nodes are separated by significant
distances, as in extended distance clusters or metropolitan clusters.
Arbitrator nodes may be configured to run non-clustered applications, or they can be set up purely
as arbitrators, running nothing other than Serviceguard.
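The arithmetic behind this is simple majority counting. The following sketch (illustrative Python
only, not part of Serviceguard; the function name is hypothetical) shows why a four-node cluster
can split into two partitions that both lack a majority, while adding a fifth, arbitrator node leaves
exactly one side of any split with a strict majority:

    def has_quorum(partition_size: int, cluster_size: int) -> bool:
        """A partition may re-form the cluster only if it holds a strict majority of nodes."""
        return partition_size > cluster_size / 2

    # Four-node cluster: a 2/2 split leaves neither partition with a majority (a tie).
    print([has_quorum(n, 4) for n in (1, 2, 3)])  # [False, False, True]

    # Five-node cluster (four nodes plus an arbitrator): every split has one majority side.
    print([has_quorum(n, 5) for n in (2, 3)])     # [False, True]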
Quorum Server
A quorum server, located in a third data center, is another arbitration mechanism. The quorum
server process runs on a machine outside the cluster and listens for connection requests from the
Serviceguard nodes on a known port. The server maintains a special area in memory for each
cluster; when a node obtains the cluster lock, this area is marked so that other nodes recognize
the lock as “taken.” When a network split divides the cluster nodes into two equal sets, the quorum
server allows the set containing the node that has acquired the lock to form the cluster, and the
nodes in the other set halt. In this way, the quorum server arbitrates cluster re-formation.
An advantage of the quorum server is that additional cluster nodes need not be configured for
arbitration. A single quorum server can be used for multiple Serviceguard clusters.
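As a rough illustration of the behavior described above (a toy model only, not the actual quorum
server protocol or API; all names are hypothetical), the per-cluster lock can be thought of as
granted to whichever equal-sized partition reaches the server first; the losing partition's nodes
halt, and each cluster served by the same quorum server gets its own independent lock:

    class ToyQuorumServer:
        """Toy tie-breaker: the first partition to request a cluster's lock gets it."""

        def __init__(self) -> None:
            self.locks: dict[str, str] = {}  # cluster name -> partition holding the lock

        def request_lock(self, cluster: str, partition: str) -> bool:
            if cluster not in self.locks:
                self.locks[cluster] = partition  # mark the lock as "taken" for this cluster
            return self.locks[cluster] == partition

    qs = ToyQuorumServer()                      # one quorum server serving several clusters
    print(qs.request_lock("clusterA", "dc1"))   # True  -> dc1's partition re-forms clusterA
    print(qs.request_lock("clusterA", "dc2"))   # False -> dc2's nodes halt
    print(qs.request_lock("clusterB", "dc2"))   # True  -> clusterB has its own independent lock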
General Requirements
NOTE: The maximum supported distance between Metrocluster nodes is the greatest distance over
which both the replication technology's latency requirements and the Serviceguard network latency
requirements can still be met.
•   In the disaster recovery architecture, each data center must be self-contained, so that the
    loss of one data center does not cause the entire cluster to fail. It is important to eliminate
    all single points of failure (SPOF) so that the surviving systems can continue to run if one
    or more systems fail.
•   The networks between the data centers must be redundant and routed in such a way that
    the loss of any one data center does not cause the failure of the network between the
    surviving data centers.
•   Exclusive volume group activation must be used for all volume groups (VGs) associated
    with packages that use disks managed by LVM in a Metrocluster environment. By design,
    Metrocluster allows only one system in the cluster to have a VG activated at any time.
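Exclusive activation can be pictured as a per-volume-group lock held by at most one node at a
time. The sketch below is a conceptual model only (hypothetical class and method names, not LVM
or Metrocluster code); in a real HP-UX environment the exclusive activation itself is performed by
LVM, typically through the vgchange command under package control:

    class VolumeGroup:
        """Conceptual model: a VG may be activated on at most one node at a time."""

        def __init__(self, name: str) -> None:
            self.name = name
            self.active_on: str | None = None    # node currently holding exclusive activation

        def activate_exclusive(self, node: str) -> bool:
            if self.active_on is None:
                self.active_on = node            # activation succeeds; this node owns the VG
                return True
            return self.active_on == node        # refused while another node holds it

        def deactivate(self, node: str) -> None:
            if self.active_on == node:
                self.active_on = None            # released, e.g. when the package halts

    vg = VolumeGroup("vg_pkg1")
    print(vg.activate_exclusive("node_dc1"))     # True  -> package runs in data center 1
    print(vg.activate_exclusive("node_dc2"))     # False -> refused; dc1 still holds the VG
    vg.deactivate("node_dc1")                    # failover: the VG is released in dc1
    print(vg.activate_exclusive("node_dc2"))     # True  -> package can now start in dc2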
Types of Configuration
HP Metrocluster supports the following configurations:
•   Two data centers and a third location with one or two arbitrator systems
•   Two data centers and a third location with a quorum server system
Two Data Centers and Third Location with Arbitrators or a Quorum Server System
This is the recommended and supported disaster recovery architecture for use with Metrocluster.
It consists of two main data centers with an equal number of nodes, and a third location with one
or more arbitrator nodes or a quorum server node.