Understanding and Designing Serviceguard Disaster Recovery Architectures

Arbitrator Nodes
Arbitrator nodes are one of the arbitration mechanisms available in Serviceguard. A network split
in a four-node cluster can result in two equal-sized partitions, but in a five-node cluster it cannot.
The fifth node acts as the arbitrator simply because it makes the number of nodes in the cluster
odd. This kind of arbitration is especially useful when the cluster nodes are separated by significant
distances, as in extended distance clusters or metropolitan clusters.
Arbitrator nodes may be configured to run non-clustered applications, or they can be set up purely
as arbitrators, running nothing other than Serviceguard.
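The arithmetic behind this is simple majority counting. The following sketch (illustrative Python
only, not part of Serviceguard; the function name is hypothetical) shows why a four-node cluster
can split into two partitions that both lack a majority, while adding a fifth, arbitrator node leaves
exactly one side of any split with a strict majority:

    def has_quorum(partition_size: int, cluster_size: int) -> bool:
        """A partition may re-form the cluster only if it holds a strict majority of nodes."""
        return partition_size > cluster_size / 2

    # Four-node cluster: a 2/2 split leaves neither partition with a majority (a tie).
    print([has_quorum(n, 4) for n in (1, 2, 3)])  # [False, False, True]

    # Five-node cluster (four nodes plus an arbitrator): every split has one majority side.
    print([has_quorum(n, 5) for n in (2, 3)])     # [False, True]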
Quorum Server
A quorum server, located in a third data center, is another arbitration mechanism. The quorum
server process runs on a machine outside the cluster and listens for connection requests from the
Serviceguard nodes on a known port. The server maintains a special area in memory for each
cluster; when a node obtains the cluster lock, this area is marked so that other nodes recognize
the lock as “taken.” When a network split divides the cluster nodes into two equal sets, the quorum
server allows the set containing the node that has acquired the lock to form the cluster, and the
nodes in the other set halt. In this way, the quorum server arbitrates cluster re-formation.
An advantage of the quorum server is that additional cluster nodes need not be configured for
arbitration. A single quorum server can be used for multiple Serviceguard clusters.
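As a rough illustration of the behavior described above (a toy model only, not the actual quorum
server protocol or API; all names are hypothetical), the per-cluster lock can be thought of as
granted to whichever equal-sized partition reaches the server first; the losing partition's nodes
halt, and each cluster served by the same quorum server gets its own independent lock:

    class ToyQuorumServer:
        """Toy tie-breaker: the first partition to request a cluster's lock gets it."""

        def __init__(self) -> None:
            self.locks: dict[str, str] = {}  # cluster name -> partition holding the lock

        def request_lock(self, cluster: str, partition: str) -> bool:
            if cluster not in self.locks:
                self.locks[cluster] = partition  # mark the lock as "taken" for this cluster
            return self.locks[cluster] == partition

    qs = ToyQuorumServer()                      # one quorum server serving several clusters
    print(qs.request_lock("clusterA", "dc1"))   # True  -> dc1's partition re-forms clusterA
    print(qs.request_lock("clusterA", "dc2"))   # False -> dc2's nodes halt
    print(qs.request_lock("clusterB", "dc2"))   # True  -> clusterB has its own independent lock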
General Requirements
NOTE: The maximum supported distance between Metrocluster nodes is the greatest distance over
which both the replication technology's latency requirements and the Serviceguard network latency
requirements can still be met.
•   In the disaster recovery architecture, each data center must be self-contained, so that the
    loss of one data center does not cause the entire cluster to fail. It is important to eliminate
    all single points of failure (SPOF) so that the surviving systems can continue to run if one
    or more systems fail.
•   The networks between the data centers must be redundant and routed in such a way that
    the loss of any one data center does not cause the failure of the network between the
    surviving data centers.
•   Exclusive volume group activation must be used for all volume groups (VGs) associated
    with packages that use disks managed by LVM in a Metrocluster environment. By design,
    Metrocluster allows only one system in the cluster to have a VG activated at any time.
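Exclusive activation can be pictured as a per-volume-group lock held by at most one node at a
time. The sketch below is a conceptual model only (hypothetical class and method names, not LVM
or Metrocluster code); in a real HP-UX environment the exclusive activation itself is performed by
LVM, typically through the vgchange command under package control:

    class VolumeGroup:
        """Conceptual model: a VG may be activated on at most one node at a time."""

        def __init__(self, name: str) -> None:
            self.name = name
            self.active_on: str | None = None    # node currently holding exclusive activation

        def activate_exclusive(self, node: str) -> bool:
            if self.active_on is None:
                self.active_on = node            # activation succeeds; this node owns the VG
                return True
            return self.active_on == node        # refused while another node holds it

        def deactivate(self, node: str) -> None:
            if self.active_on == node:
                self.active_on = None            # released, e.g. when the package halts

    vg = VolumeGroup("vg_pkg1")
    print(vg.activate_exclusive("node_dc1"))     # True  -> package runs in data center 1
    print(vg.activate_exclusive("node_dc2"))     # False -> refused; dc1 still holds the VG
    vg.deactivate("node_dc1")                    # failover: the VG is released in dc1
    print(vg.activate_exclusive("node_dc2"))     # True  -> package can now start in dc2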
Types of Configuration
HP Metrocluster supports the following configurations:
•   Two data centers and a third location with one or two arbitrator systems
•   Two data centers and a third location with a quorum server system
Two Data Centers and Third Location with Arbitrators or a Quorum Server System
This is the recommended and supported disaster recovery architecture for use with Metrocluster.
It consists of two main data centers with an equal number of nodes, and a third location with one
or more arbitrator nodes or a quorum server node.