HP Serviceguard Extended Distance Cluster for Linux A.01.01 Deployment Guide, Third Edition, May 2008

Disaster Tolerance and Recovery in a Serviceguard Cluster
Understanding Types of Disaster Tolerant Clusters
Disk resynchronization is independent of CPU failure (that is, if the
hosts at the primary site fail but the disk remains up, the disk knows
it does not have to be resynchronized).
Differences Between Extended Distance Cluster and CLX
The major differences between an Extended Distance Cluster and a CLX
cluster are:
- The methods used to replicate data between the storage devices in the two data centers. The two basic methods available for replicating data between the data centers for Linux clusters are host-based and storage array-based. An Extended Distance Cluster always uses host-based replication (MD mirroring on Linux); any mix of Serviceguard-supported Fibre Channel storage can be implemented in an Extended Distance Cluster. CLX always uses array-based replication/mirroring, and requires storage from the same vendor in both data centers (that is, a pair of XPs with Continuous Access, or a pair of EVAs with Continuous Access).
- Data centers in an Extended Distance Cluster can span up to 100km, whereas the distance between data centers in a Metrocluster is defined by the shortest of the following distances:
  - Maximum distance that guarantees a network latency of no more than 200ms
  - Maximum distance supported by the data replication link
  - Maximum supported distance for DWDM as stated by the provider
- In an Extended Distance Cluster, there is no built-in mechanism for determining the state of the data being replicated: when an application fails over from one data center to another, the package is allowed to start up as long as its volume group(s) can be activated. A CLX implementation provides a higher degree of data integrity; that is, the application is allowed to start up only based on the state of the data and the disk arrays.
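The host-based replication described above is ordinary Linux MD (RAID-1) mirroring. As an illustrative sketch only, with hypothetical device names standing in for the multipath LUNs presented from each data center (an actual deployment must follow the Serviceguard configuration procedures), a cross-site mirror might look like this:

```shell
# Illustrative sketch: build a RAID-1 MD device whose two legs reside on
# storage in different data centers. /dev/mapper/dc1_lun and
# /dev/mapper/dc2_lun are hypothetical device names, one LUN per site.
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      /dev/mapper/dc1_lun /dev/mapper/dc2_lun

# Inspect the mirror state; after an outage, MD resynchronizes the
# out-of-date leg when it returns.
mdadm --detail /dev/md0

# An internal write-intent bitmap limits resynchronization after a
# transient failure to only the regions that were actually modified,
# rather than forcing a full copy of the mirror.
mdadm --grow /dev/md0 --bitmap=internal
```

The write-intent bitmap is what allows the array to know, as noted earlier, that a disk which stayed up through a host failure does not need a full resynchronization.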
It is possible for data to be updated on the disk system local to a server running a package without the remote data being updated. This happens if the data replication link between the sites is lost, usually as a precursor to a site going down. If that occurs and the site holding the latest data then goes down, that data is lost. The period of time from the loss of the link until the site goes down is called the "recovery point". An