HP Serviceguard Extended Distance Cluster for Linux A.01.01 Deployment Guide, Third Edition, May 2008

Disaster Tolerance and Recovery in a Serviceguard Cluster
Understanding Types of Disaster Tolerant Clusters
Disk resynchronization is independent of CPU failure (that is, if the
hosts at the primary site fail but the disk remains up, the disk knows
it does not have to be resynchronized).
Differences Between Extended Distance Cluster and CLX
The major differences between an Extended Distance Cluster and a CLX
cluster are:
- The methods used to replicate data between the storage devices in the two data centers. The two basic methods available for replicating data between the data centers for Linux clusters are host-based and storage array-based. An Extended Distance Cluster always uses host-based replication (MD mirroring on Linux); any mix of Serviceguard-supported Fibre Channel storage can be implemented in an Extended Distance Cluster. CLX always uses array-based replication/mirroring, and requires storage from the same vendor in both data centers (that is, a pair of XPs with Continuous Access, or a pair of EVAs with Continuous Access).
- Data centers in an Extended Distance Cluster can span up to 100km, whereas the distance between data centers in a Metrocluster is defined by the shortest of the following distances:
  - Maximum distance that guarantees a network latency of no more than 200ms
  - Maximum distance supported by the data replication link
  - Maximum supported distance for DWDM as stated by the provider
- In an Extended Distance Cluster, there is no built-in mechanism for determining the state of the data being replicated: when an application fails over from one data center to another, the package is allowed to start up as long as its volume group(s) can be activated. A CLX implementation provides a higher degree of data integrity; that is, the application is allowed to start up only based on the state of the data and the disk arrays.
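The host-based replication described above is ordinary Linux MD (RAID-1) mirroring. As an illustrative sketch only, with hypothetical device names standing in for the multipath LUNs presented from each data center (an actual deployment must follow the Serviceguard configuration procedures), a cross-site mirror might look like this:

```shell
# Illustrative sketch: build a RAID-1 MD device whose two legs reside on
# storage in different data centers. /dev/mapper/dc1_lun and
# /dev/mapper/dc2_lun are hypothetical device names, one LUN per site.
mdadm --create /dev/md0 --level=1 --raid-devices=2 \
      /dev/mapper/dc1_lun /dev/mapper/dc2_lun

# Inspect the mirror state; after an outage, MD resynchronizes the
# out-of-date leg when it returns.
mdadm --detail /dev/md0

# An internal write-intent bitmap limits resynchronization after a
# transient failure to only the regions that were actually modified,
# rather than forcing a full copy of the mirror.
mdadm --grow /dev/md0 --bitmap=internal
```

The write-intent bitmap is what allows the array to know, as noted earlier, that a disk which stayed up through a host failure does not need a full resynchronization.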
It is possible for data to be updated on the disk system local to a server running a package without the remote data being updated. This happens if the data replication link between the sites is lost, usually as a precursor to a site going down. If that occurs and the site holding the latest data then goes down, that data is lost. The period of time from the loss of the link until the site goes down is called the "recovery point". An