Understanding and Designing Serviceguard Disaster Recovery Architectures

NOTE: “Metrocluster and Continentalclusters” (page 20) provides an overview of HP’s
implementation of Metropolitan Cluster and Continental cluster, while Chapter 3 provides an
overview of Extended Distance Clusters.
Disaster Recovery Architecture Guidelines
Disaster recovery architectures represent a shift away from massive central data centers
and towards more distributed data processing facilities. Although each architecture is tailored to
specific availability needs, a few basic guidelines apply when designing a disaster recovery
architecture that protects against the loss of an entire data center:
Protect nodes through geographic dispersion
Protect data through replication
Use alternative power sources
Create highly available networks
These guidelines are in addition to the standard high-availability guidelines for redundancy,
such as multipathing and redundant network cards, power supplies, and disks.
Protecting Nodes through Geographic Dispersion
Redundant nodes in a disaster recovery architecture must be geographically dispersed. If they are
in the same data center, it is not a disaster recovery architecture. Figure 2 (page 8) shows a
cluster architecture with nodes in two data centers: A and B. If all nodes in data center A fail,
applications can fail over to the nodes in data center B and continue to provide service for clients.
The type of disaster you are protecting against and the available technology determine the location
of the nodes. The nodes can be as close as another room in the same building, or as far away as
another city. The minimum recommended dispersion is a single building with redundant nodes in
different data centers using different power sources. Specific architectures based on geographic
dispersion are discussed in “Understanding types of disaster recovery clusters” (page 8).
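The dispersion guideline above can be expressed as a simple check. The following Python sketch is illustrative only: the Node type, its fields, and both function names are hypothetical and are not part of Serviceguard. It assumes the minimum requirement is that no single data center failure and no single power source failure can take out every node.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical description of a cluster node, for illustration only.
    name: str
    data_center: str
    power_source: str


def meets_minimum_dispersion(nodes):
    """Return True if losing any one data center or any one power source
    still leaves at least one node running."""
    centers = {n.data_center for n in nodes}
    sources = {n.power_source for n in nodes}
    return len(centers) >= 2 and len(sources) >= 2


def failover_candidates(nodes, failed_center):
    """Return the nodes that survive the loss of an entire data center."""
    return [n for n in nodes if n.data_center != failed_center]
```

For example, a two-node cluster with one node in data center A and one in data center B, each on its own power source, passes the check, and failover_candidates(nodes, "A") returns only the node in data center B.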
Protecting Data through Replication
During a disaster, you might lose access to data or lose the data itself. You can protect against this
loss through data replication, that is, by creating extra copies of the data. When replicating data,
you must do the following:
Ensure data consistency by replicating data in a logical order so that it is immediately usable
or recoverable. Inconsistent data is unusable and is not recoverable for processing. Consistent
data might not be current.
Ensure data currency by replicating data quickly so that a replica of the data can be recovered
to include all committed disk writes that were applied to the local disks.
Ensure data recoverability so that there is some action that can be taken to make the data
consistent, such as applying logs or rolling a database.
Minimize data loss by configuring data replication to address consistency, currency, and
recoverability.
The advantages of the different data replication methods vary with regard to data consistency
and currency. The data replication method that you select depends on the type
of disaster recovery architecture you require.
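The difference between consistency and currency can be illustrated with a minimal Python sketch. It is not how any HP replication product works; the Replica class, its sequence numbers, and its method names are assumptions made for illustration. The replica applies writes strictly in the order the primary issued them, so its contents are always a consistent (though possibly not current) copy, and the lag method reports how far behind it is.

```python
class Replica:
    """Hypothetical replica that applies writes in logical (sequence) order."""

    def __init__(self):
        self.data = {}
        self.applied_seq = 0   # highest sequence number applied so far
        self._pending = {}     # writes that arrived ahead of their turn

    def receive(self, seq, key, value):
        """Accept a replicated write; sequence numbers start at 1."""
        self._pending[seq] = (key, value)
        # Apply only the next write in sequence, so the replica never
        # reflects a later write without the earlier ones (consistency).
        while self.applied_seq + 1 in self._pending:
            k, v = self._pending.pop(self.applied_seq + 1)
            self.data[k] = v
            self.applied_seq += 1

    def lag(self, primary_seq):
        """Number of primary writes not yet applied here (currency)."""
        return primary_seq - self.applied_seq
```

In this sketch, a write that arrives out of order is held back until the missing earlier write arrives. The replica stays consistent during that wait, but a nonzero lag means it is not current.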
Offline Data Replication
Offline data replication is the most common method used today. It involves two or more data
centers that store their data on tape and either send it to each other (via an express service, if need