Designing Disaster Recovery Clusters using Metroclusters and Continentalclusters, Reprinted October 2011 (5900-1881)

Restoring Disaster Tolerance

After a failover to the recovery cluster, restoring disaster tolerance presents several challenges, the most significant of which are:

• Restoring the failed cluster.

Depending on the nature of the disaster, it may be necessary either to create a new cluster or to restore the failed cluster.

Before starting the new or restored cluster, make sure the AUTO_RUN flag is disabled for all of the Continentalclusters application packages. This prevents the packages from starting unexpectedly when the cluster starts.

• Resynchronizing the data.

To resynchronize the data, either restore the data to the cluster and continue with the same data replication procedure, or set up data replication to function in the other direction.
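The AUTO_RUN check described above can be scripted before the cluster is started. The following is a minimal sketch, not a supported tool: auto_run_enabled is a hypothetical helper, and it assumes the package configuration files contain a line of the form "AUTO_RUN YES" (keyword and value separated by white space, comment lines starting with "#"). It prints the name of every file in which the flag is still enabled.

```shell
# Hypothetical helper: print each package configuration file that still
# has AUTO_RUN enabled. Assumes a "AUTO_RUN YES" style line in the file.
auto_run_enabled() {
    for f in "$@"; do
        # Skip comment lines; compare keyword and value case-insensitively.
        if awk 'BEGIN { rc = 1 }
                $1 ~ /^#/ { next }
                toupper($1) == "AUTO_RUN" && toupper($2) == "YES" { rc = 0 }
                END { exit rc }' "$f"; then
            echo "$f"
        fi
    done
}
```

Run it against the configuration files of all Continentalclusters application packages. Any file name it prints needs AUTO_RUN changed before the cluster is started.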

The following sections briefly outline some scenarios for restoring disaster tolerance.

Restore Clusters to their Original Roles

If the disaster did not destroy the cluster, you have the option of returning both clusters in a recovery pair to their original roles. To do this:

1. Make sure that both clusters are up and running, with the recovery packages continuing to run on the surviving cluster.

2. Compare the clusters to make sure their configurations are consistent. Correct any inconsistencies.

3. For each recovery group where the repaired cluster will run the primary package:

a. Synchronize the data from the disks on the surviving cluster to the disks on the repaired cluster. This may be time-consuming.

b. Halt the recovered application on the surviving cluster if necessary, and start it on the repaired cluster.

c. To keep application downtime to a minimum, start the primary package on the repaired cluster before resynchronizing the data of the next recovery group.

4. View the status of the Continentalclusters configuration:

# cmviewconcl
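The ordering in step 3 (finish one recovery group completely before starting the next) can be sketched as a dry-run plan. This is an illustration only: plan_failback is a hypothetical helper that prints actions without running them, sync_data is a placeholder for the site-specific data replication step, and the cmhaltpkg and cmrunpkg invocations are shown in their simplest form. The options your configuration requires may differ.

```shell
# Dry-run sketch: print the failback actions in order; nothing is executed.
# Each argument is one recovery group, written as primary_pkg:recovery_pkg.
plan_failback() {
    for group in "$@"; do
        primary=${group%%:*}     # text before the first colon
        recovery=${group#*:}     # text after the first colon
        echo "sync_data $primary"     # placeholder: surviving -> repaired cluster
        echo "cmhaltpkg $recovery"    # on the surviving cluster, if running
        echo "cmrunpkg $primary"      # on the repaired cluster
    done
    echo "cmviewconcl"                # final status check (step 4)
}
```

For example, plan_failback pkgA:pkgA_bak pkgB:pkgB_bak lists the three actions for pkgA before any action for pkgB, so pkgA is back in service on the repaired cluster while the data for pkgB is still being resynchronized. The package names here are made up for the example.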

Primary Packages Remaining on the Surviving Cluster

Configure the failed cluster in a recovery pair as a recovery-only cluster and the surviving cluster as a primary-only cluster. This minimizes the downtime involved with moving the applications back to the restored cluster. This approach assumes that the surviving cluster has sufficient resources to run all critical applications indefinitely.

NOTE: In a scenario with multiple recovery pairs, where more than one primary cluster is configured to share the same recovery cluster, do not use the following procedure to switch the roles of the failed cluster and the surviving cluster.

Use the following procedure:

1. Halt the monitor packages. Issue the following command on each cluster:

# cmhaltpkg ccmonpkg

2. Edit the Continentalclusters ASCII configuration file. Change the definitions of the monitoring clusters, and switch the names of the primary and recovery packages in the recovery group definitions. It may also be necessary to re-create the data sender and data receiver packages.
