Designing Disaster Recovery Clusters using Metroclusters and Continentalclusters, Reprinted October 2011 (5900-1881)

7. If using physical data replication, do not resync from the recovery cluster to the primary cluster.

Instead, manually issue a command that will overwrite any changes on the recovery disk array

that may inadvertently have been made.

8. Start the package up on the primary cluster and allow connection to the application.

Testing Continentalclusters Operations

Use the following procedures to exercise typical Continentalclusters behaviors:

1. Halt both clusters in a recovery pair, then restart both clusters. The monitor packages on both

clusters should start automatically. The Continentalclusters packages (primary, data sender,

data receiver, and recovery) should not start automatically. Any other packages may

or may not start automatically, subject to their configuration.

NOTE: If an UP status is configured for a cluster, then an appropriate alert notification (email,

SNMP, etc.) should be received at the configured time interval from the node running the

monitor package on the other cluster. Due to delays in email or SNMP, the notifications may

arrive later than expected.

In addition to being sent through the mechanisms defined in the Continentalclusters

configuration file, alerts and alarms are also recorded in the file

/var/opt/resmon/log/cc/eventlog on the system reporting the event.
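
When an expected notification is late, the event log on the reporting node can be checked directly. The following is a minimal sketch: the function name show_cc_events and the default of 20 lines are illustrative choices, not part of Continentalclusters, and nothing is assumed about the format of the log entries.

```shell
#!/bin/sh
# Hypothetical helper: print the last N lines of the Continentalclusters
# event log. The default path is the one named in this section; the
# function name and the default line count are illustrative assumptions.
show_cc_events() {
    log_file="${1:-/var/opt/resmon/log/cc/eventlog}"
    count="${2:-20}"
    if [ ! -r "$log_file" ]; then
        echo "cannot read $log_file" >&2
        return 1
    fi
    tail -n "$count" "$log_file"
}
```

Run show_cc_events with no arguments on the node that reported the event, or pass a different path and line count as the first and second arguments.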

2. While the monitor package is running on a monitoring cluster, halt the monitored cluster

(cmhaltcl -f). An appropriate alert notification (email, SNMP, etc.) should be received at

the configured time interval from the node running the monitor package. Run cmrecovercl.

The command should fail. Additional notifications should be received at the configured time

intervals. After the alarm notification is received, run cmrecovercl. Any data receiver

packages on the monitoring cluster should halt and the recovery package(s) should start with

package switching enabled. Halt the recovery packages.
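
The monitoring-cluster side of test 2 can be sketched as a short script. This is a sketch under stated assumptions, not a supported procedure: the DRY_RUN switch, the run wrapper, the two function names, and the package name recovery_pkg are all hypothetical, and cmrecovercl may ask for confirmation when run for real. With DRY_RUN=1 (the default here) the commands are only printed so the sequence can be reviewed first. The monitored cluster is halted separately with cmhaltcl -f before the first function is called.

```shell
#!/bin/sh
# Sketch of test 2, run on the monitoring cluster.
# DRY_RUN, run(), the function names, and RECOVERY_PKG are illustrative
# assumptions; only the cm* commands belong to Serviceguard/Continentalclusters.
DRY_RUN="${DRY_RUN:-1}"
RECOVERY_PKG="${RECOVERY_PKG:-recovery_pkg}"   # hypothetical package name

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# After the alert but before the alarm: cmrecovercl is expected to fail.
test2_before_alarm() {
    if run cmrecovercl && [ "$DRY_RUN" != "1" ]; then
        echo "UNEXPECTED: cmrecovercl succeeded before the alarm" >&2
        return 1
    fi
}

# After the alarm: recover, inspect the result, then halt the recovery package.
test2_after_alarm() {
    run cmrecovercl || return 1
    run cmviewcl -v                 # check that recovery packages are up with switching enabled
    run cmhaltpkg "$RECOVERY_PKG"
}
```

Call test2_before_alarm once the alert notification has arrived, and test2_after_alarm once the alarm notification has arrived. Set DRY_RUN=0 only after the printed sequence has been reviewed.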

3. Rerun test 2 under a variety of conditions, singly and in combination, such as the

following:

• Rebooting and powering off systems one at a time

• Rebooting and powering off all systems at the same time

• Running the monitor package on each node in each cluster

• Disconnecting the WAN connection between the clusters

• If physical data replication is used, disconnecting the physical replication links between the

disk arrays:

  ◦ Powering off the disk array at the primary site

  ◦ Powering off the disk array at the recovery site

• Testing cmrecovercl -f as well as cmrecovercl

Depending on the condition, the primary packages should be running so that the tests exercise

real-life failures and recovery procedures.
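
Because test 3 combines several conditions with two forms of the recovery command, a printed checklist helps confirm that no combination is skipped. The sketch below only prints text and runs no cluster commands. The condition labels paraphrase the list above, and the function name and output layout are illustrative.

```shell
#!/bin/sh
# Print one checklist line per (condition, command) pair for test 3.
# Labels paraphrase the conditions listed in this section; the function
# name and the layout are illustrative assumptions.
print_test_matrix() {
    for condition in \
        "reboot/power off systems one at a time" \
        "reboot/power off all systems at once" \
        "monitor package on each node in each cluster" \
        "WAN connection between clusters disconnected" \
        "replication links between disk arrays disconnected" \
        "primary-site disk array powered off" \
        "recovery-site disk array powered off"
    do
        for cmd in "cmrecovercl" "cmrecovercl -f"; do
            printf '[ ] %-52s %s\n' "$condition" "$cmd"
        done
    done
}
```

Redirect the output of print_test_matrix to a file and tick off each line as the scenario is completed and both clusters are restored.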

4. After each scenario in tests 2 and 3, restore both clusters to their production state, restart the

primary package(s) (as well as any data sender and data receiver packages) and note any

issues, time delays, etc.

5. Halt the monitor package on one cluster. Halt the other cluster. No notifications are generated

that the other cluster has failed. What mechanism is available to the organization to monitor

the monitor?

6. Halt the packages on one cluster, but do not halt the cluster. No notifications are generated

that the packages on that cluster have failed. What mechanism is available to the organization

to monitor package status?
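
One possible answer to questions 5 and 6 is a periodic status check run independently of Continentalclusters, for example from cron. The sketch below is an assumption-laden illustration rather than a recommended mechanism: the function names are hypothetical, and the parser assumes that cmviewcl prints the package name in the first column and its status in the second, which should be verified against the output of the installed Serviceguard version.

```shell
#!/bin/sh
# pkg_status NAME: read cmviewcl-style tabular output on stdin and print the
# STATUS column for package NAME. Assumes the name is the first field and the
# status the second; verify this against your own cmviewcl output.
pkg_status() {
    awk -v pkg="$1" '$1 == pkg { print $2; found = 1 } END { exit !found }'
}

# check_pkg NAME: report a package whose status is not "up".
# Hypothetical helper; replace the echo with whatever notification
# mechanism the organization uses.
check_pkg() {
    status=$(cmviewcl -p "$1" 2>/dev/null | pkg_status "$1") || status="unknown"
    if [ "$status" != "up" ]; then
        echo "package $1 status: $status"
        return 1
    fi
}
```

The same check can be pointed at the monitor package to watch the monitor itself, and at each primary package to detect packages that are halted while their cluster remains up.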
