Designing Disaster Recovery Clusters using Metroclusters and Continentalclusters, Reprinted October 2011 (5900-1881)

• Notification that a cluster came down for any reason.

• Notification that a cluster has been in an unreachable state for a short period of time. This alert serves as a warning that an alarm might be issued later if the cluster remains unreachable for a longer time.

When you receive an alert, continue watching for additional notifications and contact individuals at the site of the monitored cluster to find out whether problems exist.

Alarms

Alarms are intended to indicate that a cluster failure might have taken place. The most common example of an alarm is the following:

• Notification that a specified cluster has been in an unreachable state for a significant amount of time.

When a cluster event persists at the alarm level, obtain as much information as possible, including authorization to recover if your business practices require it. Then issue the Continentalclusters recovery command, cmrecovercl.

Creating Notifications for Failure Events

For events that indicate a potential cluster failure, show the escalating level of concern about cluster health by defining alerts followed by one or more alarms. The following is a typical sequence:

• cluster alert at 5 minutes

• cluster alert at 10 minutes

• cluster alarm at 15 minutes

You can configure this sequence by entering two CLUSTER_ALERT lines and one CLUSTER_ALARM line in the configuration file. A detailed example is provided in the comments of the ASCII configuration file template, shown in “Editing Section 3—Monitoring Definitions” (page 82).
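
The escalation sequence above behaves like a threshold schedule: each entry fires once the cluster has been unreachable for at least its delay. The following Python sketch is a hypothetical illustration of that timing logic only. It does not read or write the Continentalclusters configuration file, and the Notification class and the due_notifications and concern_level functions are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    """Hypothetical model of one CLUSTER_ALERT or CLUSTER_ALARM entry."""
    kind: str           # "alert" or "alarm" (illustrative labels)
    after_minutes: int  # how long the cluster must stay unreachable before this fires


# Hypothetical schedule mirroring the typical sequence:
# alert at 5 minutes, alert at 10 minutes, alarm at 15 minutes.
ESCALATION = [
    Notification("alert", 5),
    Notification("alert", 10),
    Notification("alarm", 15),
]


def due_notifications(unreachable_minutes, schedule=ESCALATION):
    """Return every entry whose delay has elapsed for this unreachable duration."""
    return [n for n in schedule if unreachable_minutes >= n.after_minutes]


def concern_level(unreachable_minutes, schedule=ESCALATION):
    """Summarize the highest level reached: "none", "alert", or "alarm"."""
    fired = due_notifications(unreachable_minutes, schedule)
    if any(n.kind == "alarm" for n in fired):
        return "alarm"
    if fired:
        return "alert"
    return "none"


if __name__ == "__main__":
    for minutes in (3, 7, 12, 20):
        print(minutes, concern_level(minutes), len(due_notifications(minutes)))
```

In this model, a cluster that comes back after 7 minutes has produced only the first alert, which matches the intent that alerts are warnings and only a sustained outage reaches the alarm level.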

Creating Notifications for Events that Indicate a Return of Service

For events that indicate that the cluster is back online, or that communication with the monitor has been restored, use cluster alerts to show that the level of concern is decreasing. In this case, use a CLUSTER_ALERT line in the configuration file with a time of zero (0) so that notifications are sent as soon as the return to service is detected.
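
A zero delay means the notification condition is satisfied the moment the event is detected. The following Python sketch contrasts a zero-delay rule with a delayed one. It is a hypothetical model for illustration: the AlertRule class, the fires and send_time functions, and the event labels are invented here and are not part of Continentalclusters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlertRule:
    """Hypothetical model of one alert definition: an event plus a delay in minutes."""
    event: str
    delay_minutes: int


def fires(rule: AlertRule, condition_minutes: float) -> bool:
    """True once the event condition has lasted at least the rule's delay."""
    return condition_minutes >= rule.delay_minutes


def send_time(rule: AlertRule, detected_at_minute: float) -> float:
    """Minute at which the notification would be sent, given when the event was detected."""
    return detected_at_minute + rule.delay_minutes


# Return-of-service rule: a delay of 0 notifies as soon as the event is detected.
back_online = AlertRule("cluster back online", 0)
# For contrast, a failure alert with a 5-minute delay waits before notifying.
unreachable = AlertRule("cluster unreachable", 5)

print(fires(back_online, 0), send_time(back_online, 42))  # True 42
print(fires(unreachable, 0), send_time(unreachable, 42))  # False 47
```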

Maintenance Mode for Recovery Groups

Placing a recovery group in maintenance mode exempts it from recovery; that is, its recovery package cannot be started in a recovery cluster. By default, no recovery group in the Continentalclusters configuration is in maintenance mode. To move a recovery group into maintenance mode, you must disable it. To move a recovery group out of maintenance mode, you must enable it. You can perform rehearsal operations on a recovery group only while the recovery group is in maintenance mode. For more information on rehearsal operations, see “Performing a Rehearsal Operation in your Environment” (page 103).
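
The rules above form a small state machine: disabling a recovery group enters maintenance mode, enabling it leaves maintenance mode, recovery is blocked while in maintenance mode, and rehearsal is allowed only in maintenance mode. The following Python sketch models those rules. The RecoveryGroup class and its method names are hypothetical and only illustrate the described behavior; they are not a Continentalclusters interface.

```python
class RecoveryGroup:
    """Hypothetical in-memory model of a recovery group's maintenance-mode state."""

    def __init__(self, name: str):
        self.name = name
        # By default, a recovery group is not in maintenance mode.
        self.maintenance = False

    def disable(self) -> None:
        # Disabling a recovery group moves it into maintenance mode.
        self.maintenance = True

    def enable(self) -> None:
        # Enabling a recovery group moves it out of maintenance mode.
        self.maintenance = False

    def can_recover(self) -> bool:
        # A group in maintenance mode is exempt from recovery.
        return not self.maintenance

    def can_rehearse(self) -> bool:
        # Rehearsal operations are allowed only in maintenance mode.
        return self.maintenance


group = RecoveryGroup("example_group")  # hypothetical group name
group.disable()
print(group.can_recover(), group.can_rehearse())  # False True
```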

Use the cmrecovercl -d -g command to move a recovery group into maintenance mode. To move the recovery group out of maintenance mode, use the cmrecovercl -e -g command.
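
If these commands are scripted, a thin wrapper can make the choice between -d and -e explicit. The following Python sketch is a hypothetical helper, not part of the product. It assumes that -g takes the recovery group name as its argument, and the maintenance_command and run function names and the group name are invented for illustration. By default it only prints the command (a dry run), so nothing is executed unless dry_run is set to False.

```python
import shlex
import subprocess


def maintenance_command(group: str, enter: bool) -> list:
    """Build the cmrecovercl argument list for entering or leaving maintenance mode.

    Assumption: -g takes the recovery group name as its argument.
    """
    flag = "-d" if enter else "-e"  # -d disables (enter maintenance), -e enables (leave)
    return ["cmrecovercl", flag, "-g", group]


def run(argv: list, dry_run: bool = True) -> int:
    """Print the command in dry-run mode; otherwise execute it and return its exit code."""
    if dry_run:
        print(shlex.join(argv))
        return 0
    return subprocess.run(argv, check=False).returncode


# Dry run with a hypothetical recovery group name.
run(maintenance_command("example_group", enter=True))  # prints: cmrecovercl -d -g example_group
```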

42 Designing Continentalclusters