Designing Disaster Recovery Clusters using Metroclusters and Continentalclusters, Reprinted October 2011 (5900-1881)

Notification that a cluster came down for any reason.
Notification that a cluster has been in an unreachable state for a short period of time. An alert
is sent in this case as a warning that an alarm might be issued later if the cluster’s state remains
unreachable for a longer time.
The expected process in dealing with alerts is to continue watching for additional notifications and
to contact individuals at the site of the monitored cluster to see whether problems exist.
Alarms
Alarms are intended to indicate that a cluster failure might have taken place. The most common
example of an alarm is the following:
Notification that a specified cluster has been in an unreachable state for a significant amount
of time.
The expected process in dealing with cluster events that persist at the alarm level is to obtain as
much information as possible, including authorization to recover if your business practices require
it. At that point, issue the Continentalclusters recovery command, cmrecovercl.
Creating Notifications for Failure Events
For events that indicate a potential cluster failure, show the escalating concern about cluster
health by defining alerts followed by one or more alarms. The following is a typical sequence:
cluster alert at 5 minutes
cluster alert at 10 minutes
cluster alarm at 15 minutes
This can be accomplished by entering two CLUSTER_ALERT lines and one CLUSTER_ALARM line in the
configuration file. A detailed example is provided in the comments in the ASCII
configuration file template, shown in “Editing Section 3—Monitoring Definitions” (page 82).
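The escalation sequence above can be modeled in a few lines of code. The following Python sketch is
illustrative only and is not part of Continentalclusters: the names Notification, SCHEDULE,
due_notifications, and highest_level are hypothetical, and the thresholds simply mirror the typical
5, 10, and 15 minute sequence. It shows which notifications would already have been sent after a
cluster has been unreachable for a given number of minutes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    level: str          # "alert" or "alarm"
    after_minutes: int  # minutes in the unreachable state before sending


# Hypothetical schedule mirroring the typical sequence:
# alerts at 5 and 10 minutes, then an alarm at 15 minutes.
SCHEDULE = [
    Notification("alert", 5),
    Notification("alert", 10),
    Notification("alarm", 15),
]


def due_notifications(unreachable_minutes, schedule=SCHEDULE):
    """Return the notifications whose threshold has been reached, earliest first."""
    ordered = sorted(schedule, key=lambda n: n.after_minutes)
    return [n for n in ordered if unreachable_minutes >= n.after_minutes]


def highest_level(unreachable_minutes, schedule=SCHEDULE):
    """Return "alarm", "alert", or None for the current level of concern."""
    due = due_notifications(unreachable_minutes, schedule)
    if not due:
        return None
    return "alarm" if any(n.level == "alarm" for n in due) else "alert"


if __name__ == "__main__":
    for minutes in (4, 5, 12, 15):
        print(minutes, highest_level(minutes))
```

In this model, an operator who sees only alerts keeps watching and contacts the monitored site, and
the alarm level is the point at which a recovery decision is considered.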
Creating Notifications for Events that Indicate a Return of Service
For those events that indicate that the cluster is back online or that communication with the monitor
has been restored, use cluster alerts to show the de-escalation of concern. In this case, use a
CLUSTER_ALERT line in the configuration file with a time of zero (0), so that notifications are sent
as soon as the return to service is detected.
Maintenance Mode for Recovery Groups
Placing a recovery group in maintenance mode exempts that recovery group from recovery. This means
that its recovery package cannot be started on the recovery cluster. By default, no recovery group
in the Continentalclusters configuration is in maintenance mode. To move a recovery group into
maintenance mode, you must disable it. To move a recovery group out of maintenance mode, you must
enable it. You can perform rehearsal operations on a recovery group only while it is in maintenance
mode. For more information on rehearsal operations, see “Performing a Rehearsal Operation in your
Environment” (page 103).
Use the cmrecovercl -d -g command to move a recovery group into the maintenance mode.
To move the recovery group out of the maintenance mode, use the cmrecovercl -e -g
command.
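For sites that script these operations, the two commands can be wrapped in a small helper. The
following Python sketch is an illustration under stated assumptions: it assumes that the recovery
group name is passed as the argument that follows -g, and the function names and the group name
sales_rg are hypothetical. By default it only returns the command that would be run, so it can be
reviewed before anything is executed.

```python
import subprocess


def maintenance_command(recovery_group, enter):
    """Build the cmrecovercl argument list for entering or leaving maintenance mode.

    Assumption: the recovery group name follows the -g option.
    """
    if not recovery_group:
        raise ValueError("a recovery group name is required")
    flag = "-d" if enter else "-e"  # -d disables (enter), -e enables (leave)
    return ["cmrecovercl", flag, "-g", recovery_group]


def set_maintenance_mode(recovery_group, enter, dry_run=True):
    """Return the command; run it only when dry_run is False."""
    cmd = maintenance_command(recovery_group, enter)
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    # sales_rg is a hypothetical recovery group name.
    print(" ".join(set_maintenance_mode("sales_rg", enter=True)))
    print(" ".join(set_maintenance_mode("sales_rg", enter=False)))
```

Keeping the dry run as the default is a deliberate choice in this sketch, because disabling a
recovery group prevents its recovery package from being started during a real failure.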