User's Guide

NOTE: Continentalclusters monitors cluster status, but not package status.
7. View the status of the Continentalclusters.
# cmviewconcl
Switching to the Recovery Packages in Case of Disaster
Once the clusters are configured and tested, packages will be able to fail over to an alternate
node in another data center and still have access to the data they need to function. The primary
steps for failing over a package are:
1. Receive notification that a monitored cluster is unavailable.
2. Verify that it is necessary and safe to start the recovery packages.
3. Use the recovery command to stop data replication and start recovery packages.
4. View the status of the Continentalclusters.
# cmviewconcl
It is important to have a well-defined recovery process and to train all staff at both sites in
how to use it.
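The steps above can be sketched as a small operator script. This is only an illustration, not a tool shipped with Continentalclusters: the `run` wrapper, the `DRY_RUN` variable, and the `failover_sequence` function are hypothetical names. The sketch assumes that steps 1 and 2 (notification and verification) are done by people, so the caller must pass the word `confirmed` explicitly. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical sketch of recovery steps 3 and 4. Not part of the product.
# DRY_RUN=1 (the default here) prints commands without executing them.
DRY_RUN=${DRY_RUN:-1}

# run: print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# failover_sequence: refuses to act unless the operator confirms that
# recovery is necessary and safe (step 2 is a human decision).
failover_sequence() {
    if [ "$1" != "confirmed" ]; then
        echo "recovery not confirmed; nothing done" >&2
        return 1
    fi
    run cmrecovercl || return 1   # step 3: stop replication, start recovery packages
    run cmviewconcl               # step 4: view the resulting status
}

# Dry-run usage: prints the two commands in order.
failover_sequence confirmed
```

Keeping the confirmation as an explicit argument reflects the point that the decision to recover belongs to the site's established process, not to the script.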
Receiving Notification
Once the monitor is started, as described in “Starting the Continentalclusters Monitor Package”
(page 89), the monitor sends notifications as configured. The following types of notifications
are generated as configured in cmclconf.ascii:
CLUSTER_ALERT is a change in the status of a cluster. Recovery via the cmrecovercl
command is not enabled by default. Treat this as information that the cluster may be
either developing a problem or recovering from one.
CLUSTER_ALARM is a change in the status of a cluster that indicates that the cluster has been
unavailable for an unacceptable amount of time. Recovery via the cmrecovercl command
is enabled.
Notifications are issued at the timing intervals specified for each cluster event. However, an
alert or alarm may sometimes appear to take longer than configured. If several changes of
cluster state (for example, Down to Error to Unreachable to Down) take place in less time than
the configured interval for an alert or alarm, the timer is reset to 0 after each change of state.
The time to the alert or alarm is therefore the configured interval plus the time used by all the
earlier state changes.
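The timer arithmetic can be worked through with a short sketch. The function name `time_to_notification` and the sample durations are hypothetical, chosen only for illustration. The sketch assumes that all times are in seconds and that the cluster remains in its final state until the notification is issued.

```shell
#!/bin/sh
# Hypothetical helper, not part of Continentalclusters.
# Usage: time_to_notification INTERVAL [DURATION ...]
#   INTERVAL  configured alert or alarm interval, in seconds
#   DURATION  seconds spent in each earlier state before the next state change
time_to_notification() {
    interval=$1
    shift
    elapsed=0
    for d in "$@"; do
        # A state that lasts at least the full interval triggers the
        # notification itself, so later states do not matter.
        if [ "$d" -ge "$interval" ]; then
            break
        fi
        # Otherwise the timer was reset by the state change.
        elapsed=$((elapsed + d))
    done
    echo $((elapsed + interval))
}

# A 300-second interval with three earlier states lasting 40, 25, and 60
# seconds: 40 + 25 + 60 + 300 = 425 seconds from the first state change.
time_to_notification 300 40 25 60
```

With no earlier state changes, the function returns the configured interval unchanged.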
NOTE: The cmrecovercl command is fully enabled only after a CLUSTER_ALARM is issued;
however, the command may be used with the -f option when a CLUSTER_ALERT has been
issued.
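The relationship between notification type and recovery command can be summarized in a sketch. The `recovery_command` function is a hypothetical helper that only prints the command form described in this section; it does not run anything. It assumes the operator has already verified that recovery is needed.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the form of the recovery
# command that applies to the most recent notification type.
recovery_command() {
    case "$1" in
        CLUSTER_ALARM)
            # Recovery is enabled after an alarm.
            echo "cmrecovercl"
            ;;
        CLUSTER_ALERT)
            # After only an alert, the -f option is required.
            echo "cmrecovercl -f"
            ;;
        *)
            echo "unknown notification type: $1" >&2
            return 1
            ;;
    esac
}

recovery_command CLUSTER_ALERT
```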
Verifying that Recovery is Needed
It is important to follow the established protocol for coordinating with the remote site to determine
whether moving the package is required. This includes initiating person-to-person communication
between sites. For example, the cluster alarm may have been caused by a failure of the WAN
rather than of the cluster itself.
Some network failures, such as those that prevent clients from using the application, may require
recovery. Other network failures, such as those that only prevent the two clusters from
communicating, may not require recovery. Following the established protocol for communicating
with the remote site helps determine which case applies. See Figure 20 (page 91) for an
example of a recovery checklist.