Designing Disaster Tolerant High Availability Clusters, 10th Edition, March 2003 (B7660-90013)

Building a Continental Cluster
Understanding Continental Cluster Concepts
Chapter 5 183
You verify that the monitored cluster has failed.
You issue the cluster recovery command.
Monitoring over a Wide Area Network
A monitor package running on one cluster tracks the health of another
cluster and sends notification to system administrators if the state of the
monitored cluster changes. (If a cluster contains any recovery packages, it
must be monitored.) The monitor software polls the monitored cluster at
the interval set by MONITOR_INTERVAL in an ASCII configuration file,
which also specifies when and where to send messages if there is a state
change.
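The polling and notification behavior described above can be illustrated with a short sketch. This is a minimal Python example under stated assumptions, not the ContinentalClusters monitor itself. `ClusterState`, `monitor`, `probe`, and `notify` are hypothetical names. `monitor_interval` stands in for the MONITOR_INTERVAL value from the configuration file. The real WAN probe and notification targets are replaced by plain callables. The four states are the ones the monitor reports: Unreachable, Down, Up, and Error.

```python
import time
from enum import Enum
from typing import Callable, Optional


class ClusterState(Enum):
    """The four cluster states the monitor can report."""
    UNREACHABLE = "Unreachable"
    DOWN = "Down"
    UP = "Up"
    ERROR = "Error"


def monitor(
    probe: Callable[[], ClusterState],
    notify: Callable[[ClusterState, ClusterState], None],
    monitor_interval: float,
    max_polls: Optional[int] = None,
) -> None:
    """Poll the monitored cluster every monitor_interval seconds.

    notify(old, new) is called only when the observed state differs from
    the previous poll; max_polls bounds the loop (None means run forever).
    """
    previous: Optional[ClusterState] = None
    polls = 0
    while True:
        current = probe()  # hypothetical: query the remote cluster over the WAN
        if previous is not None and current != previous:
            notify(previous, current)  # a cluster event: a change of state
        previous = current
        polls += 1
        if max_polls is not None and polls >= max_polls:
            break
        time.sleep(monitor_interval)


if __name__ == "__main__":
    # Simulated poll results: a brief WAN interruption looks like Up -> Unreachable -> Up.
    samples = iter([ClusterState.UP, ClusterState.UP,
                    ClusterState.UNREACHABLE, ClusterState.UP])
    events = []
    monitor(lambda: next(samples),
            lambda old, new: events.append((old.value, new.value)),
            monitor_interval=0, max_polls=4)
    print(events)  # [('Up', 'Unreachable'), ('Unreachable', 'Up')]
```

Because only changes are reported, a short WAN outage shows up as an Up to Unreachable event followed by an Unreachable to Up event. The monitor alone cannot tell that sequence apart from a brief real failure.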
The physical separation between clusters requires communication over
a Wide Area Network (WAN). Because the polling takes place across
the WAN, an interruption of WAN service cannot always be distinguished
from a cluster failure. If the WAN is unreliable, the
monitoring facility will often report an Unreachable state for
the monitored cluster that is actually caused by a WAN outage.
Because monitoring results are indeterminate in some cases, information
from independent sources must be gathered to decide whether to
proceed with recovery. For these reasons, cluster recovery
is not automatic; it must be initiated by the root user. Once initiated,
however, cluster recovery is automated to reduce the chance of
the human errors that manual steps could introduce. In
ContinentalClusters, a system of cluster events and notifications is
provided so that events can be easily tracked and so that users
know when to seek additional information before initiating recovery.
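The division of labor described above can be sketched as follows: a human decides to start recovery, and execution is fully scripted after that. This is a hypothetical Python illustration, not the ContinentalClusters recovery command. `initiate_recovery`, `failure_verified`, and `steps` are invented names. Checking the effective UID is an assumed way of enforcing the root-user requirement, and the actual recovery steps are left as placeholder callables.

```python
import os
from typing import Callable, List, Optional


def initiate_recovery(
    failure_verified: bool,
    steps: List[Callable[[], None]],
    euid: Optional[int] = None,
) -> bool:
    """Gate recovery on a human decision, then run every step without prompts.

    Returns True if recovery ran, False if it was refused.
    """
    if euid is None:
        euid = os.geteuid()  # Unix-only; callers may pass euid explicitly
    # Assumption: "root user" is enforced as effective UID 0.
    if euid != 0:
        return False
    # The monitor's report alone is not enough: Unreachable may be a WAN outage,
    # so the operator must have confirmed the failure from an independent source.
    if not failure_verified:
        return False
    # Once initiated, recovery is scripted end to end to avoid manual-step errors.
    for step in steps:
        step()
    return True
```

Keeping the decision (`failure_verified`) outside the function reflects the point that the monitor cannot make that judgment on its own. Only the mechanical part is automated.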
Cluster Events
A cluster event is a change of state in a monitored cluster. The four
cluster states reported by the monitor are Unreachable, Down, Up,
and Error. Table 5-1 summarizes possible causes for the cluster events