Designing Disaster Tolerant High Availability Clusters, 10th Edition, March 2003 (B7660-90013)

Building a Continental Cluster

Understanding Continental Cluster Concepts

Chapter 5 183

• You verify that the monitored cluster has failed.

• You issue the cluster recovery command.

Monitoring over a Wide Area Network

A monitor package running on one cluster tracks the health of another cluster and notifies system administrators when the state of the monitored cluster changes. (Any cluster that contains recovery packages must be monitored.) The monitor software polls the monitored cluster at a fixed MONITOR_INTERVAL. This interval is defined in an ASCII configuration file, which also specifies when and where to send messages if the state changes.
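The polling behavior described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the ContinentalClusters monitor itself. `ClusterState`, `run_monitor`, and the `poll`/`notify` callbacks are hypothetical names. The destinations that the real configuration file specifies are reduced to a single callback. The four state values are the ones the monitor reports (Unreachable, Down, Up, and Error).

```python
import time
from enum import Enum
from typing import Callable, Optional


class ClusterState(Enum):
    """The four states the monitor reports for a monitored cluster."""
    UNREACHABLE = "Unreachable"
    DOWN = "Down"
    UP = "Up"
    ERROR = "Error"


def run_monitor(poll: Callable[[], ClusterState],
                notify: Callable[[ClusterState, ClusterState], None],
                monitor_interval: float,
                cycles: int) -> Optional[ClusterState]:
    """Hypothetical monitor: poll `cycles` times, monitor_interval seconds apart.

    notify(old, new) is called only when a poll returns a state different
    from the previous poll; an unchanged state sends no message.
    Returns the last state seen, or None if cycles is 0.
    """
    last: Optional[ClusterState] = None
    for i in range(cycles):
        state = poll()
        if last is not None and state is not last:
            notify(last, state)  # state change: send a notification
        last = state
        if i < cycles - 1:
            time.sleep(monitor_interval)  # wait one MONITOR_INTERVAL between polls
    return last


if __name__ == "__main__":
    # Simulated readings: the cluster briefly becomes unreachable, then recovers.
    readings = iter([ClusterState.UP, ClusterState.UP,
                     ClusterState.UNREACHABLE, ClusterState.UP])
    run_monitor(poll=lambda: next(readings),
                notify=lambda old, new: print(f"{old.value} -> {new.value}"),
                monitor_interval=0.0,
                cycles=4)
```

With the simulated readings, only the two transitions (Up to Unreachable, then Unreachable to Up) produce notifications. The repeated Up reading produces none, matching the rule that messages are sent on a state change.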

Because the clusters are physically separated, they must communicate over a Wide Area Network (WAN). The polling takes place across the WAN, so an interruption of WAN service cannot always be distinguished from a cluster failure. If the WAN is unreliable, the monitoring facility will often report an Unreachable state for the monitored cluster when the real cause is an interruption of WAN service.
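The ambiguity described above can be made concrete with a deliberately simplified model. This is a hypothetical sketch, not how the monitor is implemented. It assumes that a poll gets a reply only when both the WAN link and the remote site are working, and it ignores the Down and Error states entirely. `Scenario`, `poll_result`, and `consistent_causes` are illustrative names.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class Scenario:
    wan_up: bool      # is the WAN link between the two sites working?
    site_alive: bool  # are the monitored cluster's nodes running and answering?


def poll_result(s: Scenario) -> str:
    """Hypothetical model of a single poll sent across the WAN.

    A reply comes back only if both the WAN and the remote site work;
    otherwise the poll times out and the monitor can only report
    'Unreachable', whatever the underlying cause.
    """
    return "Up" if s.wan_up and s.site_alive else "Unreachable"


def consistent_causes(observed: str) -> List[Scenario]:
    """Every scenario in this model that would produce the observed result."""
    every = (Scenario(w, a) for w, a in product([True, False], repeat=2))
    return [s for s in every if poll_result(s) == observed]


if __name__ == "__main__":
    for s in consistent_causes("Unreachable"):
        print(s)
```

In this model an Up result identifies a single cause. An Unreachable result is consistent with three scenarios, including a healthy cluster behind a broken WAN. The monitor's report alone therefore cannot justify recovery.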

Because the monitoring is indeterminate in some instances, information must be gathered from independent sources to decide whether to proceed with the recovery process. For these reasons, cluster recovery is not automatic; a root user must initiate it. Once initiated, however, cluster recovery is automated to reduce the chance of the human error that manual steps could introduce. ContinentalClusters provides a system of cluster events and notifications so that events can be easily tracked and so that users know when to seek additional information before initiating recovery.
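The split between a manual decision and automated execution can be sketched as follows. This is a hypothetical Python illustration, not the actual recovery command. `run_recovery` and the `Step` type are invented names, and the step labels in the example are placeholders. The `failure_confirmed` flag stands in for the operator's judgment after checking independent sources, which code cannot supply. On a real UNIX system, `is_root` might come from `os.geteuid() == 0`.

```python
from typing import Callable, List, Tuple

# A recovery step: a label plus the action to run (both hypothetical).
Step = Tuple[str, Callable[[], None]]


def run_recovery(is_root: bool, failure_confirmed: bool,
                 steps: List[Step]) -> List[str]:
    """Run recovery only after a root user has confirmed the failure.

    The two checks model the manual part (who may start recovery, and
    whether the failure was verified). Once both pass, every step runs
    in order with no further prompts, which models the automated part.
    """
    if not is_root:
        raise PermissionError("cluster recovery must be initiated by root")
    if not failure_confirmed:
        raise RuntimeError("confirm the cluster failure from independent sources first")
    completed: List[str] = []
    for label, action in steps:
        action()                 # automated: no manual intervention per step
        completed.append(label)
    return completed


if __name__ == "__main__":
    log: List[str] = []
    done = run_recovery(
        is_root=True,
        failure_confirmed=True,
        steps=[("step 1", lambda: log.append("step 1 ran")),
               ("step 2", lambda: log.append("step 2 ran"))],
    )
    print(done)
```

Placing both checks before any step runs means that a refused or unconfirmed request leaves the system untouched. After approval, the operator cannot skip or reorder steps.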

Cluster Events

A cluster event is a change of state in a monitored cluster. The four cluster states reported by the monitor are Unreachable, Down, Up, and Error. Table 5-1 summarizes possible causes for the cluster events