Designing Disaster Tolerant High Availability Clusters, 10th Edition, March 2003 (B7660-90013)

Building a Continental Cluster
Understanding Continental Cluster Concepts
Chapter 5 185
NOTE cmclsentryd determines that the cluster has Error status under only
one condition: all nodes are unreachable except those that have a
ServiceGuard status of Error. (If any node is Down or Up, the cluster
status takes one of those values instead of Error.)
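The rule in the NOTE can be expressed as a small function. The following is a minimal Python sketch, not cmclsentryd's actual logic. The Status enum and cluster_status function are hypothetical names. Two points are assumptions: that Up takes precedence over Down (the NOTE says only that either one wins over Error), and that a cluster whose nodes are all unreachable is reported as Unreachable rather than Error.

```python
from enum import Enum
from typing import Iterable


class Status(Enum):
    """Node and cluster statuses named in this section (hypothetical enum)."""
    UP = "Up"
    DOWN = "Down"
    UNREACHABLE = "Unreachable"
    ERROR = "Error"


def cluster_status(node_statuses: Iterable[Status]) -> Status:
    """Derive a cluster status from the statuses of its nodes.

    Sketch only: the Error rule follows the NOTE; the Up-over-Down
    precedence and the all-Unreachable case are assumptions.
    """
    statuses = list(node_statuses)
    if not statuses:
        raise ValueError("a cluster must have at least one node")
    # Any Up or Down node means the cluster takes one of those values, never Error.
    if Status.UP in statuses:  # assumption: Up takes precedence over Down
        return Status.UP
    if Status.DOWN in statuses:
        return Status.DOWN
    # From here on, every node is either Unreachable or Error.
    if Status.ERROR in statuses:
        return Status.ERROR
    # Assumption: a cluster with every node unreachable is reported as Unreachable.
    return Status.UNREACHABLE
```

For example, cluster_status([Status.UNREACHABLE, Status.ERROR]) returns Status.ERROR, while adding a single Down node to that list makes it return Status.DOWN.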
Interpreting the Significance of Cluster Events
Because some cluster events (e.g., Up -> Unreachable) can be caused by a
change in either the cluster state or the WAN state, the event alone does
not tell you which one occurred. Additional, independent information is
required to achieve the primary objective: determining whether you need
to recover a cluster's applications. Sources of independent information
include:
Contact with the WAN provider
Contact with the administrator of the monitored cluster
Contact with the local cluster administrator
Contact with company executives
When worrisome cluster events persist, obtain as much information as
possible, including authorization to recover if your business practices
require it, and then issue the recovery command.
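The decision process described above can be sketched as a simple checklist. This is a hypothetical Python illustration: the Evidence fields and the should_issue_recovery function are invented names, and the policy they encode (wait for the event to persist, require independent confirmation that the cluster itself failed, honor an authorization requirement) is an assumed simplification of this guidance, not a prescribed procedure. The function only answers yes or no; it does not issue the recovery command.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of what the operator has learned so far."""
    event_persists: bool             # the worrisome event has not cleared on its own
    cluster_failure_confirmed: bool  # e.g. an administrator confirms the cluster is down
    authorization_obtained: bool     # e.g. sign-off from company executives


def should_issue_recovery(ev: Evidence, authorization_required: bool = True) -> bool:
    """Return True only when the assumed policy says recovery is justified."""
    if not ev.event_persists:
        return False  # transient event: keep monitoring
    if not ev.cluster_failure_confirmed:
        # Without independent confirmation, a WAN outage could explain the event.
        return False
    if authorization_required and not ev.authorization_obtained:
        return False  # business practice requires sign-off first
    return True
```

Keeping each check as an explicit early return makes it easy to see which missing piece of information is blocking recovery.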
Table 5-1 Monitored States and Possible Causes (Continued)

Cluster Event               Cluster-related causes            WAN-related causes
(Old state -> New state)
Unreachable -> Up           Cluster nodes were rebooted       WAN came up and the cluster
                            and the cluster started           was already running
Error -> Up                 Error resolved, cluster is up     WAN problem was fixed, cluster is up
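Because each event in Table 5-1 has both a cluster-related and a WAN-related explanation, monitoring tooling might present both side by side rather than guessing which one applies. The sketch below transcribes only the two rows shown in this part of the table into a Python dictionary; POSSIBLE_CAUSES and explain are hypothetical names, not part of any ContinentalClusters interface.

```python
# Rows transcribed from the continued portion of Table 5-1 (only those shown here).
POSSIBLE_CAUSES: dict[tuple[str, str], dict[str, str]] = {
    ("Unreachable", "Up"): {
        "cluster": "Cluster nodes were rebooted and the cluster started",
        "wan": "WAN came up and the cluster was already running",
    },
    ("Error", "Up"): {
        "cluster": "Error resolved, cluster is up",
        "wan": "WAN problem was fixed, cluster is up",
    },
}


def explain(old: str, new: str) -> list[str]:
    """List every candidate cause for an old -> new transition, or [] if unknown."""
    causes = POSSIBLE_CAUSES.get((old, new))
    if causes is None:
        return []
    # Always report both explanations: the event alone cannot distinguish them.
    return [
        f"cluster-related: {causes['cluster']}",
        f"WAN-related: {causes['wan']}",
    ]
```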