Managing Serviceguard A.11.20, March 2013
Manual Startup of Entire Cluster
A manual startup forms a cluster out of all the nodes in the cluster configuration. Manual startup
is normally done the first time you bring up the cluster, after cluster-wide maintenance or upgrade,
or after reconfiguration.
Before startup, the same binary cluster configuration file must exist on all nodes in the cluster. The
system administrator starts the cluster in Serviceguard Manager or with the cmruncl command
issued from one node. The cmruncl command can only be used when the cluster is not running,
that is, when none of the nodes is running the cmcld daemon.
During startup, the cluster manager software checks to see if all nodes specified in the startup
command are valid members of the cluster, are up and running, are attempting to form a cluster,
and can communicate with each other. If they can, then the cluster manager forms the cluster.
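The four startup checks described above can be pictured with a small model. This is only an illustrative sketch, not Serviceguard's implementation: the Node class, its fields, and the can_form_cluster function are hypothetical names introduced here, and the real cluster manager performs these checks internally.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical view of one node named in the startup command."""
    name: str
    configured: bool   # is a valid member of the cluster configuration
    up: bool           # is up and running
    joining: bool      # is attempting to form a cluster
    reachable: set = field(default_factory=set)  # names of nodes it can reach


def can_form_cluster(nodes):
    """Return True only if every node passes all four startup checks."""
    names = {n.name for n in nodes}
    for n in nodes:
        if not (n.configured and n.up and n.joining):
            return False
        # Each node must be able to communicate with every other node.
        if not (names - {n.name}) <= n.reachable:
            return False
    return bool(nodes)
```

In this model a single node that fails any one check prevents the cluster from forming, which mirrors the "if they can" condition in the text.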
Automatic Cluster Startup
An automatic cluster startup occurs any time a node reboots and joins the cluster. This can follow
the reboot of an individual node, or the failure of all nodes in a cluster, as when an extended
power failure has brought down all SPUs.
Automatic cluster startup will take place if the flag AUTOSTART_CMCLD is set to 1 in
/etc/rc.config.d/cmcluster. When any node reboots with this parameter set to 1, it will rejoin
an existing cluster, or if none exists it will attempt to form a new cluster.
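As a rough way to check the flag, the following sketch reads the text of the configuration file and reports whether AUTOSTART_CMCLD is set to 1. It assumes the file uses shell-style NAME=value lines with # comments. The function name autostart_enabled and the sample text are hypothetical, and the parsing is deliberately simplified.

```python
def autostart_enabled(config_text):
    """Return True if the last AUTOSTART_CMCLD assignment sets the flag to 1."""
    value = None
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line.startswith("AUTOSTART_CMCLD="):
            # A later assignment overrides an earlier one.
            value = line.split("=", 1)[1].strip().strip('"')
    return value == "1"


# Hypothetical excerpt; on a real node, read /etc/rc.config.d/cmcluster instead.
sample = """# Hypothetical excerpt
AUTOSTART_CMCLD=1
"""
print(autostart_enabled(sample))
```

A node for which this check reports the flag as unset or 0 would not start the cluster software automatically on reboot.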
Dynamic Cluster Re-formation
A dynamic re-formation is a temporary change in cluster membership that takes place as nodes
join or leave a running cluster. Re-formation differs from reconfiguration, which is a permanent
modification of the configuration files. Re-formation of the cluster occurs under the following
conditions (not a complete list):
• An SPU or network failure was detected on an active node.
• An inactive node wants to join the cluster. The cluster manager daemon has been started on
that node.
• A node has been added to or deleted from the cluster configuration.
• The system administrator halted a node.
• A node halts because of a package failure.
• A node halts because of a service failure.
• Heavy network traffic prevented the heartbeat signal from being received by the cluster.
• The heartbeat network failed, and another network is not configured to carry heartbeat.
Typically, re-formation results in a cluster with a different composition. The new cluster may contain
fewer or more nodes than the previous incarnation of the cluster.
Cluster Quorum to Prevent Split-Brain Syndrome
In general, the algorithm for cluster re-formation requires a cluster quorum of a strict majority (that
is, more than 50%) of the nodes previously running. If both halves (exactly 50%) of a previously
running cluster were allowed to re-form, there would be a split-brain situation in which two instances
of the same cluster were running. In a split-brain scenario, different incarnations of an application
could end up simultaneously accessing the same disks. One incarnation might well be initiating
recovery activity while the other is modifying the state of the disks. Serviceguard’s quorum
requirement is designed to prevent a split-brain situation.
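The strict-majority rule can be stated as a one-line check. The sketch below covers only the general rule given above (more than 50% of the previously running nodes) and none of the exceptions implied by "in general". The function name has_quorum is hypothetical.

```python
def has_quorum(responding, previously_running):
    """Strict majority: more than 50% of the previously running nodes."""
    return 2 * responding > previously_running


# A four-node cluster that splits into two halves: neither half has quorum.
print(has_quorum(2, 4), has_quorum(2, 4))
# A four-node cluster that splits three to one: only the larger group has quorum.
print(has_quorum(3, 4), has_quorum(1, 4))
```

Integer arithmetic (2 * responding > previously_running) avoids rounding problems and makes it plain that an exact 50% split does not satisfy the rule for either half.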