Veritas Storage Foundation 5.1 SP1 for Oracle RAC Administrator's Guide (5900-1512, April 2011)

Table 3-20 Fencing startup issues on SF Oracle RAC cluster (client cluster) nodes (continued)
Issue: Preexisting split-brain

Description and resolution:

Either of the following situations can cause a preexisting split-brain in server-based fencing:

- Three CP servers act as coordination points, and one of them becomes inaccessible. While it is inaccessible, one client node also leaves the cluster. When the inaccessible CP server restarts, it still holds a stale registration from the node that left the SF Oracle RAC cluster. In this case, no new node can join the cluster. Each node that attempts to join gets a list of registrations from every CP server, and one CP server reports an extra registration (that of the node which left earlier). The joiner node therefore concludes that a preexisting split-brain exists between itself and the node represented by the stale registration.
- All the client nodes crash simultaneously, so their fencing keys are not cleared from the CP servers. Consequently, when the nodes restart, the vxfen configuration fails and reports a preexisting split-brain.

These situations are similar to a preexisting split-brain with coordinator disks, where the administrator solves the problem by running the vxfenclearpre command. Server-based fencing requires a similar solution, using the cpsadm command.

Run the cpsadm command to clear a registration on a CP server:

# cpsadm -s cp_server -a unreg_node -c cluster_name -n nodeid

where cp_server is the virtual IP address or virtual hostname on which the CP server is listening, cluster_name is the VCS name for the SF Oracle RAC cluster, and nodeid is the node ID of the SF Oracle RAC cluster node. Ensure that fencing is not already running on a node before you clear its registration on the CP server.

After you remove all stale registrations, the joiner node can join the cluster.
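When the same stale registration has to be cleared from several CP servers, or several nodes' keys remain after a simultaneous crash, the cpsadm step can be repeated in a small loop. The following is a minimal sketch in bash, not a Veritas-supplied tool: it uses only the cpsadm invocation documented here, and the function name clear_stale_reg, the DRY_RUN switch, the CP server names (cps1, cps2, cps3), the cluster name (rac_clus), and the node IDs are all hypothetical. By default it only prints the commands so that you can review them; confirm that fencing is not running on the affected nodes before running them for real.

```shell
#!/usr/bin/env bash
# Minimal sketch: clear a stale node registration from one or more CP servers
# using "cpsadm -a unreg_node". All names and IDs below are hypothetical.
set -euo pipefail

# Usage: clear_stale_reg CLUSTER NODEID CP_SERVER...
# DRY_RUN=1 (the default, a switch of this sketch only) prints each command;
# DRY_RUN=0 actually runs cpsadm against every listed CP server.
clear_stale_reg() {
  local cluster="$1" nodeid="$2"
  shift 2
  local server
  for server in "$@"; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "cpsadm -s $server -a unreg_node -c $cluster -n $nodeid"
    else
      cpsadm -s "$server" -a unreg_node -c "$cluster" -n "$nodeid"
    fi
  done
}

# First situation (hypothetical values): node 1 left while cps2 was
# inaccessible, so only cps2 still holds its registration.
clear_stale_reg rac_clus 1 cps2

# Second situation (hypothetical values): all client nodes crashed, so every
# CP server may still hold keys for nodes 0 and 1.
for id in 0 1; do
  clear_stale_reg rac_clus "$id" cps1 cps2 cps3
done
```

Keeping the dry run as the default makes the destructive step explicit: you rerun with DRY_RUN=0 only after checking the printed commands against the actual stale registrations.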
Issue: Issues during online migration of coordination points

Description and resolution:

During online migration of coordination points using the vxfenswap utility, the operation is automatically rolled back if a failure occurs while the coordination points are being validated from the cluster nodes.

Validation of the new set of coordination points can fail in the following circumstances:

- The /etc/vxfenmode file was not updated on all the SF Oracle RAC cluster nodes, so a node picked up its coordination points from an old /etc/vxfenmode file instead of the new ones.
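Before you rerun vxfenswap, one way to catch a stale /etc/vxfenmode is to confirm that every node has an identical copy. The following is a minimal sketch in bash, assuming you have already copied each node's /etc/vxfenmode to the local host (the copy step, the function name vxfenmode_consistent, and the file names are hypothetical). It treats the first file as the reference, which should be the copy that contains the intended new coordination points, and uses cmp to report every other copy that differs.

```shell
#!/usr/bin/env bash
# Minimal sketch: report which local copies of /etc/vxfenmode differ from a
# reference copy. Assumes each node's file was copied locally beforehand;
# file names are hypothetical (for example vxfenmode.node1, vxfenmode.node2).
set -euo pipefail

# Usage: vxfenmode_consistent REFERENCE_FILE OTHER_FILE...
# Prints "differs: FILE" for each file not byte-identical to the reference
# and returns 1 if any file differs, 0 if all match.
vxfenmode_consistent() {
  local ref="$1" f rc=0
  shift
  for f in "$@"; do
    if ! cmp -s "$ref" "$f"; then
      echo "differs: $f"
      rc=1
    fi
  done
  return "$rc"
}

# Example (hypothetical file names):
#   vxfenmode_consistent vxfenmode.node1 vxfenmode.node2 && echo "all nodes match"
```

A byte-for-byte comparison is deliberately strict: it also flags harmless differences such as comments, but it never misses a node whose coordination point entries are still the old ones.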