
Depending on where the application ran prior to the site failure, the following behavior can be
expected when a site completely fails:
• Applications that ran on the failed site are expected to fail over automatically to the remaining site.
• Applications that ran on the remaining site are expected to keep running without manual intervention or interruption, although a short pause or hang might be experienced.
A site failure like this leaves each volume with just one mirror (plex) available. For as long as the plexes from the failed site are missing, updates to the volumes are tracked in the FastResync maps of the associated DCO volumes. The expected reaction to a site failure is the same regardless of whether the CVM master node is located at the failed site.
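While the plexes from the failed site are missing, their state and the DCO/FastResync tracking objects can be inspected with standard VxVM commands. The following is a minimal sketch; the disk group name cvmdg1 is a hypothetical placeholder, and the exact status strings vary slightly between VxVM versions:

    # vxdisk -o alldgs list   (disks from the failed site are typically reported as NODEVICE)
    # vxprint -g cvmdg1 -ht   (plexes from the failed site show a DISABLED/NODEVICE state;
                               the dc records identify the associated DCO log volumes)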
Restoring a failed site
The tasks described in this section are examples of how to recover from a simulated site failure. These
failures are simulated, in that none of the cluster components were physically damaged or
experienced loss of configuration information. The recovery from a real site failure could involve
different steps depending on whether some of the devices were permanently damaged and had to be
replaced with new ones.
These are the basic steps of a site recovery in an SMS A.01.0x EDC on HP-UX 11i v2 (a consolidated command sketch follows the list):
1. Power up/boot all cluster components that were failed during the tests (nodes, arrays, network and FC switches).
2. Validate connectivity (network and Fibre Channel) within the cluster.
3. Make the LUNs visible to the OS by executing ioscan(1M), in case the node booted up before all storage devices became visible. This step is unnecessary on HP-UX 11i v3, since the OS automatically registers devices as they become available again.
4. Make the LUNs known to DMP and rebuild the device tree by running the vxdctl enable command, in case the node booted up before all storage devices became visible.
5. Re-attach the LUNs that are reported with the status NODEVICE by issuing the vxreattach command. Specifying the “-r” option automatically recovers (resynchronizes) any volumes on the disk. Otherwise, the recovery may be initiated individually with the vxrecover command. The synchronization is incremental, and the amount of synchronization needed depends on the information in the FastResync maps.
VxVM/CVM 5.0 Tip:
The “site-awareness” feature allows site-consistent detach and re-attach for an entire disk group. Instead of having to issue individual commands for each disk and volume, the re-attach and recovery happen at per-disk-group granularity. Chapter 14 of the VERITAS Volume Manager 5.0 Administrator’s Guide, referred to in the Related Documents section, provides further information on administering sites and remote mirrors.
6. Make the failed nodes rejoin the cluster using cmrunnode. The nodes can only join the cluster if they are able to see the same shared disks that the running cluster nodes see.
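The following consolidated sketch illustrates how the recovery steps above might be run on a recovered node. The disk group name cvmdg1, site name site_A, and node name node2 are hypothetical placeholders; verify the options against the installed VxVM and Serviceguard versions:

    # ioscan -fnC disk           (HP-UX 11i v2 only: rescan so the returned LUNs become visible to the OS)
    # vxdctl enable              (make the LUNs known to DMP and rebuild the device tree)
    # vxreattach -r              (re-attach NODEVICE disks and recover their volumes incrementally)
    # vxrecover -b -g cvmdg1     (only needed if recovery was not started via vxreattach -r)
    # cmrunnode node2            (make the recovered node rejoin the Serviceguard cluster)

With the VxVM/CVM 5.0 site-awareness feature mentioned in the tip above, the per-disk re-attach can instead be performed for the whole disk group:

    # vxdg -g cvmdg1 reattachsite site_A
    # vxrecover -b -g cvmdg1     (if the plex recovery does not start automatically)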