
Depending on where the application ran prior to the site failure, the following behavior can be
expected when a site completely fails:
• Applications that ran on the failed site are expected to fail over automatically to the remaining site.
• Applications that ran on the remaining site are expected to keep running without manual intervention or interruption, although a short pause or hang might be experienced.
A site failure like this leaves each volume with just one mirror (plex) available. For as long as the plexes from the failed site are missing, updates to the volumes are tracked in the FastResync maps of the associated DCO volumes. The expected reaction to a site failure is the same regardless of whether the CVM master node is located at the failed site.
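While the plexes from the failed site are missing, their state and the DCO/FastResync tracking objects can be inspected with standard VxVM commands. The following is a minimal sketch; the disk group name cvmdg1 is a hypothetical placeholder, and the exact status strings vary slightly between VxVM versions:

    # vxdisk -o alldgs list   (disks from the failed site are typically reported as NODEVICE)
    # vxprint -g cvmdg1 -ht   (plexes from the failed site show a DISABLED/NODEVICE state;
                               the dc records identify the associated DCO log volumes)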
Restoring a failed site
The tasks described in this section are examples of how to recover from a simulated site failure. These
failures are simulated, in that none of the cluster components were physically damaged or
experienced loss of configuration information. The recovery from a real site failure could involve
different steps depending on whether some of the devices were permanently damaged and had to be
replaced with new ones.
These are the basic steps of a site recovery in an SMS A.01.0x EDC on HP-UX 11i v2 (a consolidated command sketch follows the list):
1. Power up/boot all cluster components that were failed during the tests (nodes, arrays, network and FC switches).
2. Validate connectivity (network and Fibre Channel) within the cluster.
3. Make the LUNs visible to the OS by executing ioscan(1M), in case the node booted up before all storage devices became visible. This step is unnecessary on HP-UX 11i v3, since the OS automatically registers devices as they become available again.
4. Make the LUNs known to DMP and rebuild the device tree by running the vxdctl enable command, in case the node booted up before all storage devices became visible.
5. Re-attach the LUNs that are reported with the status NODEVICE by issuing the vxreattach command. Specifying the “-r” option automatically recovers (resynchronizes) any volumes on the disk. Otherwise, the recovery may be initiated individually with the vxrecover command. The synchronization is incremental, and the amount of synchronization needed depends on the information in the FastResync maps.
VxVM/CVM 5.0 Tip:
The “site-awareness” feature allows site-consistent detach and re-attach for an entire disk group. Instead of having to issue individual commands for each disk and volume, the re-attach and recovery happen at per-disk-group granularity. Chapter 14 of the VERITAS Volume Manager 5.0 Administrator’s Guide, referred to in the Related Documents section, provides further information on administering sites and remote mirrors.
6. Make the failed nodes rejoin the cluster using cmrunnode. The nodes can only join the cluster if they are able to see the same shared disks that the running cluster nodes see.
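The following consolidated sketch illustrates how the recovery steps above might be run on a recovered node. The disk group name cvmdg1, site name site_A, and node name node2 are hypothetical placeholders; verify the options against the installed VxVM and Serviceguard versions:

    # ioscan -fnC disk           (HP-UX 11i v2 only: rescan so the returned LUNs become visible to the OS)
    # vxdctl enable              (make the LUNs known to DMP and rebuild the device tree)
    # vxreattach -r              (re-attach NODEVICE disks and recover their volumes incrementally)
    # vxrecover -b -g cvmdg1     (only needed if recovery was not started via vxreattach -r)
    # cmrunnode node2            (make the recovered node rejoin the Serviceguard cluster)

With the VxVM/CVM 5.0 site-awareness feature mentioned in the tip above, the per-disk re-attach can instead be performed for the whole disk group:

    # vxdg -g cvmdg1 reattachsite site_A
    # vxrecover -b -g cvmdg1     (if the plex recovery does not start automatically)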