
Failback Scenarios
There is no failback counterpart to the “pushbutton” failover from the source disk site to the target
disk site. Failback depends on the original nature of the failover, the state of the primary and
secondary Symmetrix SRDF volumes (R1 and R2), and the condition of the source disk site. Failback
mechanisms and methodologies are discussed in Chapter 2: “Designing Continentalclusters”, in the
section “Restoring Disaster Tolerance” (page 98).
The goal of HP Continentalclusters is to maximize system and application availability. However,
even systems configured with Continentalclusters can experience hardware failures at the primary
site or the recovery site, as well as failures of the hardware or networking that connects the two sites.
The following discussion addresses some of those failures and suggests recovery approaches
applicable to environments that use data replication provided by Symmetrix disk arrays and the
Symmetrix Remote Data Facility (SRDF).
Scenario 1
The primary site has lost power, including backup power (UPS), to both the systems and the disk arrays
that make up the Serviceguard cluster at the primary site. There is no loss of data on either the
Symmetrix arrays or the operating systems of the hosts at the primary site. After receiving the
Continentalclusters alerts and alarm, the administrators at the recovery site follow the prescribed
processes and recovery procedures to start the protected applications at the target disk site. The
Continentalclusters package control file will invoke Metrocluster with EMC SRDF to evaluate the
status of the R1 and R2 paired group volumes. The command symrdf list displays the status of the
device group:
                Source (R1) View            Target (R2) View       MODES
         ------------------------------ -------------------------  ----- ------------
              ST                    LI      ST
Standard       A                     N       A
Logical        T  R1 Inv    R2 Inv   K       T  R1 Inv    R2 Inv         RDF Pair
Device   Dev   E  Tracks    Tracks   S  Dev  E  Tracks    Tracks   MDA   STATE
------------------------------------ -- -------------------------  ----- ------------
DEV001   009F  WD      0         0   NR 00A5 RW      0         0   S..   Failed Over
DEV002   00A0  WD      0         0   NR 00A6 RW      0         0   S..   Failed Over
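In this output the source (R1) devices are write-disabled (WD) and the target (R2) devices are
read/write-enabled (RW), which is consistent with the Failed Over pair state. The pair state of a
single device group can also be examined directly with symrdf query; for example, assuming a
device group named ccdg1 (the name is illustrative only):
# symrdf -g ccdg1 query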
After power is restored to the primary site, the Symmetrix device groups may be in the “Failed Over”
state. The procedure for moving the application packages back to the primary site differs
depending on the status of the device groups.
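Before starting the failback, it can also be helpful to confirm which Continentalclusters recovery
packages are currently running at the recovery site, for example with:
# cmviewcl -v
The packages reported as running there are the ones halted in step 1 below.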
The following procedure applies to the situation where the device groups have a status of “Failed
Over”:
1. Halt the Continentalclusters recovery packages at the recovery site.
# cmhaltpkg <pkg_name>
This will halt any applications, remove any floating IP addresses, unmount file systems, and
deactivate volume groups, as programmed into the package control files. The status of the
device groups will remain “Synchronized” at the recovery site and “Failed Over” at the primary
site.
2. Halt the recovery cluster, which also halts the monitor package ccmonpkg.
3. Start the cluster at the primary site. Assuming they have been properly configured, the
Continentalclusters primary packages should not start automatically. The monitor package should
start automatically.
4. Manually start the Continentalclusters primary packages at the primary site.
# cmrunpkg <pkg_name>
or
# cmmodpkg -e <pkg_name>
The control script is programmed to handle this case: it will issue an SRDF failback command
to move the device group back to the R1 side and to resynchronize the R1 from the R2 side.
Until the resynchronization is complete, the SRDF “read-through” feature will