
Failback Scenarios
There is no failback counterpart to the “pushbutton” failover from the source disk site to the target
disk site. Failback depends on the original nature of the failover, the state of the primary and
secondary Symmetrix SRDF volumes (R1 and R2), and the condition of the source disk site. Failback
mechanisms and methodologies are discussed in Chapter 2: “Designing Continentalclusters”, in the
section “Restoring Disaster Tolerance” (page 98).
The goal of HP Continentalclusters is to maximize system and application availability. However,
even systems configured with Continentalclusters can experience hardware failures at the primary
site or the recovery site, as well as failures of the hardware or networking that connects the two sites.
The following discussion addresses some of those failures and suggests recovery approaches
applicable to environments that use data replication provided by Symmetrix disk arrays and the
Symmetrix Remote Data Facility (SRDF).
Scenario 1
The primary site has lost power, including backup power (UPS), to both the systems and the disk arrays
that make up the Serviceguard cluster at the primary site. There is no loss of data on either the
Symmetrix arrays or the operating systems of the hosts at the primary site. After receiving the
Continentalclusters alerts and alarm, the administrators at the recovery site follow the prescribed
processes and recovery procedures to start the protected applications at the target disk site. The
Continentalclusters package control file will invoke Metrocluster with EMC SRDF to evaluate the
status of the R1 and R2 paired group volumes. The command symrdf list displays the status of the
device group:
                Source (R1) View            Target (R2) View       MODES
         ------------------------------ -------------------------  ----- ------------
              ST                    LI      ST
Standard       A                     N       A
Logical        T  R1 Inv    R2 Inv   K       T  R1 Inv    R2 Inv         RDF Pair
Device   Dev   E  Tracks    Tracks   S  Dev  E  Tracks    Tracks   MDA   STATE
------------------------------------ -- -------------------------  ----- ------------
DEV001   009F  WD      0         0   NR 00A5 RW      0         0   S..   Failed Over
DEV002   00A0  WD      0         0   NR 00A6 RW      0         0   S..   Failed Over
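In this output the source (R1) devices are write-disabled (WD) and the target (R2) devices are
read/write-enabled (RW), which is consistent with the Failed Over pair state. The pair state of a
single device group can also be examined directly with symrdf query; for example, assuming a
device group named ccdg1 (the name is illustrative only):
# symrdf -g ccdg1 query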
After power is restored to the primary site, the Symmetrix device groups may be in the “Failed Over”
state. The procedure for moving the application packages back to the primary site differs
depending on the status of the device groups.
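Before starting the failback, it can also be helpful to confirm which Continentalclusters recovery
packages are currently running at the recovery site, for example with:
# cmviewcl -v
The packages reported as running there are the ones halted in step 1 below.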
The following procedure applies to the situation where the device groups have a status of “Failed
Over”:
1. Halt the Continentalclusters recovery packages at the recovery site.
# cmhaltpkg <pkg_name>
This will halt any applications, remove any floating IP addresses, unmount file systems, and
deactivate volume groups, as programmed into the package control files. The status of the
device groups will remain “Synchronized” at the recovery site and “Failed Over” at the primary
site.
2. Halt the recovery cluster, which also halts the monitor package ccmonpkg.
3. Start the cluster at the primary site. Assuming they have been properly configured, the
Continentalclusters primary packages should not start automatically. The monitor package should
start automatically.
4. Manually start the Continentalclusters primary packages at the primary site.
# cmrunpkg <pkg_name>
or
# cmmodpkg -e <pkg_name>
The control script is programmed to handle this case: it will issue an SRDF failback command
to move the device group back to the R1 side and to resynchronize the R1 from the R2 side.
Until the resynchronization is complete, the SRDF “read-through” feature will