Designing Disaster Recovery Clusters using Metroclusters and Continentalclusters, Reprinted October 2011 (5900-1881)

Timing Considerations
A journal group can be configured with many journal volumes, which together can hold a significant
amount of journal data (host-write data). As a result, the package startup time may increase
significantly when a Metrocluster Continuous Access package fails over. Package startup is delayed
in these situations:
1. When recovering from broken pair affinity. On failover, the SVOL pulls all the journal data
from the PVOL site. The time needed to complete the data transfer to the SVOL depends on the amount
of outstanding journal data in the PVOL and the bandwidth of the Continuous Access links.
2. When host I/O is faster than Continuous Access data replication. Outstanding data that has not
yet been replicated to the SVOL accumulates in the journal volumes. Upon package failover to
the SVOL site, the SVOL pulls all the journal data from the PVOL site. The time needed to complete
the data transfer to the SVOL depends on the bandwidth of the Continuous Access links and the
amount of outstanding data in the PVOL journal volume.
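Both situations reduce to the same arithmetic: the delay is roughly the outstanding journal data divided by the usable link bandwidth. The following Python sketch illustrates that relationship. The function names and the sample figures are illustrative assumptions, not values from the product. The sketch also assumes that the full link bandwidth is available to the journal transfer, that no new host writes arrive during the transfer, and that protocol overhead is negligible.

```python
def journal_backlog_bytes(host_write_bps: float,
                          replication_bps: float,
                          seconds: float) -> float:
    """Estimate journal data accumulated at the PVOL site.

    Rates are in bytes per second. A backlog builds up only while host
    writes outpace replication; otherwise the estimate is zero.
    """
    return max(0.0, host_write_bps - replication_bps) * seconds


def drain_time_seconds(outstanding_bytes: float, link_bps: float) -> float:
    """Estimate how long the SVOL needs to pull the outstanding journal data.

    Assumption: the whole link bandwidth (bytes per second) is usable and
    no further host writes are added while the journal drains.
    """
    if link_bps <= 0:
        raise ValueError("link bandwidth must be positive")
    return outstanding_bytes / link_bps


# Illustrative figures only: hosts write 120 MB/s for one hour while the
# Continuous Access links replicate 100 MB/s.
backlog = journal_backlog_bytes(120e6, 100e6, 3600)
delay = drain_time_seconds(backlog, 100e6)
print(f"backlog: {backlog / 1e9:.0f} GB, estimated startup delay: {delay / 60:.0f} min")
```

With these sample figures the estimate is a 72 GB backlog and about 12 minutes of additional package startup time. Real transfers are likely to take longer, so treat the result as a lower bound.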
Data maintenance with the failure of a Metrocluster with Continuous Access for P9000 and XP Failover
The following sections, “Swap Takeover Failure (Asynchronous/Journal mode)” and “Takeover
Timeout (for Continuous Access Journal mode)” describe data maintenance upon failure of a
Metrocluster with Continuous Access for P9000 and XP failover.
Swap Takeover Failure (Asynchronous/Journal mode)
When a device group pair state is SVOL-PAIR at the local site and PVOL-PAIR at the remote site,
Metrocluster Continuous Access performs a swap takeover. The swap takeover fails if
there is an internal (unseen) error (for example, a cache or shared memory failure) in the device
group pair. In this case, if AUTO-NONCURDATA is set to 0, the package is not started
and the takeover command changes the SVOL state to SVOL-PSUE(SSWS). The PVOL site
either remains in PVOL-PAIR or is changed to PVOL-PSUE.
The SVOL-PSUE(SSWS) state means that the SVOL is read/write enabled and that its data is
usable but not as current as the data on the PVOL.
In this case, either use FORCEFLAG to start up the package on the SVOL site, or fix the problem and
resume data replication with the following procedure:
1. Split the device group pair completely (pairsplit -g <dg> -S).
2. Re-create the pair with the original PVOL as the source (use the paircreate command).
3. Start up the package on either the PVOL site or the SVOL site.
Takeover Timeout (for Continuous Access Journal mode)
A takeover timeout can occur when a package fails over to the secondary site (SVOL) and Metrocluster
Continuous Access issues a takeover command (either swap or SVOL takeover) on the SVOL. If the
journal group pair is still flushing journal data from the PVOL to the SVOL when the takeover
timeout occurs, the package does not start and the following conditions apply:
1. The device group pair state remains PVOL-PAIR/SVOL-PAIR.
2. The journal data continues to be transferred to the SVOL.
In this case, wait until the journal data flush completes and each site reaches one of the
following states:
Primary site: PVOL-PAIR or PVOL-PSUS(E)
Secondary site: SVOL-PSUS(SSWS) or SVOL-PSUE(SSWS)
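The wait condition above can be sketched as a simple polling check. This Python sketch is a hedged illustration only: get_states is a hypothetical callable that returns the current (primary, secondary) pair states as strings, and how those states are obtained is not shown here. The sketch also assumes that PVOL-PSUS(E) stands for either PVOL-PSUS or PVOL-PSUE.

```python
import time
from typing import Callable, Tuple

# Acceptable states taken from the list above. Assumption: "PVOL-PSUS(E)"
# is read as either PVOL-PSUS or PVOL-PSUE.
PRIMARY_OK = {"PVOL-PAIR", "PVOL-PSUS", "PVOL-PSUE"}
SECONDARY_OK = {"SVOL-PSUS(SSWS)", "SVOL-PSUE(SSWS)"}


def _normalize(state: str) -> str:
    """Drop spaces and upper-case so 'SVOL-PSUE (SSWS)' matches 'SVOL-PSUE(SSWS)'."""
    return state.replace(" ", "").upper()


def flush_complete(primary_state: str, secondary_state: str) -> bool:
    """Return True when both sites report one of the acceptable states."""
    return (_normalize(primary_state) in PRIMARY_OK
            and _normalize(secondary_state) in SECONDARY_OK)


def wait_for_flush(get_states: Callable[[], Tuple[str, str]],
                   poll_seconds: float = 30.0,
                   max_polls: int = 120,
                   sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll the hypothetical get_states() until the flush is complete.

    Returns False if the states are still not acceptable after max_polls
    checks, so the caller can decide how to proceed.
    """
    for _ in range(max_polls):
        if flush_complete(*get_states()):
            return True
        sleep(poll_seconds)
    return False
```

The polling interval and the retry limit are arbitrary defaults. Choose values that reflect the expected amount of outstanding journal data and the bandwidth of the Continuous Access links.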
Once these states are reached, either (1) use FORCEFLAG to start up the package on the SVOL site, or
(2) fix the problem (if any of the Continuous Access links has failed) and resume data replication
with the following procedure:
196 Building Disaster Recovery Serviceguard Solutions Using Metrocluster with Continuous Access for P9000 and XP