Building Disaster Recovery Serviceguard Solutions Using Metrocluster with Continuous Access XP P9000 for Linux B.01.00.00

the volumes, with the PVOL becoming the SVOL and the SVOL becoming the PVOL. With the personalities swapped, any data that has been written to the volume on the recovery site (now PVOL) is then copied back to the SVOL (now at the primary site). During this time the package continues running on the recovery site.

Failback to the primary datacenter

After resynchronization is complete, you can halt the package on the recovery site and restart it on the primary site. Metrocluster will then swap the personalities between the PVOL and the SVOL, returning PVOL status to the primary site.

NOTE: The preceding steps are automated, provided the default value of 1 is used for the auto variable AUTO_PSUEPSUS. Once the Continuous Access link failure has been fixed, you only need to halt the package on the recovery site and restart it on the primary site. However, if you want to reduce the amount of application downtime, you must manually invoke pairresync before failback.
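The manual resynchronization mentioned in the note can be sketched as follows. This is a minimal sketch under stated assumptions, not the documented procedure: it assumes the RAID Manager commands pairresync, pairevtwait, and pairdisplay are available on a recovery-site node, that the -swaps option is the appropriate way to resynchronize from the recovery site in your configuration, and that the device group name and timeout value are placeholders. The run helper and the resync_before_failback function are hypothetical names. With DRY_RUN=1 (the default) the commands are only printed, not executed.

```shell
#!/bin/sh
# Sketch: resynchronize manually before failback to shorten application downtime.
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# resync_before_failback <device_group> <timeout>
# Intended to be run on the recovery-site node while the package is still
# running there. The device group and timeout are placeholders.
resync_before_failback() {
    dg=$1
    timeout=$2
    # Assumption: -swaps is the right resync direction for this configuration
    # (recovery-site volume becomes the PVOL and is copied back).
    run pairresync -g "$dg" -swaps || return 1
    # Wait until the device group reports PAIR state or the timeout expires.
    run pairevtwait -g "$dg" -s pair -t "$timeout" || return 1
    # Show the resulting pair status, including copy progress.
    run pairdisplay -g "$dg" -fc
}
```

For example, `resync_before_failback dg01 3600` prints the three commands for a hypothetical device group dg01. Once the pair is back in PAIR state, the failback itself is the halt and restart of the package described above.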

Factors affecting package startup time

In a journal group, many journal volumes can be configured to hold a significant amount of journal data (host-write data). Because outstanding journal data must be transferred to the SVOL on failover, the package startup time might increase significantly when a Metrocluster Continuous Access package fails over. Package startup is delayed in these situations:

1. When recovering from broken pair affinity. On failover, the SVOL pulls all the journal data from the PVOL site. The time required to complete the data transfer to the SVOL depends on the amount of outstanding journal data in the PVOL and the bandwidth of the Continuous Access links.
2. When host I/O is faster than Continuous Access data replication. Outstanding data that has not yet been replicated to the SVOL accumulates in the journal volumes. Upon package failover to the SVOL site, the SVOL pulls all the journal data from the PVOL site. The time required to complete the data transfer to the SVOL depends on the bandwidth of the Continuous Access links and the amount of outstanding data in the PVOL journal volume.
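As a rough illustration of how these two factors interact, the transfer time can be bounded from below by dividing the outstanding journal data by the link bandwidth. The sketch below is only an estimate under simplifying assumptions: it ignores protocol overhead, array-side limits, and any host writes that continue during the transfer. The function name and the example figures are hypothetical and do not come from the product documentation.

```shell
#!/bin/sh
# Sketch: lower-bound estimate of the time needed to drain outstanding journal data.

# estimate_drain_seconds <outstanding_journal_MB> <link_bandwidth_MB_per_s>
# Prints the estimated transfer time in whole seconds, rounded up.
estimate_drain_seconds() {
    journal_mb=$1
    link_mb_per_s=$2
    if [ "$link_mb_per_s" -le 0 ]; then
        echo "link bandwidth must be greater than 0" >&2
        return 1
    fi
    # Integer ceiling division: (a + b - 1) / b
    echo $(( (journal_mb + link_mb_per_s - 1) / link_mb_per_s ))
}
```

For example, `estimate_drain_seconds 20480 100` (20 GB of outstanding journal data over a link sustaining 100 MB/s) prints 205, that is, about three and a half minutes added to package startup at best.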

Data maintenance with the failure of a Metrocluster with Continuous Access XP P9000 for Linux failover

The following sections, “Swap takeover failure (asynchronous/journal mode)” and “Takeover timeout for Continuous Access journal mode”, describe data maintenance upon failure of a Metrocluster with Continuous Access XP P9000 for Linux failover.

Swap takeover failure (asynchronous/journal mode)

When a device group pair state is SVOL-PAIR at the local site (the site where the package is starting) and PVOL-PAIR at the remote site, Metrocluster Continuous Access performs a swap takeover. The swap takeover will fail if there is an internal error (for example, a cache or shared memory failure) in the device group pair. In this case, if AUTO_NONCURDATA is set to 0, the package will not be started and the SVOL state is changed to SVOL-PSUE (SSWS) by the takeover command. The PVOL site either remains in PVOL-PAIR or is changed to PVOL-PSUE.

The SVOL is in the SVOL-PSUE (SSWS) state; that is, the SVOL is read/write enabled and its data is usable but not as current as the data on the PVOL.

In this case, either use FORCEFLAG to start up the package on the SVOL site, or fix the problem and resume data replication with the following procedure:

1. Split the device group pair completely (pairsplit -g <dg> -S).
2. Re-create the pair with the original PVOL as the source (use the paircreate command).
3. Start up the package on either the PVOL site or the SVOL site.
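The three steps above can be sketched as a short script. This is a minimal sketch under stated assumptions, not the definitive procedure: it assumes it is run on a node at the original PVOL site (so that -vl makes the local volume the PVOL), that the device group uses journal mode with the fence level async and journal IDs passed through -jp and -js, and that the package is started with the Serviceguard cmrunpkg command. The device group, journal IDs, node, and package names are placeholders, and the run helper and recreate_pair_and_start function are hypothetical names. With DRY_RUN=1 (the default) the commands are only printed.

```shell
#!/bin/sh
# Sketch: split the failed pair, re-create it from the original PVOL,
# and start the package.
# DRY_RUN=1 (default) only prints the commands; set DRY_RUN=0 to execute them.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# recreate_pair_and_start <device_group> <pvol_journal_id> <svol_journal_id> <node> <package>
recreate_pair_and_start() {
    dg=$1
    jp=$2
    js=$3
    node=$4
    pkg=$5
    # Step 1: split the device group pair completely (both volumes become simplex).
    run pairsplit -g "$dg" -S || return 1
    # Step 2: re-create the pair with the local (original PVOL) volume as the source.
    # Assumption: journal mode, so the fence level is async and journal IDs are given.
    run paircreate -g "$dg" -vl -f async -jp "$jp" -js "$js" || return 1
    # Step 3: start the package on the chosen node.
    run cmrunpkg -n "$node" "$pkg"
}
```

For example, `recreate_pair_and_start dg01 0 0 node1 pkgA` prints the three commands for a hypothetical device group dg01 and package pkgA. Check the pair status with pairdisplay before starting the package on the SVOL site, because the initial copy started by paircreate can take considerable time.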
