Building Disaster Recovery Serviceguard Solutions Using Continentalclusters A.08.00

The following is a partial list of failures that require full resynchronization to restore
disaster-tolerant data protection. Resynchronization is initiated automatically when the application
package is moved back to its primary host after the failure is repaired.
Failure of the entire primary Data Center for a given application package.
Failure of all of the primary hosts for a given application package.
Failure of the primary P9000 and XP disk array for a given application package.
Failure of all Continuous Access links with application restart on a secondary host.
NOTE: The preceding steps are automated, provided that the auto variable AUTO_PSUEPSUS is set to
its default value of 1. After the Continuous Access link failure is fixed, you must halt
the package at the failover site and restart it on the primary site. However, if you want to reduce
downtime, you must manually invoke pairresync before failback.
Full resynchronization must be manually initiated (as described in the next section) after repairing
the following failures:
Failure of the recovery P9000 and XP disk array for a given application package followed
by application startup on a primary host.
Failure of all Continuous Access links with Fence Level NEVER or ASYNC with restart of the
application on a primary host.
Pairs must be manually recreated if both the primary and recovery P9000 and XP disk arrays are
in the SMPL (simplex) state.
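The SMPL rule above can be sketched as a small shell check. This is an illustration only: the function name pairs_need_recreate is hypothetical, and the two state strings are assumed to have been obtained beforehand (for example, by reading the status reported for each array).

```shell
#!/bin/sh
# Illustrative sketch; pairs_need_recreate is a hypothetical helper, not part
# of Continentalclusters or Metrocluster.

# Succeeds (exit status 0) only when BOTH arrays report the SMPL state,
# which is the case in which the pairs must be recreated manually.
pairs_need_recreate() {
    primary_state=$1
    recovery_state=$2
    [ "$primary_state" = "SMPL" ] && [ "$recovery_state" = "SMPL" ]
}

# Example: both sides simplex, so manual pair creation is required.
if pairs_need_recreate "SMPL" "SMPL"; then
    echo "Both arrays are SMPL: recreate the pairs manually."
fi
```

If only one side reports SMPL, the check fails and the other recovery paths described in this section apply instead.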
Periodically review the following files for messages, warnings, and recommended
actions. HP recommends reviewing these files after system, data center, and application failures.
/var/adm/syslog/syslog.log
/etc/cmcluster/<package-name>/<package-name>.log
/etc/cmcluster/<bkpackage-name>/<bkpackage-name>.log
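One possible way to make the periodic review less manual is a small shell helper that searches the log files for lines of interest. The function name scan_logs, the search pattern, and the package name pkgA are assumptions made for illustration. The actual message wording in these logs is not specified here, so adjust the pattern to match what your logs contain.

```shell
#!/bin/sh
# Illustrative sketch; scan_logs is a hypothetical helper.
# Usage: scan_logs PATTERN FILE...
# Prints matching lines as FILE:LINE_NUMBER:TEXT.
scan_logs() {
    pattern=$1
    shift
    for f in "$@"; do
        if [ -r "$f" ]; then
            # -i ignores case, -n adds line numbers, -E enables extended
            # regular expressions. Passing /dev/null as a second file makes
            # grep prefix each match with the file name.
            grep -inE "$pattern" "$f" /dev/null
        else
            echo "skipping unreadable file: $f" >&2
        fi
    done
    return 0
}

# Example (pkgA is a hypothetical package name):
# scan_logs 'warning|error' /var/adm/syslog/syslog.log /etc/cmcluster/pkgA/pkgA.log
```

Unreadable or missing files are reported on standard error and skipped, so the same command line can be used on either site.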
Using the pairresync command
The pairresync command can be used with special options after a failover in which the recovery
site has started the application and has processed transaction data on the disk at the recovery site,
but the disks on the primary site are intact. After the Continuous Access link is fixed, depending
on which site you are on, use the pairresync command in one of the following two ways:
pairresync -swapp (from the primary site)
pairresync -swaps (from the failover site)
These options take advantage of the fact that the recovery site maintains a bit-map of the modified
data sectors on the recovery array. Either version of the command will swap the personalities of
the volumes, with the PVOL becoming the SVOL and SVOL becoming the PVOL. With the
personalities swapped, data written to the volume on the failover site (now PVOL) are copied to
the SVOL, which now resides on the primary site. During this time, the package continues running
on the failover site. After resynchronization is complete, you can halt the package on the failover
site, and restart it on the primary site. When the package starts on the primary site, Metrocluster
swaps the personalities of the PVOL and the SVOL again, returning PVOL status to the primary site.
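The choice between the two options can be sketched as a shell helper that builds the command line for the site it is run from. This is a dry-run illustration under stated assumptions: the function names and the device group name pkgdg are hypothetical, and the -g option for naming the device group is assumed from the RAID Manager command set because this section does not show it. The helper only prints the command; it does not execute it.

```shell
#!/bin/sh
# Illustrative dry-run sketch; swap_option and resync_command are
# hypothetical helpers, and pkgdg is a hypothetical device group name.

# Print the pairresync swap option for the site the command is run from.
swap_option() {
    case "$1" in
        primary)  echo "-swapp" ;;   # run from the primary site
        failover) echo "-swaps" ;;   # run from the failover site
        *)        echo "unknown site: $1" >&2; return 1 ;;
    esac
}

# Print (do not run) the resynchronization command for a site and group.
# Assumption: the device group is passed with -g.
resync_command() {
    site=$1
    group=$2
    opt=$(swap_option "$site") || return 1
    echo "pairresync -g $group $opt"
}

# Example: show the command to run from the failover site.
resync_command failover pkgdg
```

Printing the command first gives an operator the chance to confirm the site and device group before running it by hand after the Continuous Access link is repaired.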
Additional points
This toolkit might increase package startup time by 5 minutes or more. Packages with many
disk devices will take longer to start up than those with fewer devices because of the time
required to get device status from the P9000 and XP disk array or to synchronize.