Designing Disaster Recovery Clusters using Metroclusters and Continentalclusters, Reprinted October 2011 (5900-1881)

NOTE: The monitor package for a cluster checks the status of the other cluster and issues
alerts and alarms, as defined in the Continentalclusters configuration file, based on the other
cluster’s status.
8. Check /var/adm/syslog/syslog.log for messages. Also check the ccmonpkg package
log file.
9. Start the primary packages on the source disk site using cmrunpkg. Test local failover within
the source disk site.
10. View the status of the Continentalclusters primary and target disk sites, including configured
event data.
# cmviewconcl -v
The Continentalclusters configuration is ready for testing (see “Testing the Continentalclusters” (page 92)).
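The checks in steps 8 to 10 can be collected into a short dry-run sketch. It only prints the commands and does not run them, so it is safe to try on any system. The package name primary_pkg, the node name node1, and the helper plan_checks are hypothetical placeholders introduced for illustration. The tail line count is likewise arbitrary. The path of the ccmonpkg package log file is site-specific and is therefore not shown.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands for steps 8 to 10 instead of running them.
# PKG and NODE are hypothetical names; replace them with real ones.
PKG=${PKG:-primary_pkg}
NODE=${NODE:-node1}

plan_checks() {
    # Step 8: look at recent syslog messages (line count chosen arbitrarily).
    echo "tail -n 50 /var/adm/syslog/syslog.log"
    # Step 9: start a primary package on a node of the source disk site.
    echo "cmrunpkg -n $NODE $PKG"
    # Step 10: view the status of both sites, including configured event data.
    echo "cmviewconcl -v"
}

plan_checks
```

Review the printed commands, then run them manually as root on a node of the appropriate cluster.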
Switching to the Recovery Cluster in Case of Disaster
It is vital that the administrator verify that recovery is needed after receiving a cluster alert or alarm.
Network failures may produce false alarms. After validating a failure, start the recovery process
using the cmrecovercl [-f] command. Note the following:
During an alert, cmrecovercl does not start the recovery packages unless the -f option
is used.
During an alarm, cmrecovercl starts the recovery packages without the -f option.
When there is neither an alert nor an alarm condition, cmrecovercl cannot start the recovery
packages on the target disk site. This applies not only when no alert or alarm was issued,
but also when an alert or alarm was issued and the source disk site has since recovered,
so that its current status is Up.
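The three rules above amount to a small decision table, sketched below as a shell function. The function recovery_outcome and the labels alert, alarm, and none are hypothetical names chosen for this illustration. The function runs no Continentalclusters command and only prints the outcome described in the text. It assumes that -f makes no difference when there is neither an alert nor an alarm, which the text implies but does not state explicitly.

```shell
#!/bin/sh
# Hypothetical helper: maps the event level reported for the source disk site
# to the cmrecovercl behaviour described above. It only prints text.
recovery_outcome() {
    level=$1    # alert | alarm | none (labels chosen for this sketch)
    forced=$2   # "yes" if -f would be passed, anything else otherwise
    case "$level" in
        alarm)
            # An alarm is sufficient; -f is not required.
            echo "starts recovery packages" ;;
        alert)
            # An alert alone is not sufficient; -f is required.
            if [ "$forced" = "yes" ]; then
                echo "starts recovery packages"
            else
                echo "does not start recovery packages"
            fi ;;
        *)
            # No alert or alarm, or the source disk site is Up again.
            # Assumption: -f does not change this outcome.
            echo "cannot start recovery packages" ;;
    esac
}

recovery_outcome alert no
recovery_outcome alert yes
recovery_outcome alarm no
recovery_outcome none yes
```

The table is a reading aid only; the actual decision is made by cmrecovercl from the monitored cluster status.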
Failback Scenarios
The goal of HP Continentalclusters is to maximize system and application availability. However,
even systems configured with Continentalclusters can experience hardware failures at the primary
site or the recovery site, as well as failures of the hardware or network links connecting the two sites.
The following discussion addresses some of those failures and suggests recovery approaches
applicable to environments using data replication provided by HP StorageWorks P9000 or XP
series disk arrays and Continuous Access. Failback mechanisms and methodologies are discussed in
Chapter 2: “Designing Continentalclusters”, under “Restoring Disaster Tolerance” (page 98).
Scenario 1
The primary site has lost power, including backup power (UPS), to both the systems and disk arrays
that make up the Serviceguard Cluster at the primary site. There is no loss of data on either the
P9000 or XP disk array or the operating systems of the systems at the primary site.
Scenario 2
The primary site P9000 or XP disk array experienced a catastrophic hardware failure and all data
was lost on the array.
Failback in Scenarios 1 and 2
After receiving the Continentalclusters alerts and alarms, the administrators at the recovery site
follow the prescribed processes and recovery procedures to start the protected applications on the
target disk site. Each Continentalclusters package control script that invokes Metrocluster Continuous
Access P9000 or XP will evaluate the status of the P9000 or XP paired volumes. Since neither
the systems nor the P9000 or XP disk arrays at the primary site are accessible, the control file will
initially report the paired volumes with a local status of SVOL_PAIR or SVOL_PSUE (in ASYNC
206 Building Disaster Recovery Serviceguard Solutions Using Metrocluster with Continuous Access for P9000 and XP