Designing Disaster Recovery Clusters using Metroclusters and Continentalclusters, Reprinted October 2011 (5900-1881)

Switching to the Recovery Cluster in Case of Disaster
It is vital the administrator verify that recovery is needed after receiving a cluster alert or alarm.
Network failures may produce false alarms. After validating a failure, start the recovery process
using the cmrecovercl [-f] command. Note the following:
During an alert, cmrecovercl does not start the recovery packages unless the -f option is
used.
During an alarm, cmrecovercl starts the recovery packages without the -f option.
When there is neither an alert nor an alarm condition, cmrecovercl cannot start the recovery
packages on the target disk site. This applies both when no alert or alarm was issued and
when an alert or alarm was issued but the source disk site has since recovered and its
current status is Up.
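The three rules above amount to a small decision table. The following Python sketch only models that table for illustration; it is not the cmrecovercl implementation, and the names ClusterEvent and recovery_packages_start are hypothetical.

```python
from enum import Enum


class ClusterEvent(Enum):
    """Hypothetical model of the condition currently reported for the source disk site."""
    NONE = "none"    # no alert or alarm, or the site has recovered and its status is Up
    ALERT = "alert"
    ALARM = "alarm"


def recovery_packages_start(event: ClusterEvent, force: bool = False) -> bool:
    """Return True if, per the rules in the text, cmrecovercl would start the recovery packages.

    `force` stands for running the command with the -f option.
    """
    if event is ClusterEvent.ALARM:
        return True        # an alarm needs no -f
    if event is ClusterEvent.ALERT:
        return force       # an alert needs -f
    return False           # neither condition: the packages cannot be started


if __name__ == "__main__":
    for event in ClusterEvent:
        for force in (False, True):
            command = "cmrecovercl -f" if force else "cmrecovercl"
            print(f"{event.value:5}  {command:14}  starts: {recovery_packages_start(event, force)}")
```

Running the sketch prints all six combinations, which can serve as a quick check before deciding whether the -f option is appropriate.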
Failover to Recovery Site
After receiving the Continentalclusters alerts and alarms, the administrators at the recovery site
follow the prescribed processes and recovery procedures to start the protected applications on the
target disk site.
The recovery package control script evaluates the status of the DR group used by the package
and fails over the DR group to the EVA in the recovery site. After a successful failover, the
DR group in the recovery site's EVA is the source and is accessible in read/write mode.
NOTE: If the Continuous Access links between the two EVAs are down, the recovery package
will only start up if one of the following conditions is true:
The package failover policy variable “DT_APPLICATION_STARTUP_POLICY” in the
package’s environment file is set to “Availability_Preferred”.
The package failover policy variable “DT_APPLICATION_STARTUP_POLICY” in the
package’s environment file is set to “Data_Currency_Preferred”, and a FORCEFLAG
file exists in the package directory.
After the recovery package is up and running, the EVA in the recovery site will have more current
data than the one in the primary site.
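The start-up conditions in the NOTE above can be sketched as a single check. This is a minimal illustration of the links-down case only, not the logic of the actual control script: the function name and the package directory argument are hypothetical, and treating any other policy value as "do not start" is an assumption.

```python
import tempfile
from pathlib import Path


def may_start_with_links_down(policy: str, package_dir: Path) -> bool:
    """Model the NOTE: may the recovery package start while the Continuous Access links are down?

    `policy` is the value of DT_APPLICATION_STARTUP_POLICY from the package's
    environment file; `package_dir` is the package directory (hypothetical argument).
    """
    if policy == "Availability_Preferred":
        return True
    if policy == "Data_Currency_Preferred":
        # Start-up is allowed only if a FORCEFLAG file exists in the package directory.
        return (package_dir / "FORCEFLAG").exists()
    return False  # assumption: unrecognized values do not allow start-up


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp)  # stand-in for a real package directory
        print(may_start_with_links_down("Availability_Preferred", pkg))
        print(may_start_with_links_down("Data_Currency_Preferred", pkg))
        (pkg / "FORCEFLAG").touch()
        print(may_start_with_links_down("Data_Currency_Preferred", pkg))
```

The sketch makes the trade-off explicit: with Data_Currency_Preferred, the administrator must create the FORCEFLAG file deliberately before the package will start on data that may not be current.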
Failover Scenarios
The goal of HP Continentalclusters is to maximize system and application availability. However,
even systems configured with Continentalclusters can experience hardware failures at the primary
site or the recovery site, as well as failures of the hardware or networking connecting the two sites.
The following scenarios address some of those failures and suggest recovery approaches
applicable to environments using data replication provided by HP StorageWorks EVA series disk
arrays and Continuous Access.
Scenario 1
The primary site has lost power for a prolonged time, including backup power (UPS), to both the
systems and disk arrays that make up the Serviceguard Cluster at the primary site. There is no loss
of data on either the EVA disk array or the operating systems of the systems at the primary site.
Failback to the Primary Site
In this scenario, the EVA in the primary site is down only because of the loss of power, so the storage
configuration information and the application data from before the power failure remain intact in the EVA.
When the primary site’s power is restored, the EVA is up and running, and Continuous Access
links are up, Continuous Access EVA software will automatically resynchronize the data from the
recovery site's EVA back to the primary site’s EVA. If the resynchronization is a full copy operation,
252 Building Disaster Recovery Serviceguard Solutions Using Metrocluster with Continuous Access EVA