Designing Disaster Recovery Clusters using Metroclusters and Continentalclusters, Reprinted October 2011 (5900-1881)


NOTE: The monitor package for a cluster checks the status of the other cluster and issues alerts and alarms, as defined in the Continentalclusters configuration file, based on the other cluster’s status.

8. Check /var/adm/syslog/syslog.log for messages. Also check the ccmonpkg package log file.

9. Start the primary packages on the source disk site using cmrunpkg. Test local failover within the source disk site.

10. View the status of the Continentalclusters primary and target disk sites, including configured event data.

# cmviewconcl -v

Continentalclusters is now ready for testing (see “Testing the Continentalclusters” (page 92)).
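Steps 8 and 9 can be scripted so that they are repeated the same way on every test run. The following is a minimal sketch under stated assumptions: recent_log_matches, run, test_local_failover, the DRY_RUN variable, and the package and node names are hypothetical and are not part of Continentalclusters. The commands cmrunpkg, cmmodpkg, cmhaltpkg, and cmviewcl are the standard Serviceguard package-administration commands; check their options against the manpages for your release. The location of the ccmonpkg log varies by installation, so it is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# Sketch of the checks in steps 8 and 9. All function names, the DRY_RUN
# variable, and the package/node names are hypothetical.

# Step 8: print the most recent lines of a log file that match a pattern
# (case-insensitive). The default pattern "ccmonpkg" is an assumption;
# adjust it to the daemon or package names that appear in your logs.
recent_log_matches() {
  logfile=$1
  pattern=${2:-ccmonpkg}
  count=${3:-20}
  grep -i -- "$pattern" "$logfile" | tail -n "$count"
}

# Print each command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Step 9: start a primary package on one node of the source disk site, then
# move it to a second node of the same cluster to exercise local failover.
test_local_failover() {
  pkg=$1
  node1=$2
  node2=$3
  run cmrunpkg -n "$node1" "$pkg" || return 1  # start on the first node
  run cmmodpkg -e "$pkg" || return 1           # enable package switching
  run cmhaltpkg "$pkg" || return 1             # halt the package ...
  run cmrunpkg -n "$node2" "$pkg" || return 1  # ... and restart it on the second node
  run cmmodpkg -e "$pkg" || return 1           # re-enable switching after the manual move
  run cmviewcl -v -p "$pkg"                    # confirm where the package now runs
}
```

For example, DRY_RUN=1 test_local_failover pkgA node1 node2 prints the six commands for review; running it without DRY_RUN executes them.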

Switching to the Recovery Cluster in Case of Disaster

It is vital that the administrator verify that recovery is actually needed after receiving a cluster alert or alarm, because network failures can produce false alarms. After validating the failure, start the recovery process using the cmrecovercl [-f] command. Note the following:

• During an alert, cmrecovercl does not start the recovery packages unless the -f option is used.

• During an alarm, cmrecovercl starts the recovery packages without the -f option.

• When there is neither an alert nor an alarm condition, cmrecovercl cannot start the recovery packages on the target disk site. This applies not only when no alert or alarm was issued, but also when an alert or alarm was issued and the source disk site has since recovered, so that its current status is Up.
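The rules above amount to a small decision table. The hypothetical helper recovery_action below encodes it as a sketch: it only prints which command would apply and never runs cmrecovercl itself, because confirming that recovery is really needed remains the administrator's decision. The condition labels (alert, alarm, none) and the site status values (such as Up, Down, Unreachable) are assumptions for illustration; in practice, read the actual condition and status from cmviewconcl output.

```shell
#!/bin/sh
# Hypothetical helper: report how cmrecovercl behaves for a given condition.
#   $1 condition:     alert | alarm | none   (labels assumed for illustration)
#   $2 source status: status of the source disk site, e.g. Up or Down
# Prints the command to use, or "none" when recovery packages cannot start.
recovery_action() {
  condition=$1
  source_status=$2
  # A source disk site whose current status is Up leaves nothing to recover,
  # even if an alert or alarm was issued earlier.
  if [ "$source_status" = "Up" ]; then
    condition=none
  fi
  case "$condition" in
    alarm) echo "cmrecovercl" ;;     # alarm: recovery packages start without -f
    alert) echo "cmrecovercl -f" ;;  # alert: -f is required to start them
    none)  echo "none" ;;            # cmrecovercl cannot start recovery packages
    *)     echo "unknown condition: $condition" >&2; return 1 ;;
  esac
}
```

Keeping the helper print-only means it can be run safely while the failure is still being validated.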

Failback Scenarios

The goal of HP Continentalclusters is to maximize system and application availability. However, even systems configured with Continentalclusters can experience hardware failures at the primary site or the recovery site, as well as failures of the hardware or network links connecting the two sites.

The following discussion addresses some of those failures and suggests recovery approaches for environments that use data replication provided by HP StorageWorks P9000 or XP series disk arrays and Continuous Access. Failback mechanisms and methodologies are also discussed in “Restoring Disaster Tolerance” (page 98) in Chapter 2: “Designing Continentalclusters”.

Scenario 1

The primary site has lost power, including backup power (UPS), to both the systems and the disk arrays that make up the Serviceguard cluster at the primary site. No data is lost, either on the P9000 or XP disk array or in the operating systems of the primary site systems.

Scenario 2

The primary site P9000 or XP disk array has experienced a catastrophic hardware failure, and all data on the array has been lost.

Failback in Scenarios 1 and 2

After receiving the Continentalclusters alerts and alarm, the administrators at the recovery site follow the prescribed processes and recovery procedures to start the protected applications on the target disk site. Each Continentalclusters package control script that invokes Metrocluster Continuous Access P9000 or XP evaluates the status of the P9000 or XP paired volumes. Because neither the systems nor the P9000 or XP disk arrays at the primary site are accessible, the control file will initially report the paired volumes with a local status of SVOL_PAIR or SVOL_PSUE (in ASYNC

206 Building Disaster Recovery Serviceguard Solutions Using Metrocluster with Continuous Access for P9000 and XP