Designing Disaster Recovery Clusters using Metroclusters and Continentalclusters, Reprinted October 2011 (5900-1881)

failure and there are no other nodes on the local site that it can run on with package switching enabled. The workload packages can be halted and restarted using the cmhaltpkg and cmrunpkg commands when the Site Controller Package is running. The Site Controller Package is not affected when the workload packages are administratively halted using the cmhaltpkg command.

Site Failover

The Site Controller Package initiates a site failover when the site is lost or when the complex workload has failed. It performs the site failover by first failing itself over to a node in the remote site. Before preparing the replicated storage, it ensures that all the packages in the failed site have halted cleanly. On the node in the remote site, it then prepares the replicated storage and starts the packages of the complex workload’s redundant configuration.

An MNP package that is down is considered to have halted cleanly only if all of its instances ran their halt scripts successfully. A failover package is considered to have halted cleanly only if it successfully executed the halt script on the node where it last went down.
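
The clean-halt rule above can be written as a small predicate. The following is a minimal Python sketch for illustration only, not Serviceguard code: the PackageState record, its field names, and the halted_cleanly function are hypothetical, and treating an MNP package with no recorded instances as not cleanly halted is an assumption of this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PackageState:
    """Hypothetical snapshot of one package that is down (not a Serviceguard structure)."""
    name: str
    is_mnp: bool  # True for a multi-node package, False for a failover package
    # Node name -> True if the halt script ran successfully on that node.
    halt_ok_by_node: Dict[str, bool] = field(default_factory=dict)
    last_node: str = ""  # failover packages only: node where the package last went down


def halted_cleanly(pkg: PackageState) -> bool:
    """Apply the clean-halt rule described in the text above."""
    if pkg.is_mnp:
        # Every instance must have run its halt script successfully.
        # Assumption: with no recorded instances, report "not clean".
        return bool(pkg.halt_ok_by_node) and all(pkg.halt_ok_by_node.values())
    # Failover package: only the node where it last went down matters.
    return pkg.halt_ok_by_node.get(pkg.last_node, False)
```

In this model, one failed halt script on any node is enough to mark an MNP package as not cleanly halted, while a failover package is judged only by the node where it last went down.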

When an MNP package instance has not halted cleanly, Serviceguard does not allow the corresponding node to be removed. To remove the node from the cluster, clean up any resource of the instance that may still be online on the node, and enable the package's node switching flag for that node.

The following sample shows how a typical disaster tolerant RAC database is configured in its Site Controller Package configuration file:

    site san_francisco
    critical_package sfo_app
    critical_package sfo_hrdb
    managed_package sfo_hrdb_mp
    managed_package sfo_hrdb_dg
    site san_jose
    critical_package sjc_app
    critical_package sjc_hrdb
    managed_package sjc_hrdb_mp
    managed_package sjc_hrdb_dg

In this example, the Site Controller Package initiates and performs a site failover to the san_jose site when either of the packages configured as a critical_package on the san_francisco site has failed and halted cleanly in the cluster. That is, a failure and clean halt of sfo_app or of sfo_hrdb alone is enough to trigger the site failover.

The following example shows a Site Controller Package configuration file in which all the packages in the workload are configured using the managed_package attribute:

    site san_francisco
    managed_package sfo_app
    managed_package sfo_hrdb
    managed_package sfo_hrdb_mp
    managed_package sfo_hrdb_dg
    site san_jose
    managed_package sjc_app
    managed_package sjc_hrdb
    managed_package sjc_hrdb_mp
    managed_package sjc_hrdb_dg

In this example, the Site Controller Package initiates and performs a site failover to the san_jose site only when all the configured managed packages in the san_francisco site have failed and halted cleanly in the cluster. That is, sfo_app, sfo_hrdb, sfo_hrdb_mp, and sfo_hrdb_dg must all have failed and halted cleanly before the site failover takes place.
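
The two examples above differ only in the condition that triggers the site failover. The following Python sketch parses the site, critical_package, and managed_package lines shown above and applies the two trigger rules as this section describes them. It is an illustration, not the Site Controller Package's implementation: parse_sites, should_fail_over, and the set of cleanly failed package names are hypothetical, and the sketch covers only the two cases described here.

```python
from typing import Dict, List, Set

CRITICAL_CONFIG = """\
site san_francisco
critical_package sfo_app
critical_package sfo_hrdb
managed_package sfo_hrdb_mp
managed_package sfo_hrdb_dg
"""

MANAGED_CONFIG = """\
site san_francisco
managed_package sfo_app
managed_package sfo_hrdb
managed_package sfo_hrdb_mp
managed_package sfo_hrdb_dg
"""

SiteConfig = Dict[str, List[str]]


def parse_sites(text: str) -> Dict[str, SiteConfig]:
    """Group critical_package/managed_package entries under the preceding site line."""
    sites: Dict[str, SiteConfig] = {}
    current = None
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unrecognized lines
        key, value = parts
        if key == "site":
            current = sites.setdefault(
                value, {"critical_package": [], "managed_package": []}
            )
        elif key in ("critical_package", "managed_package") and current is not None:
            current[key].append(value)
    return sites


def should_fail_over(site: SiteConfig, failed_clean: Set[str]) -> bool:
    """failed_clean: names of packages that have failed and halted cleanly."""
    critical = site["critical_package"]
    managed = site["managed_package"]
    if critical:
        # First example: any one critical package failing cleanly is enough.
        # The text does not say what happens if only managed packages fail in
        # this configuration; this sketch returns False in that case.
        return any(name in failed_clean for name in critical)
    # Second example: every managed package must have failed and halted cleanly.
    return bool(managed) and all(name in failed_clean for name in managed)
```

For instance, should_fail_over(parse_sites(CRITICAL_CONFIG)["san_francisco"], {"sfo_hrdb"}) evaluates to True, whereas with MANAGED_CONFIG the same call evaluates to False until all four package names are in the set.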

Overview of Site Aware Disaster Tolerant Architecture 343