Designing Disaster Recovery Clusters using Metroclusters and Continentalclusters, Reprinted October 2011 (5900-1881)

failure and there are no other nodes on the local site that it can run on with package switching
enabled. The workload packages can be halted and restarted using the cmhaltpkg and cmrunpkg
commands when the Site Controller Package is running. The Site Controller Package is not affected
when the workload packages are administratively halted using the cmhaltpkg command.
Site Failover
The Site Controller Package initiates a site failover when the site is lost or when the complex
workload has failed. To perform the site failover, the Site Controller Package first fails itself over
to a node in the remote site. Before it prepares the replicated storage, it ensures that all the
packages in the failed site have halted cleanly. Then, on the node in the remote site, the Site
Controller Package prepares the replicated storage and starts the packages of the complex
workload’s redundant configuration.
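The ordering above matters: replicated storage is prepared only after the failed site's packages are confirmed to have halted cleanly. The following Python sketch models that sequence as a list of recorded steps. The function site_failover, its halted_clean callback, and the step strings are hypothetical illustrations, not Serviceguard interfaces. The sketch also assumes that the controller simply stops before touching storage when a package has not halted cleanly; the text above does not say whether the real product waits, retries, or reports an error.

```python
# Hypothetical model of the site failover ordering; none of these names
# are Serviceguard APIs. Each step is only recorded, not executed.
from typing import Callable


def site_failover(
    failed_site_packages: list[str],
    remote_site_packages: list[str],
    halted_clean: Callable[[str], bool],
) -> list[str]:
    """Return the ordered steps a Site Controller Package would take."""
    # Step 1: the Site Controller Package fails itself over first.
    steps = ["move site controller to remote node"]

    # Step 2: every package on the failed site must have halted cleanly
    # before the replicated storage is prepared (assumed: stop if not).
    unclean = [p for p in failed_site_packages if not halted_clean(p)]
    if unclean:
        steps.append("blocked: not halted cleanly: " + ", ".join(unclean))
        return steps

    # Steps 3 and 4: prepare storage, then start the redundant packages.
    steps.append("prepare replicated storage")
    steps.extend(f"start {p}" for p in remote_site_packages)
    return steps


print(site_failover(["sfo_app", "sfo_hrdb"], ["sjc_app", "sjc_hrdb"], lambda p: True))
```

With every package halted cleanly, the sketch records the controller move, the storage preparation, and then one start step per remote package, in that order.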
An MNP (multi-node package) that is down is considered to have halted cleanly only if all its
instances have run their halt scripts successfully. A failover package is considered to have halted
cleanly only if it successfully executed its halt script on the node where it last went down.
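The two halted-clean conditions can be expressed as a single predicate. In this minimal Python sketch, PackageState and halted_clean are hypothetical names, and the per-node halt-script results are assumed inputs, since how Serviceguard records them is not described here.

```python
# Hypothetical sketch of the "halted cleanly" rules for a package that is
# down; not a Serviceguard API.
from dataclasses import dataclass, field


@dataclass
class PackageState:
    kind: str  # "mnp" (multi-node package) or "failover"
    # MNP: halt-script result for each instance, keyed by node name.
    instance_halt_ok: dict[str, bool] = field(default_factory=dict)
    # Failover: halt-script result on the node where it last went down.
    last_node_halt_ok: bool = False


def halted_clean(pkg: PackageState) -> bool:
    if pkg.kind == "mnp":
        # Every instance must have run its halt script successfully.
        # An empty record is treated as not clean (assumption).
        return bool(pkg.instance_halt_ok) and all(pkg.instance_halt_ok.values())
    if pkg.kind == "failover":
        # Only the halt on the last node matters for a failover package.
        return pkg.last_node_halt_ok
    raise ValueError(f"unknown package kind: {pkg.kind}")
```

For an MNP, a single instance whose halt script failed is enough to make the whole package "not halted cleanly", which is why the node hosting that instance needs attention before it can be removed.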
When an MNP instance has not halted cleanly, Serviceguard does not allow the corresponding
node to be removed. To remove the node from the cluster, clean up any resources of that instance
that may still be online on the node, and enable the package's node switching flag for that node.
The following is a sample Site Controller Package configuration file for a typical disaster tolerant
RAC database:
site san_francisco
critical_package sfo_app
critical_package sfo_hrdb
managed_package sfo_hrdb_mp
managed_package sfo_hrdb_dg
site san_jose
critical_package sjc_app
critical_package sjc_hrdb
managed_package sjc_hrdb_mp
managed_package sjc_hrdb_dg
In this example, the Site Controller Package initiates and performs a site failover to the san_jose
site when either of the packages configured with the critical_package attribute for the
san_francisco site has failed and halted cleanly in the cluster; that is, when either sfo_app or
sfo_hrdb fails and halts cleanly.
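To make the role of each attribute concrete, the following Python sketch groups the critical_package and managed_package lines under the site line that precedes them, then applies the trigger rule stated for this example: a site failover is due when any critical package of the site has failed and halted cleanly. parse_sites, should_fail_over, and the failed_clean set are hypothetical names. The parser handles only lines like those shown here, and the rule encodes only what the example states, not the full Site Controller logic.

```python
# Hypothetical sketch: parse the sample configuration and apply the
# critical_package trigger rule. Not a Serviceguard API.

SAMPLE = """\
site san_francisco
critical_package sfo_app
critical_package sfo_hrdb
managed_package sfo_hrdb_mp
managed_package sfo_hrdb_dg
site san_jose
critical_package sjc_app
critical_package sjc_hrdb
managed_package sjc_hrdb_mp
managed_package sjc_hrdb_dg
"""


def parse_sites(text: str) -> dict[str, dict[str, list[str]]]:
    """Group package lines under the most recent 'site' line."""
    sites: dict[str, dict[str, list[str]]] = {}
    current = None
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unrecognized lines
        key, value = parts
        if key == "site":
            current = sites.setdefault(value, {"critical": [], "managed": []})
        elif key == "critical_package" and current is not None:
            current["critical"].append(value)
        elif key == "managed_package" and current is not None:
            current["managed"].append(value)
    return sites


def should_fail_over(site: dict[str, list[str]], failed_clean: set[str]) -> bool:
    # Any single critical package that failed and halted cleanly suffices.
    return any(p in failed_clean for p in site["critical"])


sites = parse_sites(SAMPLE)
print(should_fail_over(sites["san_francisco"], {"sfo_hrdb"}))  # True
```

The failure of sfo_hrdb alone is enough here because it is one of the two critical packages of the san_francisco site.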
The following is an example of a Site Controller Package configuration file in which all the
packages in the workload are configured using the managed_package attribute.
site san_francisco
managed_package sfo_app
managed_package sfo_hrdb
managed_package sfo_hrdb_mp
managed_package sfo_hrdb_dg
site san_jose
managed_package sjc_app
managed_package sjc_hrdb
managed_package sjc_hrdb_mp
managed_package sjc_hrdb_dg
In this example, the Site Controller Package initiates and performs a site failover to the san_jose
site when all the managed packages configured for the san_francisco site have failed and halted
cleanly in the cluster; that is, when sfo_app, sfo_hrdb, sfo_hrdb_mp, and sfo_hrdb_dg have all
failed and halted cleanly.
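With only managed packages configured, the trigger condition changes from "any" to "all". The following minimal Python sketch expresses that rule using the san_francisco packages from the example above. should_fail_over_managed and SFO_MANAGED are hypothetical names, and failed_clean is assumed to hold the names of the packages that have failed and halted cleanly.

```python
# Hypothetical sketch of the managed_package-only trigger rule; not a
# Serviceguard API.

SFO_MANAGED = ["sfo_app", "sfo_hrdb", "sfo_hrdb_mp", "sfo_hrdb_dg"]


def should_fail_over_managed(managed: list[str], failed_clean: set[str]) -> bool:
    """Fail over only when every managed package failed and halted cleanly."""
    # all() is vacuously True for an empty list, so require at least one.
    return bool(managed) and all(p in failed_clean for p in managed)


# In this sketch, three of the four packages down does not trigger failover.
print(should_fail_over_managed(SFO_MANAGED, {"sfo_app", "sfo_hrdb", "sfo_hrdb_mp"}))  # False
print(should_fail_over_managed(SFO_MANAGED, set(SFO_MANAGED)))  # True
```

The guard on an empty list is a choice made for the sketch: Python's all() returns True for an empty iterable, which would otherwise report a failover for a site with no managed packages.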
Overview of Site Aware Disaster Tolerant Architecture 343