Designing Disaster Recovery Clusters using Metroclusters and Continentalclusters, Reprinted October 2011 (5900-1881)

to be available in the cluster. However, as the Site Controller Package has failed in the cluster, the
complex workload configuration can no longer fail over automatically to the remote site.
Site Failure
A site failure is a scenario where a disaster or an equivalent failure results in all nodes in a site
failing or going down. The Serviceguard cluster detects this failure, and reforms the cluster without
the nodes from the failed site. The Site Controller Package that was running on a node on the failed
site fails over to an adoptive node in the remote site.
When the remote site starts, the Site Controller Package detects that the active complex-workload
packages have failed and initiates a site failover by activating the passive complex-workload
packages that are configured in the current site.
The disaster tolerant complex workloads that have their active packages on the surviving site,
where the cluster reformed, continue to run without any interruption.
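
The outcome of a site failure can be sketched as a simple partition of the workloads. The following Python fragment is only an illustration and is not part of Serviceguard or Metrocluster. The function name, the workload names, and the site names are hypothetical. It assumes each workload is identified by the site where its active packages currently run.

```python
from typing import Dict, List, Tuple


def plan_after_site_failure(
    active_site: Dict[str, str], failed_site: str
) -> Tuple[List[str], List[str]]:
    """Split workloads into those needing a site failover and those unaffected.

    active_site maps a (hypothetical) workload name to the site where its
    active packages run. Workloads active on the failed site need their
    passive packages activated on the surviving site; the rest keep running.
    """
    to_fail_over = sorted(w for w, s in active_site.items() if s == failed_site)
    uninterrupted = sorted(w for w, s in active_site.items() if s != failed_site)
    return to_fail_over, uninterrupted


# Hypothetical example: siteA is lost, siteB survives.
fail_over, keep_running = plan_after_site_failure(
    {"hrdb": "siteA", "salesdb": "siteB"}, failed_site="siteA"
)
print(fail_over, keep_running)
```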
Failure Scenarios in Metrocluster for RAC
This section describes the failure scenarios in Metrocluster for RAC and addresses the following
topics:
“Oracle RAC Database Failure” (page 394)
“Oracle RAC Database Instance Failure” (page 394)
“Oracle RAC Database Oracle Clusterware Daemon Failure” (page 395)
Oracle RAC Database Failure
When failures such as tablespace corruption or errors arising from insufficient storage space
occur, the RAC database instance processes on the nodes fail. When the Oracle RAC database
instance fails at a site, the RAC MNP package instance containing it also fails. The Site Controller
Package that monitors the RAC MNP package detects that the RAC MNP has failed. The database
failure is handled based on the manner in which the RAC MNP stack is configured with the Site
Controller Package.
When the RAC MNP package is configured as a critical_package, the Site Controller Package
considers only the RAC MNP package status to initiate a site failover. Since the RAC MNP package
fails when the contained RAC database fails, the Site Controller Package fails over to start on the
remote site node and initiates a site failover from the remote site.
When the RAC MNP package is configured as a managed_package along with other packages
in the stack, such as the CFS MP and CVM DG packages, the Site Controller Package considers
the status of all configured packages to determine a failure. When the RAC database fails, only
the RAC MNP package fails. All other managed packages continue to be up and running. As a
result, the Site Controller Package does not perform a site failover. The Site Controller Package
only logs a message in the syslog and continues to run on the same node where it was running
before the RAC database failed. Manual intervention is required to restart the RAC database MNP
package.
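
The difference between the two configurations can be sketched as a small decision function. This is a minimal Python model of the behavior described above, not the Site Controller Package implementation. The class, the function, and the package names are hypothetical. The rule that a stack without a critical package fails over only when every managed package has failed is an interpretation of "considers the status of all configured packages" and is an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class SiteStack:
    """Packages of one site's stack, as configured with the Site Controller Package."""
    critical: Optional[str]  # package configured as critical_package, if any
    managed: List[str]       # packages configured as managed_package


def should_fail_over(stack: SiteStack, status: Dict[str, str]) -> bool:
    """Return True if this model would initiate a site failover.

    status maps a package name to "up" or "failed".
    """
    if stack.critical is not None:
        # With a critical package, only its status is considered.
        return status[stack.critical] == "failed"
    # Assumption: without a critical package, a failover needs all
    # managed packages to have failed.
    return all(status[p] == "failed" for p in stack.managed)


# The RAC database fails, so only the RAC MNP package fails.
status = {"rac_mnp": "failed", "cfs_mp": "up", "cvm_dg": "up"}

as_critical = SiteStack(critical="rac_mnp", managed=["cfs_mp", "cvm_dg"])
as_managed = SiteStack(critical=None, managed=["rac_mnp", "cfs_mp", "cvm_dg"])

print(should_fail_over(as_critical, status))  # site failover is initiated
print(should_fail_over(as_managed, status))   # no failover; a syslog message only
```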
Oracle RAC Database Instance Failure
Certain error conditions in the run time environment of a node can cause the Oracle RAC database
instance on the node to fail. This, in turn, causes the corresponding RAC MNP package instance
on the node to go down. The RAC MNP package continues to run with one fewer instance, and
the Site Controller Package continues to monitor the RAC MNP stack.
However, if the failed RAC database instance is the last surviving instance, the RAC MNP package
fails in the cluster and is halted. The Site Controller Package detects the failure and initiates a
site failover if the RAC MNP is configured as a critical_package.
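
The instance-level behavior can be sketched in the same spirit. The following Python fragment is an illustrative model only. The function name, the node names, and the returned labels are hypothetical. The "log_only" outcome for a non-critical RAC MNP package is carried over from the managed_package behavior described under Oracle RAC Database Failure and is an assumption here.

```python
from typing import Dict


def on_instance_failure(
    instances: Dict[str, bool], failed_node: str, is_critical: bool
) -> str:
    """Model the reaction to one RAC database instance failing.

    instances maps a node name to True if its RAC MNP package instance is
    running. Returns a hypothetical label for the resulting action.
    """
    remaining = dict(instances)
    remaining[failed_node] = False
    if any(remaining.values()):
        # At least one instance survives: the RAC MNP package keeps running
        # and the Site Controller Package keeps monitoring the stack.
        return "continue_monitoring"
    # The last surviving instance failed: the RAC MNP package is halted.
    return "site_failover" if is_critical else "log_only"


# Hypothetical two-node site.
print(on_instance_failure({"node1": True, "node2": True}, "node1", True))
print(on_instance_failure({"node1": False, "node2": True}, "node2", True))
```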