Designing Disaster Recovery Clusters using Metroclusters and Continentalclusters, Reprinted October 2011 (5900-1881)

Starting the Disaster Tolerant Oracle RAC Database with ASM in the Metrocluster
The procedure to start the disaster tolerant Oracle RAC database with ASM is identical to the
procedure for starting a complex workload in a Metrocluster. For more information on starting the
complex workload in the Metrocluster, see “Starting the Disaster Tolerant Complex Workload in
the Metrocluster” (page 359).
Understanding Failure Scenarios in a Site Aware Disaster Tolerant Architecture
This section describes how various site failover scenarios are addressed in SADTA.
This section addresses the following topics:
“Failure Scenarios in a Complex Workload” (page 391)
“Failure Scenarios in Metrocluster for RAC” (page 394)
Failure Scenarios in a Complex Workload
This section elaborates on the failure scenarios that can occur when a complex workload is
configured using Site Aware Disaster Tolerant Architecture.
This section addresses the following topics:
“Site Failover” (page 391)
“Node Failure and Rejoining the Cluster” (page 392)
“Network Partitions Across Sites” (page 393)
“Disk Array and SAN Failure” (page 393)
“Replication Link Failure” (page 393)
“Site Controller Package Failure” (page 393)
“Site Failure” (page 394)
Site Failover
When the Site Controller Package determines that a running package configuration of a disaster
tolerant complex workload has failed in the Metrocluster, or that the site hosting it has failed, it
fails over to the remote site node and initiates a site failover from the remote node. The site failover
starts the adoptive complex-workload package configuration by starting the packages configured
on the remote site.
The Site Controller Package monitors the active complex-workload packages, according to the
configuration, to detect a failure and initiate a site failover. When the complex-workload packages
are configured using the critical_package attribute, the Site Controller Package detects the failure and
initiates a site failover even if only one of the critical packages fails. In a configuration where all the
packages in the complex workload are configured with the managed_package attribute, the Site
Controller Package detects a failure and initiates site failover based on the cumulative status of all
the configured managed packages.
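
The two monitoring modes described above can be sketched as a small decision function. This is an illustrative model only, not Serviceguard or Metrocluster code: the MonitoredPackage type, the role strings, and the function name are hypothetical, and the sketch assumes that "cumulative status" means every managed package has failed. It also assumes that managed packages alone do not trigger a failover when critical packages are configured, a case the text above does not spell out.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class MonitoredPackage:
    """Hypothetical record for one package monitored by the Site Controller."""
    name: str
    role: str      # "critical" or "managed" (mirrors the two attributes)
    failed: bool   # True if the package is down because of a failure


def should_initiate_site_failover(packages: Iterable[MonitoredPackage]) -> bool:
    """Return True when the modeled rules call for a site failover."""
    pkgs = list(packages)
    critical = [p for p in pkgs if p.role == "critical"]
    managed = [p for p in pkgs if p.role == "managed"]

    # A single failed critical package is enough to trigger a failover.
    if any(p.failed for p in critical):
        return True

    # Assumption: with only managed packages configured, the cumulative
    # status is "failed" when every managed package has failed.
    if not critical and managed:
        return all(p.failed for p in managed)

    return False
```

For example, a workload with one failed critical package yields True even if every other package is healthy, while an all-managed workload yields True only after its last managed package has failed.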
A complex-workload package that has failed or has been halted displays a halted status in addition
to a down state. A special flag, package_halted, is set to no when the complex-workload
package is down because it failed in the cluster, and to yes when the
complex-workload package is down because it was manually halted. Serviceguard sets this flag to no only
when the last surviving instance of the complex-workload package is halted as a result of a failure.
The flag is set to yes if the last surviving instance is manually halted, even if other instances were
halted earlier due to failures.
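
The rule for the package_halted flag can be illustrated with a short model. This is a sketch under stated assumptions, not how Serviceguard implements the flag: the HaltEvent type, the cause strings, and the function name are hypothetical. The sketch returns None while any instance is still running, because the text above describes the flag only for a package that is down.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class HaltEvent:
    """Hypothetical record of one package instance going down."""
    instance: str
    cause: str  # "failure" or "manual"


def package_halted_flag(instances: Sequence[str],
                        events: Sequence[HaltEvent]) -> Optional[str]:
    """Model the package_halted value once every instance is down.

    Only the cause that brought down the last surviving instance matters;
    earlier halts of other instances do not affect the result.
    """
    remaining = set(instances)
    last_cause = None
    for event in events:  # events are assumed to be in chronological order
        if event.instance in remaining:
            remaining.discard(event.instance)
            last_cause = event.cause

    if remaining or last_cause is None:
        return None  # package is not fully down; flag not modeled here

    return "yes" if last_cause == "manual" else "no"
```

With two instances, a failure followed by a manual halt of the survivor gives yes, whereas a manual halt followed by a failure of the survivor gives no.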
The Site Controller Package determines a failure by checking if the package_halted flag is set to
no for all monitored packages that are in the down state. When the monitored packages have