Designing Disaster Recovery Clusters using Metroclusters and Continentalclusters, Reprinted October 2011 (5900-1881)

Oracle RAC Database Oracle Clusterware Daemon Failure
Oracle Clusterware is an essential resource for all RAC databases in a site. When the crsd
or evmd daemon aborts because of a failure, it is automatically restarted on the
node. When the cssd daemon aborts because of a failure on a node, the node is restarted.
The RAC MNP stack continues to run with one fewer instance on the site.
The Site Controller Package continues to run uninterrupted as long as at least one RAC
MNP instance is running and the RAC MNP package has not failed. However, if the failed RAC
database instance is the last surviving instance on the site, the node restart initiates
a failover of the Site Controller Package to the remote site. During startup at the remote
site, the Site Controller Package detects the failure and performs a site failover, starting
the RAC MNP stack configured at that site.
Administering the Site Aware Disaster Tolerant Metrocluster Environment
This section describes the procedures that you must perform to administer the SADTA environment.
This section addresses the following topics:
“Administering the SADTA Configuration” (page 395)
“Administering Metrocluster for RAC” (page 400)
Administering the SADTA Configuration
This section describes the procedures for administering a SADTA configuration
in which complex workloads other than Oracle RAC are configured.
This section addresses the following topics:
“Maintaining a Node” (page 395)
“Maintaining the Site” (page 396)
“Maintaining the Metrocluster Environment File” (page 396)
“Moving the Site Controller Package to a Node at the Local Site” (page 396)
“Maintaining Site Controller Package” (page 396)
“Upgrading the Site Controller Package” (page 397)
“Deleting the Site Controller Package” (page 397)
“Starting a Complex Workload” (page 398)
“Shutting Down a Complex Workload” (page 398)
“Moving a Complex Workload to the Remote Site” (page 398)
“Restarting a Failed Site Controller Package” (page 399)
“Migrating Complex Workloads Using Legacy SG SMS CVM/CFS Packages to Modular SG
SMS CVM/CFS Packages with Minimal Downtime” (page 399)
Maintaining a Node
To perform maintenance procedures on a cluster node, you must first remove the node from the
cluster. Run the cmhaltnode -f command to move the node out of the cluster. This command halts
the complex workload package instance running on the node. As long as there are other nodes in
the site and the Site Controller Package is still running on the site, the site aware disaster
tolerant workload continues to run with one fewer instance on the same site.
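The halt step above, together with the rejoin step that uses cmrunnode, can be sketched as a pair of shell functions. This is only an illustrative sketch: the node name node1, the function names, the run wrapper, and the DRY_RUN switch are assumptions made for this example and are not part of the product. The cmviewcl -v status check is an added convenience, not a step required by this procedure.

```shell
#!/bin/sh
# Illustrative sketch of the node maintenance flow.
# Assumptions: the function names, the run wrapper, and DRY_RUN are hypothetical.
# With DRY_RUN=1 (the default here) commands are only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

halt_node_for_maintenance() {
    # Move the node out of the cluster; -f forces the halt even if
    # package instances are running on the node.
    run cmhaltnode -f "$1"
    # Review cluster, node, and package status after the halt.
    run cmviewcl -v
}

rejoin_node_after_maintenance() {
    # Join the node back to the cluster once maintenance is complete.
    run cmrunnode "$1"
    # Review cluster, node, and package status after the rejoin.
    run cmviewcl -v
}
```

For example, calling halt_node_for_maintenance node1 in dry-run mode prints the two commands that would be run. Setting DRY_RUN=0 on a cluster node runs them for real.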
Once the node maintenance procedures are complete, join the node to the cluster using the
cmrunnode command. If the Site Controller Package is running on the site that the node belongs
to, the active complex-workload package instances on the site that have the auto_run flag set