Building Disaster Recovery Serviceguard Solutions Using Metrocluster with EMC SRDF

When the RAC MNP package is configured as a critical_package, the Site Controller Package
considers only the RAC MNP package status to initiate a site failover. Since the RAC MNP package
fails when the contained RAC database fails, the Site Controller Package fails over to start on the
remote site node and initiates a site failover from the remote site.
When the RAC MNP package is configured as a managed_package along with other packages
in the stack, such as the CFS MP and CVM DG packages, the Site Controller Package considers
the status of all the configured packages to determine a failure. When the RAC database fails,
only the RAC MNP package fails. All other managed packages continue to be up and running.
As a result, the Site Controller Package does not perform a site failover. The Site Controller Package
only logs a message in the syslog and continues to run on the same node where it was running
before the RAC database failed. Manual intervention is required to restart the RAC database MNP
package.
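The difference between the two configurations can be sketched as a small decision function. This is an illustrative model only, not Metrocluster code: the `Package` type, the role labels, and `should_initiate_site_failover` are hypothetical names introduced here. The rule used when there is no critical package (a failure is declared only when every configured package is down) is an assumption inferred from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Package:
    name: str
    role: str    # "critical" or "managed" (hypothetical labels for this sketch)
    is_up: bool


def should_initiate_site_failover(packages: List[Package]) -> bool:
    """Return True if this simplified model would trigger a site failover."""
    critical = [p for p in packages if p.role == "critical"]
    if critical:
        # With a critical package configured, only its status is considered.
        return any(not p.is_up for p in critical)
    # Assumption: with managed packages only, all of them must be down.
    return bool(packages) and all(not p.is_up for p in packages)


# RAC database has failed, so only the RAC MNP package is down.
as_critical = [
    Package("rac_mnp", "critical", False),
    Package("cfs_mp", "managed", True),
    Package("cvm_dg", "managed", True),
]
as_managed = [
    Package("rac_mnp", "managed", False),
    Package("cfs_mp", "managed", True),
    Package("cvm_dg", "managed", True),
]

print(should_initiate_site_failover(as_critical))  # True
print(should_initiate_site_failover(as_managed))   # False
```

In the second case the model reports no site failover, which corresponds to the situation where the Site Controller Package only logs a message and the RAC MNP package must be restarted manually.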
Oracle RAC database instance failure
Certain error conditions in the runtime environment of a node can cause the Oracle RAC database
instance on the node to fail. This, in turn, causes the corresponding RAC MNP package instance
on the node to go down. The RAC MNP package continues to run with one fewer instance,
and the Site Controller Package continues to monitor the RAC MNP stack.
However, if the failed RAC database instance is the last surviving instance, the RAC MNP package
fails in the cluster and is halted. The Site Controller Package detects the failure and initiates a
site failover if the RAC MNP package is configured as a critical_package.
Oracle RAC database Oracle Clusterware daemon failure
Oracle Clusterware is an essential resource for all RAC databases in a site. When the crsd
or evmd daemon aborts because of a failure, it is automatically restarted on the
node. When the cssd daemon aborts because of a failure on a node, the node is restarted.
The RAC MNP stack continues to run with one fewer instance on the site.
The Site Controller Package continues to run uninterrupted as long as there is at least one RAC
MNP instance running and the RAC MNP package has not failed. However, if the failed RAC
database instance is the last surviving instance on the site, the node restart initiates
a failover of the Site Controller Package to the remote site. During startup at the remote site,
the Site Controller Package detects the failure and performs a site failover, starting the RAC
MNP stack configured at that site.
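The daemon failure behavior described above can be summarized in a short sketch. It is a simplified model under stated assumptions, not part of Oracle Clusterware or Metrocluster: the action labels, the `DAEMON_FAILURE_RESPONSE` table, and `handle_daemon_failure` are hypothetical names, and the node names are placeholders.

```python
# Response to each Oracle Clusterware daemon failure, as described in the text.
DAEMON_FAILURE_RESPONSE = {
    "crsd": "daemon_restarted",
    "evmd": "daemon_restarted",
    "cssd": "node_restarted",
}


def handle_daemon_failure(daemon, node, nodes_with_instance):
    """Model one daemon failure on `node`.

    Returns (action, remaining_nodes, site_controller_fails_over), where
    remaining_nodes is the set of site nodes still running a RAC MNP instance.
    """
    action = DAEMON_FAILURE_RESPONSE[daemon]
    remaining = set(nodes_with_instance)
    if action == "node_restarted":
        # A node restart takes down the RAC MNP instance on that node.
        remaining.discard(node)
    # The Site Controller Package moves to the remote site only when the
    # restarted node hosted the last surviving instance on the site.
    site_controller_fails_over = action == "node_restarted" and not remaining
    return action, remaining, site_controller_fails_over


print(handle_daemon_failure("evmd", "node1", {"node1", "node2"}))
print(handle_daemon_failure("cssd", "node1", {"node1", "node2"}))
print(handle_daemon_failure("cssd", "node2", {"node2"}))
```

Only the last call reports a Site Controller Package failover, because the restarted node was running the only remaining instance on the site.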
Administering Metrocluster for RAC
This section describes the procedures that must be followed to administer Metrocluster for RAC.
Online addition and deletion of nodes
Metrocluster requires an equal number of nodes to be configured at the primary and remote data
centers. Therefore, whenever a RAC database instance is added or deleted at the primary site, you
must add or delete the replica database instance at the remote site as well.
Online node addition involves procedures on both sites of the redundant RAC database
configuration:
1. Online node addition on the primary site where the RAC database package stack is running.
2. Online node addition on the remote site where the RAC database package stack is down.
Similarly, online node deletion involves performing the following tasks:
1. Online node deletion on the primary site where the RAC database package stack is running.
2. Online node deletion on the remote site where the RAC database package stack is down.
NOTE: Add or delete nodes online when the Site Controller Package is halted in the DETACH
mode.
Adding nodes online on a primary site where the RAC database is running
To add nodes online on a primary site where the RAC database package stack is running: