
Restoring replication after a failover
When the Metrocluster package fails over to the recovery site while the replication links are down
or the primary storage system is unavailable, Metrocluster issues the setrcopygroup failover
command. This command changes the role of the Remote Copy volume group on the storage system
at the recovery site from Secondary to Primary-Rev. In this role, data is not replicated from the
recovery site to the primary site. After the links are restored or the primary storage system is back
up, manually issue the setrcopygroup recover command on the storage system at the recovery
site to resynchronize the data from the recovery site to the primary site. This changes the role of
the Remote Copy volume group on the storage system at the primary site from Primary to
Secondary-Rev.
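For example, the recovery step might look like the following minimal sketch, assuming a Remote
Copy volume group named rcgroup01 (a hypothetical name); showrcopy can be used to confirm
the group roles before and after:

    # On the 3PAR CLI of the storage system at the recovery site, after the
    # links or the primary storage system are restored:
    showrcopy groups rcgroup01        # role at the recovery site shows Primary-Rev
    setrcopygroup recover rcgroup01   # resynchronize data back to the primary site
    showrcopy groups rcgroup01        # the primary-site role now shows Secondary-Rev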
CAUTION: While the roles are Secondary-Rev and Primary-Rev, a disaster at the recovery site
results in a failure of the Metrocluster package. To avoid this, immediately halt the package on the
recovery site and start it on the primary site. This restores the Remote Copy volume group roles
to their original states of Primary and Secondary.
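A minimal sketch of this role restoration, assuming a Metrocluster package named mcpkg and a
primary-site node named node1 (both hypothetical names):

    cmhaltpkg mcpkg            # halt the package on the recovery site
    cmrunpkg -n node1 mcpkg    # start it on a node at the primary site; Metrocluster
                               # restores the Primary and Secondary roles on startup
    cmviewcl -v -p mcpkg       # confirm the package is running at the primary site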
Administering the SADTA configuration
This section describes the procedures to follow when administering a SADTA configuration.
Maintaining a node
To perform maintenance procedures on a cluster node, the node must first be removed from the
cluster. Run the cmhaltnode -f command to move the node out of the cluster. This command
also halts the complex-workload package instance running on the node. As long as there are other
nodes at the site and the Site Controller package is still running at the site, the site aware disaster
tolerant workload continues to run at the same site with one less instance.
After the node maintenance procedures are complete, join the node back to the cluster using the
cmrunnode command. If the Site Controller package is running at the site to which the node
belongs, the complex-workload package instances on that node that have the auto_run flag set
to yes start automatically. If the auto_run flag is set to no, these instances must be started
manually on the restarted node.
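The node maintenance sequence might look like the following minimal sketch, assuming a node
named node2 and a workload package instance named wlpkg2 (both hypothetical names):

    cmhaltnode -f node2       # remove the node from the cluster; -f also halts
                              # the package instances running on it
    # ... perform the node maintenance ...
    cmrunnode node2           # join the node back to the cluster
    cmrunpkg -n node2 wlpkg2  # start the instance manually if auto_run is set to no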
Before halting a node in the cluster, move the Site Controller package to a different node at the
site. See “Moving the Site Controller package to a node at the local site” (page 50). However, if
the node to be halted is the last surviving node at the site, the Site Controller package running on
it fails over to the other site. In such scenarios, the site aware disaster tolerant workload must be
moved to the remote site before halting the node in the cluster. For more information on moving
a site aware disaster tolerant complex workload to a remote site, see “Moving a complex workload
to the remote site” (page 52).
Maintaining the site
A maintenance operation at a site might require that all the nodes at that site be brought down.
In such scenarios, the site aware disaster tolerant workload can be started at the other site to
provide continuous service. For more information on moving a site aware disaster tolerant complex
workload to a remote site, see “Moving a complex workload to the remote site” (page 52).
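At a high level, the move amounts to failing the Site Controller package over to a node at the
remote site, which halts the workload instances at the local site and starts them at the remote site.
One plausible sequence, assuming a Site Controller package named sc_pkg and a remote-site
node named node3 (both hypothetical names); the referenced section contains the complete
procedure:

    cmhaltpkg sc_pkg           # halt the Site Controller package at the local site
    cmrunpkg -n node3 sc_pkg   # start it at the remote site; this triggers the
                               # workload move to that site
    cmviewcl -v -p sc_pkg      # confirm the package is running at the remote site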
Moving the Site Controller package to a node at the local site
To complete maintenance operations, a node in the cluster might need to be brought down. In
such cases, the Site Controller package running on that node must first be moved to another node
at the local site.
To move the Site Controller package to another node in the local site:
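One plausible sequence, assuming a Site Controller package named sc_pkg and a target local-site
node named node2 (both hypothetical names); because the package stays within the local site,
the site aware disaster tolerant workload continues to run at the same site:

    cmhaltpkg sc_pkg           # halt the Site Controller package on the current node
    cmrunpkg -n node2 sc_pkg   # restart it on another node at the local site
    cmviewcl -v -p sc_pkg      # confirm the package is running on the new node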