
Restoring replication after a failover
When the Metrocluster package fails over to the recovery site while the replication links are down
or the primary storage system is unavailable, Metrocluster issues the setrcopygroup failover
command. This command changes the role of the Remote Copy volume group on the storage system
at the recovery site from Secondary to Primary-Rev. In this role, data is not replicated from the
recovery site to the primary site. After the links are restored or the primary storage system is back
up, manually issue the setrcopygroup recover command on the storage system at the recovery
site to resynchronize the data from the recovery site to the primary site. This changes the role of
the Remote Copy volume group on the storage system at the primary site from Primary to
Secondary-Rev.
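For example, the recovery step might look like the following minimal sketch, assuming a Remote
Copy volume group named rcgroup01 (a hypothetical name); showrcopy can be used to confirm
the group roles before and after:

    # On the 3PAR CLI of the storage system at the recovery site, after the
    # links or the primary storage system are restored:
    showrcopy groups rcgroup01        # role at the recovery site shows Primary-Rev
    setrcopygroup recover rcgroup01   # resynchronize data back to the primary site
    showrcopy groups rcgroup01        # the primary-site role now shows Secondary-Rev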
CAUTION: While the roles are Secondary-Rev and Primary-Rev, a disaster at the recovery site
results in a failure of the Metrocluster package. To avoid this, immediately halt the package on the
recovery site and start it on the primary site. This restores the Remote Copy volume group roles
to their original states of Primary and Secondary.
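A minimal sketch of this role restoration, assuming a Metrocluster package named mcpkg and a
primary-site node named node1 (both hypothetical names):

    cmhaltpkg mcpkg            # halt the package on the recovery site
    cmrunpkg -n node1 mcpkg    # start it on a node at the primary site; Metrocluster
                               # restores the Primary and Secondary roles on startup
    cmviewcl -v -p mcpkg       # confirm the package is running at the primary site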
Administering the SADTA configuration
This section describes the procedures to follow when administering a SADTA configuration.
Maintaining a node
To perform maintenance procedures on a cluster node, the node must first be removed from the
cluster. Run the cmhaltnode -f command to move the node out of the cluster. This command
also halts the complex-workload package instance running on the node. As long as there are other
nodes at the site and the Site Controller package is still running at the site, the site aware disaster
tolerant workload continues to run at the same site with one less instance.
After the node maintenance procedures are complete, join the node back to the cluster using the
cmrunnode command. If the Site Controller package is running at the site to which the node
belongs, the complex-workload package instances on that node that have the auto_run flag set
to yes start automatically. If the auto_run flag is set to no, these instances must be started
manually on the restarted node.
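The node maintenance sequence might look like the following minimal sketch, assuming a node
named node2 and a workload package instance named wlpkg2 (both hypothetical names):

    cmhaltnode -f node2       # remove the node from the cluster; -f also halts
                              # the package instances running on it
    # ... perform the node maintenance ...
    cmrunnode node2           # join the node back to the cluster
    cmrunpkg -n node2 wlpkg2  # start the instance manually if auto_run is set to no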
Before halting a node in the cluster, move the Site Controller package to a different node at the
site. See “Moving the Site Controller package to a node at the local site” (page 50). However, if
the node to be halted is the last surviving node at the site, the Site Controller package running on
it fails over to the other site. In such scenarios, the site aware disaster tolerant workload must be
moved to the remote site before halting the node in the cluster. For more information on moving
a site aware disaster tolerant complex workload to a remote site, see “Moving a complex workload
to the remote site” (page 52).
Maintaining the site
A maintenance operation at a site might require that all the nodes at that site be brought down.
In such scenarios, the site aware disaster tolerant workload can be started at the other site to
provide continuous service. For more information on moving a site aware disaster tolerant complex
workload to a remote site, see “Moving a complex workload to the remote site” (page 52).
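At a high level, the move amounts to failing the Site Controller package over to a node at the
remote site, which halts the workload instances at the local site and starts them at the remote site.
One plausible sequence, assuming a Site Controller package named sc_pkg and a remote-site
node named node3 (both hypothetical names); the referenced section contains the complete
procedure:

    cmhaltpkg sc_pkg           # halt the Site Controller package at the local site
    cmrunpkg -n node3 sc_pkg   # start it at the remote site; this triggers the
                               # workload move to that site
    cmviewcl -v -p sc_pkg      # confirm the package is running at the remote site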
Moving the Site Controller package to a node at the local site
To complete maintenance operations, a node in the cluster might need to be brought down. In
such cases, the Site Controller package running on that node must first be moved to another node
at the local site.
To move the Site Controller package to another node in the local site:
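One plausible sequence, assuming a Site Controller package named sc_pkg and a target local-site
node named node2 (both hypothetical names); because the package stays within the local site,
the site aware disaster tolerant workload continues to run at the same site:

    cmhaltpkg sc_pkg           # halt the Site Controller package on the current node
    cmrunpkg -n node2 sc_pkg   # restart it on another node at the local site
    cmviewcl -v -p sc_pkg      # confirm the package is running on the new node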