Building Disaster Recovery Serviceguard Solutions Using Metrocluster with 3PAR Remote Copy

If the replication link fails while the complex workload is mounted read-only, is idle, or is
completing read-only transactions, the workload may not encounter any failure and continues to be
available from the site.

Site Controller package failure

The Site Controller package can fail for many reasons, such as a node crash, while the active
complex-workload package stack on the site is up and running. The Site Controller package then
fails over to an adoptive node, which can be a node on the same site or a node on the remote site.
The Site Controller package behaves differently in each scenario so that complex workload
availability is not disrupted.

NOTE: When the adoptive node is on the same site where the active complex-workload stack is
currently running, the event is considered a local failover for the Site Controller package.

On a Site Controller package local failover, the disaster tolerant complex workload remains
uninterrupted on that site. From its current node, the Site Controller package continues to
monitor the managed packages or the critical packages on the site, as configured.

When the Site Controller package fails over to an adoptive node at the remote site, it is considered
a failover across sites for the Site Controller package. If this happens while the active
complex-workload package stack is still running on its site, the Site Controller package fails on
the remote-site adoptive node without affecting the running active complex workload configuration
stack in the cluster. The complex workload configuration continues to be available in the cluster.
However, because the Site Controller package has failed in the cluster, the complex workload
configuration can no longer automatically fail over to the remote site.
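
The two Site Controller failover cases described above can be summarized in a small decision
model. This is only an illustrative sketch in Python, not Serviceguard code: the names
FailoverOutcome and site_controller_failover and the site labels are hypothetical, and the model
assumes the active complex-workload package stack is up on its site when the failover occurs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailoverOutcome:
    """Hypothetical summary of a Site Controller package failover."""
    kind: str                      # "local" or "cross-site"
    site_controller_running: bool  # is the Site Controller package up afterwards?
    workload_available: bool       # is the complex workload still available?
    automatic_site_failover: bool  # can the workload still fail over automatically?


def site_controller_failover(adoptive_site: str, active_site: str) -> FailoverOutcome:
    """Classify a failover that happens while the active stack runs on active_site."""
    if adoptive_site == active_site:
        # Local failover: the workload is uninterrupted and the package keeps
        # monitoring it. Automatic failover staying possible is inferred from
        # that continued monitoring; the text does not state it explicitly.
        return FailoverOutcome("local", True, True, True)
    # Failover across sites: the package fails on the remote adoptive node,
    # the workload keeps running, but automatic site failover is lost.
    return FailoverOutcome("cross-site", False, True, False)


if __name__ == "__main__":
    print(site_controller_failover("siteA", "siteA"))
    print(site_controller_failover("siteB", "siteA"))
```

In both cases workload_available stays true. The scenarios differ only in whether the Site
Controller package survives and can still drive an automatic site failover.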

Site failure

A site failure is a scenario in which a disaster or an equivalent failure causes all nodes in a
site to fail or go down. The Serviceguard cluster detects this failure and re-forms the cluster
without the nodes from the failed site. The Site Controller package that was running on a node in
the failed site fails over to an adoptive node in the remote site.

When it starts on the remote site, the Site Controller package detects that the active
complex-workload packages have failed and initiates a site failover by activating the passive
complex-workload packages that are configured in the current site.

The disaster tolerant complex workloads that have their active packages on the surviving site,
where the cluster re-formed, continue to run without any interruption.
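
The site failure sequence can be sketched as a short simulation. This is a hedged illustration in
Python, not how Serviceguard is implemented: handle_site_failure, the node names, and the workload
names are hypothetical, and the model assumes exactly two sites with every workload having passive
packages configured on the other site.

```python
from typing import Dict, List, Tuple


def handle_site_failure(
    failed_site: str,
    node_sites: Dict[str, str],
    workload_active_site: Dict[str, str],
) -> Tuple[List[str], Dict[str, str]]:
    """Return the re-formed cluster membership and the action taken per workload."""
    # The cluster re-forms without the nodes from the failed site.
    members = sorted(n for n, site in node_sites.items() if site != failed_site)
    if not members:
        raise ValueError("no surviving nodes: the cluster cannot re-form")
    # Two-site assumption: every surviving node belongs to the same site.
    surviving_site = node_sites[members[0]]

    actions: Dict[str, str] = {}
    for workload, active_site in sorted(workload_active_site.items()):
        if active_site == failed_site:
            # Site failover: activate the passive packages on the surviving site.
            actions[workload] = f"activate passive packages on {surviving_site}"
        else:
            # Active packages were already on the surviving site.
            actions[workload] = "continues uninterrupted"
    return members, actions


if __name__ == "__main__":
    nodes = {"a1": "siteA", "a2": "siteA", "b1": "siteB", "b2": "siteB"}
    workloads = {"db": "siteA", "app": "siteB"}
    members, actions = handle_site_failure("siteA", nodes, workloads)
    print(members)
    print(actions)
```

Only workloads whose active packages were on the failed site are failed over. Workloads already
active on the surviving site are left untouched, matching the behavior described above.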
