Building Disaster Recovery Serviceguard Solutions Using Metrocluster with EMC SRDF

complex-workload package is down, having failed in the cluster. This special flag is set to yes when the complex-workload package is down and was manually halted. Serviceguard sets the flag to no only when the last surviving instance of the complex-workload package is halted as a result of a failure. The flag is set to yes if the last surviving instance is manually halted, even if other instances were halted earlier due to failures.
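
The rule above depends only on how the last surviving instance went down. The following is a minimal Python sketch of that rule, not Serviceguard code. The HaltCause enum, the package_halted_flag function, and the idea of passing halt causes as an ordered list are all hypothetical, chosen only for illustration.

```python
from enum import Enum
from typing import List, Optional


class HaltCause(Enum):
    FAILURE = "failure"
    MANUAL = "manual"


def package_halted_flag(instance_count: int, halt_events: List[HaltCause]) -> Optional[str]:
    """Model the package_halted flag of a complex-workload package.

    halt_events lists the cause of each instance halt in the order the halts
    happened (a hypothetical input format). The flag only has a value once
    every instance is down, and only the halt of the last surviving instance
    decides it; earlier halts are ignored even if they were failures.
    """
    if instance_count < 1:
        raise ValueError("a package needs at least one instance")
    if len(halt_events) < instance_count:
        return None  # at least one instance is still running, so the package is not down
    last_halt = halt_events[-1]
    return "yes" if last_halt is HaltCause.MANUAL else "no"
```

For example, if two of three instances fail and an operator then halts the third, package_halted_flag(3, [HaltCause.FAILURE, HaltCause.FAILURE, HaltCause.MANUAL]) evaluates to "yes", matching the last sentence above.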

The Site Controller package determines that a failure has occurred by verifying that the package_halted flag is set to no for all the monitored packages that are in the down state. When the monitored packages have failed rather than being manually halted, the Site Controller package fails over to a node at the remote site to perform a site failover.
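
As a rough illustration, the failure test can be written as a predicate over the monitored packages. This Python sketch reads the rule literally and is not the Site Controller's actual logic. The MonitoredPackage type, the "up"/"down" status strings, and the requirement that at least one package be down are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class MonitoredPackage:
    name: str
    status: str          # "up" or "down" (hypothetical representation)
    package_halted: str  # "yes" or "no", as set by Serviceguard


def site_failure_detected(packages: Iterable[MonitoredPackage]) -> bool:
    """Return True when every down monitored package failed rather than being halted.

    Assumption: at least one monitored package must be down; if nothing is
    down, there is no failure to react to.
    """
    down = [p for p in packages if p.status == "down"]
    return bool(down) and all(p.package_halted == "no" for p in down)
```

A single down package with package_halted set to yes is enough to make the predicate false, which corresponds to an operator having halted the workload rather than the workload failing.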

Before starting the complex-workload packages configured at the remote site, the Site Controller package ensures that it is safe to do so. The failed complex-workload packages might not have halted cleanly and might have left stray processes and resources behind. In such scenarios, it is not safe to start the identical complex-workload configuration on the remote site. Therefore, when it starts on the remote site node, the Site Controller package verifies that all the instances of the failed active packages have halted cleanly. To do so, it checks the last_halt_failed flag for each instance of the workload packages; the flag is set to yes for an instance whose halt script execution resulted in an error. If even one instance of any of the failed workload's packages did not halt successfully, the Site Controller package aborts the site failover. In these circumstances, the Site Controller package halts and its state is displayed as failed on the remote site node. Before the Site Controller package and the complex-workload configuration can be restarted, the nodes on the site must be cleaned manually.
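
The clean-halt check amounts to "abort if any instance has last_halt_failed set to yes". This is a minimal Python sketch of that check. The WorkloadInstance type and both helper functions are hypothetical names; only the last_halt_failed flag and its yes/no meaning come from the text above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class WorkloadInstance:
    package: str
    node: str
    last_halt_failed: str  # "yes" if this instance's halt script returned an error


def unclean_instances(instances: Iterable[WorkloadInstance]) -> List[str]:
    """List package@node for every instance whose halt script failed."""
    return [f"{i.package}@{i.node}" for i in instances if i.last_halt_failed == "yes"]


def safe_to_fail_over(instances: Iterable[WorkloadInstance]) -> bool:
    """A single unclean halt anywhere is enough to abort the site failover."""
    return not unclean_instances(instances)
```

Returning the offending package@node pairs, not just a boolean, reflects the fact that an operator has to know which nodes to clean before restarting.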

After ensuring a clean halt for all the instances of the failed complex-workload packages, the Site Controller package performs the following steps to activate the corresponding passive complex workload configured on its current site:

1. Closes the Site Safety Latch for the failed complex-workload package nodes.
2. Waits for all the configured packages that are part of the failed complex-workload package to halt successfully.
3. Deports the CVM disk groups used by the application on the failed site.
4. Prepares the replicated data storage on the current site using the Metrocluster environment file on the node on which it is starting.
5. Imports the CVM disk groups used by the application on the current site.
6. Opens the Site Safety Latch on the current site.
7. Starts the complex-workload packages configured on the current site.
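
The seven activation steps can be pictured as an ordered table that is executed from top to bottom. The Python sketch below only captures the ordering. The step names, the ACTIVATION_STEPS table, and the run_step callback are hypothetical and do not correspond to actual Metrocluster internals.

```python
from typing import Callable, List, Tuple

# Hypothetical step names; each pair is (step, which site it acts on).
ACTIVATION_STEPS: List[Tuple[str, str]] = [
    ("close_site_safety_latch", "failed"),
    ("wait_for_packages_to_halt", "failed"),
    ("deport_cvm_disk_groups", "failed"),
    ("prepare_replicated_storage", "current"),  # uses the Metrocluster environment file
    ("import_cvm_disk_groups", "current"),
    ("open_site_safety_latch", "current"),
    ("start_complex_workload_packages", "current"),
]


def activate_passive_workload(
    failed_site: str, current_site: str, run_step: Callable[[str, str], None]
) -> None:
    """Run each activation step in order; run_step is a caller-supplied callback."""
    sites = {"failed": failed_site, "current": current_site}
    for step, which in ACTIVATION_STEPS:
        run_step(step, sites[which])  # an exception here stops the remaining steps
```

Passing a recording callback, for example lambda step, site: calls.append((step, site)), produces the seven steps in order. Keeping the sequence in one table makes it easy to see that the failed site's disk groups are deported before the current site imports them.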

For the Site Controller package to start the remote complex-workload package configuration successfully, the packages in the remote configuration must have node switching enabled on their configured nodes. If the Site Controller package fails to start after it has successfully prepared the storage on a site, it sets the Site Safety Latch to a transient state, which is displayed as INTERMEDIATE. While the Site Safety Latch is in the INTERMEDIATE state, the corresponding Site Controller package can be restarted only after the site where it previously failed to start has been cleaned. For more information about cleaning the site, see “Cleaning the site to restart the Site Controller package” (page 71).
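
The INTERMEDIATE rule behaves like a small state machine. This is a toy Python model of it, not the actual latch implementation. The SiteSafetyLatch class and its method names are hypothetical, and treating CLOSED as the state both before the failure and after manual cleanup is an assumption; only the INTERMEDIATE state and the restart restriction come from the text above.

```python
class SiteSafetyLatch:
    """Toy model of the Site Safety Latch restart rule.

    INTERMEDIATE comes from this section; using CLOSED as the starting state
    and as the state after manual cleanup is an assumption.
    """

    def __init__(self) -> None:
        self.state = "CLOSED"

    def site_controller_start_failed(self, storage_prepared: bool) -> None:
        # Only a failure after the storage was prepared leaves the transient state.
        if storage_prepared:
            self.state = "INTERMEDIATE"

    def clean_site(self) -> None:
        # Stand-in for the manual site-cleaning procedure.
        if self.state == "INTERMEDIATE":
            self.state = "CLOSED"

    def site_controller_may_restart(self) -> bool:
        return self.state != "INTERMEDIATE"
```

In this model, a start failure before storage preparation leaves the latch untouched, while a failure afterwards blocks any restart until clean_site() has been called.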

Node failure and rejoining the cluster

When a node in a cluster fails, all multi-node package (MNP) instances running on the failed node also fail. Failover packages fail over to the next available adoptive node. If no other adoptive node is configured and available in the cluster, the failover package fails and is halted.
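
The difference between MNP instances and failover packages on a node failure can be sketched in a few lines of Python. Both functions are hypothetical. Choosing the adoptive node by walking the package's configured node list in order is an assumed, simplified policy; Serviceguard's actual choice depends on the package's configuration.

```python
from typing import List, Optional, Set


def surviving_mnp_instances(instance_nodes: List[str], failed_node: str) -> List[str]:
    """MNP instances on the failed node fail; instances elsewhere keep running."""
    return [n for n in instance_nodes if n != failed_node]


def next_adoptive_node(node_list: List[str], failed_node: str, up_nodes: Set[str]) -> Optional[str]:
    """Pick the first configured node, other than the failed one, that is still up.

    Assumption: adoptive nodes are tried in the order of the package's configured
    node list. None means the failover package fails and stays halted.
    """
    for node in node_list:
        if node != failed_node and node in up_nodes:
            return node
    return None
```

With a node list of n1, n2, and n3, a failure of n1 while only n3 is up moves the package to n3. If no other configured node is up, the function returns None, which corresponds to the package failing and remaining halted.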

When a node in the Metrocluster environment is restarted, the active complex-workload packages on the node are halted before the node restarts. After the node restarts and rejoins the cluster, the active complex-workload package instances on the site that have the auto_run flag set to yes start automatically. If the complex-workload packages have the auto_run flag set to no, you must start these instances on the restarted node manually.
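
The effect of auto_run on a rejoining node can be expressed as a simple partition of the packages configured on that node. This Python sketch is illustrative only. The WorkloadPackage type and the plan_rejoin function are hypothetical, and the auto_run value is modeled as the plain strings "yes" and "no".

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class WorkloadPackage:
    name: str
    nodes: List[str]  # nodes the package is configured to run on
    auto_run: str     # "yes" or "no"


def plan_rejoin(packages: List[WorkloadPackage], node: str) -> Tuple[List[str], List[str]]:
    """Split the packages configured on a rejoining node into auto-start and manual-start."""
    automatic: List[str] = []
    manual: List[str] = []
    for pkg in packages:
        if node not in pkg.nodes:
            continue  # this package has no instance on the rejoining node
        if pkg.auto_run == "yes":
            automatic.append(pkg.name)
        else:
            manual.append(pkg.name)
    return automatic, manual
```

The second list is the set of instances an operator still has to start by hand after the node rejoins.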
