
3. Restart the node that was shut down.
NOTE: The time to detect a storage outage caused by the failure of all paths to storage depends
on the setting for no_path_retry in the multipath software configuration. A value of fail
means that I/O is not queued when all paths fail; an I/O error is returned immediately. For
information about the recommended value for your environment, see the DM-Multipath
documentation.
Some resource agents, such as LVM, offer a mechanism called self_fence to take themselves
out of a cluster through node reboot when an underlying logical volume can no longer be
accessed. For supported options, see the RHCS documentation.
A storage replication link is down (RHCS)
If a Cluster Extension configuration uses DR groups with failsafe mode enabled, the array disables
access to the disk when it cannot replicate I/O to the remote array.
In this situation, if a replication link is broken, the resource agents of the configured resources,
such as lvm or fs, might detect the loss of disk access and take appropriate action. However, the
stop operation on a service might still fail because individual resources cannot be stopped cleanly
while the disk is inaccessible for read/write operations. When the stop operation on a service fails,
RHCS marks the service as failed and does not automatically fail the service over to another node.
To recover from this situation, use the following procedure:
1. Remove the node that lost access to the storage by shutting down the node.
2. Follow the steps required to bring up a service in a failed state, as documented in the RHCS
administration guide. This process involves disabling the service, and then enabling it on the
node where the service is allowed to come online (see the example command sequence after
this procedure).
3. Restart the node that was shut down.
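For illustration only, the disable/enable sequence on RHCS can be performed with clusvcadm; the
service and node names below are placeholders, and the exact procedure for your release is
described in the RHCS administration guide:

    # Clear the failed state by disabling the service
    clusvcadm -d <service_name>

    # Enable the service on a node that still has access to the storage
    clusvcadm -e <service_name> -m <node_name>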
NOTE: The time to detect a storage outage caused by the failure of all paths to storage depends
on the setting for no_path_retry in the multipath software configuration. A value of fail
means that I/O is not queued when all paths fail; an I/O error is returned immediately. For
information about the recommended value for your environment, see the DM-Multipath
documentation.
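For example, a multipath.conf excerpt that returns I/O errors immediately when all paths fail might
look like the following; this is an illustrative defaults section only, and the values appropriate for
your arrays and HBA drivers may differ:

    defaults {
        # Do not queue I/O when all paths are lost; fail it immediately
        no_path_retry    fail
    }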
Some resource agents, such as LVM, offer a mechanism called self_fence to take themselves
out of a cluster through node reboot when an underlying logical volume can no longer be
accessed. For supported options, see the RHCS documentation.
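As an illustration, an HA LVM resource definition in /etc/cluster/cluster.conf with self_fence
enabled could resemble the following; the volume group and logical volume names are placeholders,
and the attributes supported by the lvm resource agent depend on your RHCS release:

    <resources>
        <!-- Reboot this node if the logical volume becomes inaccessible -->
        <lvm name="app_lvm" vg_name="vg_app" lv_name="lv_data" self_fence="1"/>
    </resources>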
A data center is down (SLE HA and RHCS)
RHCS and SLE HA expect an acknowledgement from the fencing device before services are failed
over to another node. In the event of complete site failure, including fencing devices, clusters do
not automatically fail over services to surviving cluster nodes at the remote site. Manual intervention
is required in this situation. For instructions on bringing a service online, see the cluster software
documentation.
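For example, after verifying that the failed site is actually down, an administrator might manually
start the service or resource group on a surviving node. The names below are placeholders, the
commands are only a sketch, and depending on the cluster software you may first have to
acknowledge or override the pending fencing operation as described in the cluster documentation:

    # RHCS: enable the service on a surviving node
    clusvcadm -e <service_name> -m <surviving_node>

    # SLE HA (crm shell): clear the failed state and start the resource group
    crm resource cleanup <resource_group>
    crm resource start <resource_group>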
Pair/resync monitor messages in syslog/errorlog/messages/event log
When the pair/resync monitor is in use, a message is written to the system log file of your operating
system whenever the monitored device/copy group is in any state other than PAIR. These messages
might indicate the following:
•   The RAID Manager instance is not running or cannot be used to gather device/copy group
    state information.
•   The device/copy group is not in the PAIR state.
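To investigate such a message, you can query the pair state directly with RAID Manager. The
following is only an example sequence; the instance number and copy group name are placeholders
for your environment:

    # Verify that the RAID Manager instance (instance 0 in this example) is running
    horcmstart.sh 0

    # Display the pair state of the monitored copy group
    pairdisplay -g <copy_group> -I0 -fcx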