HP StorageWorks P9000 Cluster Extension Software Administrator Guide (TB534-96009, February 2011)
Using the Domain user account (Windows Server 2008/2008 R2 only)
When using the Domain user account to manage the cluster, modifying HORCM files might not be
possible, and P9000 Cluster Extension tools might not run as expected. If you experience any of
these issues, turn off UAC.
To turn off UAC, select Control Panel→User Accounts, and click Turn User Account Control on or
off. Clear the Use User Account Control (UAC) to help protect your computer check box. This might
resolve the issue and allow you to use the P9000 Cluster Extension tools with the Domain user
account.
Linux-specific error handling
RHCS and SLE HA log P9000 Cluster Extension messages to /var/log/messages. P9000 Cluster
Extension also writes its own log file, clxxplxcs.log.
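As a minimal sketch of how these messages might be pulled out of the system log, the following Python helper keeps only the lines that contain a keyword. The keyword "clx" is an assumption based on the log file name above, not a documented message prefix, and the function names are hypothetical.

```python
def filter_log_lines(lines, keyword="clx"):
    """Return the lines that contain the keyword, ignoring case.

    The default keyword "clx" is an assumption; adjust it to match the
    message text that actually appears on your systems.
    """
    needle = keyword.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


def read_cluster_messages(path="/var/log/messages", keyword="clx"):
    """Read the system log and return only the matching lines."""
    # errors="replace" avoids a crash on unexpected bytes in the log.
    with open(path, errors="replace") as handle:
        return filter_log_lines(handle, keyword)
```

Calling read_cluster_messages() with no arguments scans /var/log/messages, which usually requires root privileges.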
Failover errors
P9000 Cluster Extension will fail to bring an RHCS service or SLE HA resource group online on
the local system if a configuration error occurs. In this case, P9000 Cluster Extension returns a
local error.
If the disk array status indicates that a problem experienced locally would not be solved on
another system connected to the same disk array, the RHCS service or SLE HA resource group
goes into a failed state after a startup attempt on any system in the same data center. In this case,
P9000 Cluster Extension returns a data center error. This error could also occur if the
ApplicationStartup object is set to FASTFAILBACK.
If a disk array state that does not allow starting the RHCS service or SLE HA resource group on
any system in the cluster is discovered, a cluster error is reported and none of the systems will be
allowed to run the service or resource group. Such a state could be an SMPL state on both primary
and secondary disks, a suspended (PSUS/SSUS) state on either site, or a state mismatch in the
device/copy group for this RHCS service or SLE HA resource group. None of these scenarios
allows automatic recovery because P9000 Cluster Extension cannot determine which copy of the
data is the most current. In these cases, a storage or cluster administrator must investigate the
problem.
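The three conditions listed above can be illustrated with a small Python sketch. This is a simplified model of the description in this section, not the product's actual decision logic. The function name and the representation of pair states as lists of strings are assumptions made for illustration.

```python
SUSPENDED_STATES = {"PSUS", "SSUS"}


def cluster_error_reason(primary_states, secondary_states):
    """Return a reason string if the states match one of the cluster error
    conditions described above, or None otherwise.

    Each argument is a list with one pair state per disk in the device
    group, for example ["PAIR", "PAIR"].
    """
    all_states = list(primary_states) + list(secondary_states)

    # Disks of one site that report different states count as a mismatch.
    if len(set(primary_states)) > 1 or len(set(secondary_states)) > 1:
        return "state mismatch within the device group"

    # A suspended state on either site blocks automatic recovery.
    if any(state in SUSPENDED_STATES for state in all_states):
        return "suspended pair (PSUS/SSUS)"

    # SMPL on both sites: no pair relationship exists at all.
    if all_states and all(state == "SMPL" for state in all_states):
        return "SMPL on both primary and secondary disks"

    return None
```

A return value other than None corresponds to the situations in which an administrator must investigate before any restart is attempted.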
CAUTION: Do not restart a failed RHCS service or SLE HA resource group without first
investigating the problem. When an RHCS service or SLE HA resource group using P9000 Cluster
Extension fails, check the status of the disk pair on each copy and decide whether it is safe to
continue.
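One way to check the pair status is the RAID Manager pairdisplay command. The following Python sketch wraps that call. It assumes that RAID Manager is installed, that pairdisplay is on the PATH, and that the HORCM instance for the device group is running. The function names are hypothetical, and the device group name is whatever your configuration defines.

```python
import subprocess


def pairdisplay_command(device_group):
    """Build the pairdisplay call for one device group.

    -g selects the device group; -fcx adds the copy progress column and
    hexadecimal LDEV numbers to the output.
    """
    return ["pairdisplay", "-g", device_group, "-fcx"]


def show_pair_status(device_group):
    """Run pairdisplay and return (exit code, output text).

    check=False leaves the interpretation of a non-zero exit code to the
    caller instead of raising an exception.
    """
    result = subprocess.run(
        pairdisplay_command(device_group),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.returncode, result.stdout
```

Run the check on a system at each site so that the status of both copies is reviewed before deciding how to continue.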
The FC link is down (RHCS)
In RHCS, the detection of a storage outage due to failure of all paths to the storage depends on
the monitoring capability of resources configured in the RHCS service. For example, the LVM and
filesystem resource agents distributed with RHCS can detect the loss of storage and take appropriate
actions. The stop operation on a service might fail due to the inability to stop individual resources
cleanly. This may be caused by the loss of paths to the storage. When the stop operation on a
service fails, RHCS marks the service as failed and the service does not automatically fail over to
another node.
To recover from this situation, use the following procedure:
1. Remove the node that lost access to the storage by shutting it down.
2. Follow the steps required to bring up a service in a failed state, as documented in the RHCS
administration guide. This process involves disabling the service, and then enabling it on the
node where the service is allowed to come online.
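Step 2 can be sketched with the RHCS clusvcadm command, which disables a service with -d and enables it on a chosen member with -e and -m. The Python wrapper below is a minimal illustration. The function names are hypothetical, and the service and node names must come from your cluster configuration. Refer to the RHCS administration guide for the authoritative procedure.

```python
import subprocess


def recovery_commands(service, target_node):
    """Return the clusvcadm calls for a failed service, in order:
    disable the service, then enable it on the target node."""
    return [
        ["clusvcadm", "-d", service],
        ["clusvcadm", "-e", service, "-m", target_node],
    ]


def recover_failed_service(service, target_node, run=subprocess.run):
    """Run the recovery commands in order.

    check=True stops the sequence with an exception if a command fails.
    The run parameter exists only so that the sequence can be exercised
    without a cluster.
    """
    for command in recovery_commands(service, target_node):
        run(command, check=True)
```

Run this only after the node that lost storage access has been shut down, and only on a node where the service is allowed to come online.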