Managing HP Serviceguard A.11.20.20 for Linux, May 2013

All cables
Disk interface cards
Some monitoring can be done through simple physical inspection, but for the most comprehensive
monitoring, you should examine the system log file (/var/log/messages) periodically for reports
on all configured HA devices. The presence of errors relating to a device will show the need for
maintenance.
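The periodic log review described above can be partly automated. The following is a minimal sketch in Python, not a Serviceguard facility. The function name, the keyword list, and the device names in the usage note are illustrative assumptions; adjust them to the devices you have configured and the messages your drivers actually log.

```python
# Hypothetical keywords; these are not strings that Serviceguard is documented to emit.
ERROR_WORDS = ("error", "fail", "timeout", "offline")

def find_device_errors(lines, devices, error_words=ERROR_WORDS):
    """Return log lines that mention a watched device and an error keyword."""
    hits = []
    for line in lines:
        lower = line.lower()
        mentions_device = any(dev.lower() in lower for dev in devices)
        mentions_error = any(word in lower for word in error_words)
        if mentions_device and mentions_error:
            hits.append(line.rstrip("\n"))
    return hits
```

To use it, open /var/log/messages (this normally requires root) and pass the open file together with a list of device names, for example find_device_errors(log, ["sda"]). Matching is by simple substring, so expect some false positives.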
8.3 Replacing Disks
The procedure for replacing a faulty disk mechanism depends on the type of disk configuration
you are using. Refer to your Smart Array documentation for issues related to your Smart Array.
8.3.1 Replacing a Faulty Mechanism in a Disk Array
You can replace a failed disk mechanism by simply removing it from the array and replacing it
with a new mechanism of the same type. The resynchronization is handled by the array itself.
There may be some impact on disk performance until the resynchronization is complete. For details
on the process of hot plugging disk mechanisms, refer to your disk array documentation.
8.3.2 Replacing a Lock LUN
You can replace an unusable lock LUN while the cluster is running. If you keep the same device
file name, no cluster reconfiguration is needed. If you need to use a different device file, you
must change its name in the cluster configuration file, which you can also do while the cluster
is running; see “Updating the Cluster Lock LUN Configuration Online” (page 233).
CAUTION: Before you start, make sure that all nodes have logged a message such as the following
in syslog:
WARNING: Cluster lock LUN /dev/sda1 is corrupt: bad label. Until this
situation is corrected, a single failure could cause all nodes in the
cluster to crash.
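One way to confirm that every node has logged the warning is to collect each node's syslog and search it for the message. The following Python sketch assumes that the warning starts on a single log line with the wording shown above, and that you have already gathered the log lines per node; the function name and the node names are hypothetical.

```python
def nodes_missing_lock_warning(logs_by_node, device):
    """Return the nodes whose log has no corruption warning for the lock LUN.

    logs_by_node maps a node name to an iterable of syslog lines.
    """
    # Assumed wording, taken from the sample message in the text.
    needle = f"Cluster lock LUN {device} is corrupt"
    return sorted(
        node
        for node, lines in logs_by_node.items()
        if not any(needle in line for line in lines)
    )
```

An empty result means that all nodes have reported the problem; any node names returned still need to be checked before you proceed.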
Once all nodes have logged this message, use a command such as the following to specify the
new cluster lock LUN:
cmdisklock reset /dev/sda1
CAUTION: You are responsible for determining that the device is not being used by LVM or any
other subsystem on any node connected to the device before using cmdisklock. If you use
cmdisklock without taking this precaution, you could lose data.
NOTE: cmdisklock is needed only when you are repairing or replacing a lock LUN; see the
cmdisklock (1m) manpage for more information.
Serviceguard checks the lock LUN every 75 seconds. After using the cmdisklock command, watch
the syslog file of an active cluster node. Within 75 seconds you should see a message showing
that the lock LUN is healthy again.
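The wait for the recovery message can be sketched as a polling loop bounded by the 75-second check interval. The text does not give the wording of the recovery message, so this Python sketch takes it as a parameter; read_lines is a hypothetical caller-supplied function that returns only the syslog lines written after cmdisklock was run.

```python
import time

CHECK_INTERVAL_S = 75  # lock LUN check interval stated in the text

def wait_for_message(read_lines, marker, timeout_s=CHECK_INTERVAL_S, poll_s=5,
                     clock=time.monotonic, sleep=time.sleep):
    """Poll read_lines() until a line contains marker or timeout_s elapses.

    Returns True if the marker was seen, False if the timeout expired first.
    """
    deadline = clock() + timeout_s
    while True:
        if any(marker in line for line in read_lines()):
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)
```

The clock and sleep parameters exist only so that the loop can be exercised without real waiting. A False result suggests that the lock LUN has not yet been reported as healthy and needs further investigation.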
8.4 Revoking Persistent Reservations after a Catastrophic Failure
For information about persistent reservations (PR) and how they work, see “About Persistent
Reservations” (page 72).
Under normal circumstances, Serviceguard clears all persistent reservations when a package halts.
In the case of a catastrophic cluster failure, however, you may need to do the cleanup yourself as