Managing HP Serviceguard A.11.20.20 for Linux, May 2013

• All cables

• Disk interface cards

Some monitoring can be done through simple physical inspection, but for the most comprehensive monitoring, you should examine the system log file (/var/log/messages) periodically for reports on all configured HA devices. The presence of errors relating to a device will show the need for maintenance.
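
The periodic log review described above could be partly automated. The following is a minimal sketch, not a Serviceguard tool: the function names, the list of watched device names, and the error keywords are all assumptions chosen for illustration, because the log format depends on your drivers and hardware.

```python
import re
from typing import Iterable, List

# Assumed keywords; what counts as an error line depends on your
# drivers and hardware, so adjust this pattern for your site.
ERROR_WORDS = re.compile(r"\b(error|fail(ed|ure)?|timeout)\b", re.IGNORECASE)


def device_error_lines(lines: Iterable[str], devices: Iterable[str]) -> List[str]:
    """Return log lines that mention a watched device and an error keyword."""
    watched = list(devices)
    hits = []
    for line in lines:
        if ERROR_WORDS.search(line) and any(dev in line for dev in watched):
            hits.append(line.rstrip("\n"))
    return hits


def scan_log(path: str, devices: Iterable[str]) -> List[str]:
    """Scan a log file such as /var/log/messages for device error lines."""
    with open(path, "r", errors="replace") as fh:
        return device_error_lines(fh, devices)
```

For example, scan_log("/var/log/messages", ["sda", "sdb"]) would return the lines to review for those two devices. The match is a plain substring test, so "sda" also matches "sda1".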

8.3 Replacing Disks

The procedure for replacing a faulty disk mechanism depends on the type of disk configuration you are using. Refer to your Smart Array documentation for issues related to your Smart Array.

8.3.1 Replacing a Faulty Mechanism in a Disk Array

You can replace a failed disk mechanism by simply removing it from the array and replacing it with a new mechanism of the same type. The resynchronization is handled by the array itself. There may be some impact on disk performance until the resynchronization is complete. For details on the process of hot plugging disk mechanisms, refer to your disk array documentation.

8.3.2 Replacing a Lock LUN

You can replace an unusable lock LUN while the cluster is running. If you do not change the device file name, no cluster reconfiguration is needed. If you do need to change the device file, you can do the necessary reconfiguration while the cluster is running.

If you need to use a different device file, you must change the name of the device file in the cluster configuration file; see “Updating the Cluster Lock LUN Configuration Online” (page 233).

CAUTION: Before you start, make sure that all nodes have logged a message such as the following in syslog:

WARNING: Cluster lock LUN /dev/sda1 is corrupt: bad label. Until this
situation is corrected, a single failure could cause all nodes in the
cluster to crash.

Once all nodes have logged this message, use a command such as the following to specify the new cluster lock LUN:

cmdisklock reset /dev/sda1

CAUTION: You are responsible for determining that the device is not being used by LVM or any other subsystem on any node connected to the device before using cmdisklock. If you use cmdisklock without taking this precaution, you could lose data.

NOTE: cmdisklock is needed only when you are repairing or replacing a lock LUN; see the cmdisklock (1m) manpage for more information.
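
The precondition in the first CAUTION above (every node must have logged the corrupt lock LUN warning before you run cmdisklock) can be checked with a small script. This is a minimal sketch under stated assumptions: the function name is hypothetical, the search string is taken from the sample message shown in this section (the manual says "a message such as", so the wording on your system may differ), and gathering each node's syslog lines is left to you.

```python
from typing import Dict, Iterable, List


def nodes_missing_warning(logs_by_node: Dict[str, Iterable[str]],
                          device: str) -> List[str]:
    """Return the nodes whose log lines do not contain the corrupt-LUN warning.

    logs_by_node maps a node name to that node's syslog lines. How the
    lines are collected (ssh, shared storage, and so on) is out of scope.
    """
    # Assumed from the sample message in the text; verify against the
    # wording actually logged on your systems.
    needle = f"Cluster lock LUN {device} is corrupt"
    missing = []
    for node, lines in logs_by_node.items():
        if not any(needle in line for line in lines):
            missing.append(node)
    return sorted(missing)
```

An empty result means that every node supplied has logged the warning for that device. This sketch does not replace the manual check that the device is unused by LVM or any other subsystem.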

Serviceguard checks the lock LUN every 75 seconds. After using the cmdisklock command, watch the syslog file of an active cluster node. Within 75 seconds you should see a message showing that the lock LUN is healthy again.
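
The 75-second wait described above can be expressed as a polling loop with a deadline. This is a minimal sketch, not part of Serviceguard. The function name, the read_new_lines callback, and the poll interval are hypothetical, and because this section does not quote the "healthy" message, the marker text must be supplied by the caller.

```python
import time
from typing import Callable, Iterable, Optional

CHECK_INTERVAL_S = 75.0  # lock LUN check interval stated in the text


def wait_for_message(read_new_lines: Callable[[], Iterable[str]],
                     marker: str,
                     timeout_s: float = CHECK_INTERVAL_S,
                     poll_s: float = 5.0,
                     clock: Callable[[], float] = time.monotonic,
                     sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Poll for a log line containing marker; return it, or None on timeout.

    read_new_lines is a caller-supplied function that returns any syslog
    lines added since it was last called. marker is the text to look for;
    this sketch does not assume what the health message says.
    """
    deadline = clock() + timeout_s
    while True:
        for line in read_new_lines():
            if marker in line:
                return line
        if clock() >= deadline:
            return None
        sleep(poll_s)
```

A return value of None means the marker was not seen before the deadline, in which case the lock LUN should be investigated further. The clock and sleep parameters exist only so that the loop can be exercised without real delays.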

8.4 Revoking Persistent Reservations after a Catastrophic Failure

For information about persistent reservations (PR) and how they work, see “About Persistent Reservations” (page 72).

Under normal circumstances, Serviceguard clears all persistent reservations when a package halts. In the case of a catastrophic cluster failure, however, you may need to do the cleanup yourself as
