HP Smart Array 6400 Series Controllers Support Guide, September 2007


Compromised Fault Tolerance

Compromised fault tolerance commonly occurs when more physical disks have failed than the fault tolerance method can support. When fault tolerance fails, the logical volume also fails and unrecoverable disk error messages are returned to the host. Data loss is likely to occur.

For example, suppose one drive fails in an array configured with RAID 5 fault tolerance while another drive in the same array is still being rebuilt. If the array has no online spare, the logical drive fails.
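
The failure condition described above can be sketched as a simple count-based check. This is a minimal illustration under stated assumptions, not the controller's firmware logic: the per-level tolerance table (RAID 0: no failed disks, RAID 5: one, RAID ADG: two), the ArrayState type, and the fault_tolerance_compromised function are hypothetical names introduced here. The model does not account for online spares, and it omits RAID 1+0, whose tolerance depends on which mirror pairs lose disks.

```python
from dataclasses import dataclass

# Simplified model (assumption): the maximum number of physical disks that
# can be missing at the same time (failed, or not yet fully rebuilt)
# before the logical drive fails.
MAX_TOLERATED_FAILURES = {
    "RAID 0": 0,
    "RAID 5": 1,
    "RAID ADG": 2,
}


@dataclass
class ArrayState:
    raid_level: str
    failed_disks: int          # failed disks with no rebuilt replacement
    rebuilding_disks: int = 0  # spare or replacement disks still being rebuilt


def fault_tolerance_compromised(state: ArrayState) -> bool:
    """Return True if the logical drive fails under this simplified model.

    A disk that is still being rebuilt does not yet hold complete data,
    so it counts against the fault tolerance just like a failed disk.
    """
    limit = MAX_TOLERATED_FAILURES[state.raid_level]
    missing = state.failed_disks + state.rebuilding_disks
    return missing > limit


# The RAID 5 example from the text: one disk is still being rebuilt
# when another disk in the same array fails.
example = ArrayState("RAID 5", failed_disks=1, rebuilding_disks=1)
print(fault_tolerance_compromised(example))  # True
```

Under the same model, a RAID ADG array would survive that combination, because it tolerates two missing disks.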

Compromised fault tolerance can also be caused by non-disk problems, such as temporary power loss to a storage system or a faulty cable. In such cases, the physical disks do not need to be replaced. However, data can still be lost, especially if the system is busy at the time the problem occurs.

Recovering from Fault Tolerance Failures

When fault tolerance has been compromised, inserting replacement disks does not improve the condition of the logical drive. Instead, if your screen displays unrecoverable error messages, follow these steps to recover data:

1. Power down the StorageWorks disk enclosure, and then power it back up. In some cases, a marginal drive may work again long enough for you to copy important files.

2. Make copies of important data if possible.

3. Replace any failed disks.

4. After the failed disks have been replaced, fault tolerance may again be compromised. If it is, power the disk enclosure off and back on again.

5. If the power-cycling procedure did not let you recover your data, you must restore it from backup media. First, run the sautil <device_file> accept_media_xchg <logical_drive_number> command on the affected logical drive. This restores the logical drive’s configuration.

6. Restore your data from backup media. See “The sautil <device_file> accept_media_xchg <logical_drive_number> Command” (page 63).
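
Steps 5 and 6 depend on issuing the sautil command with its arguments in the order given above. The following Python helper is a hypothetical sketch that only assembles that command line from the documented syntax; the function name, the validation rules, and the /dev/ciss5 device file are illustrative assumptions. The sketch does not run sautil or check that the device file exists.

```python
import shlex
from typing import List


def accept_media_xchg_command(device_file: str, logical_drive_number: int) -> List[str]:
    """Build the argument vector for
    'sautil <device_file> accept_media_xchg <logical_drive_number>'.

    The command is only assembled here, not executed. On a host where
    sautil is installed, the list could be passed to
    subprocess.run(argv, check=True); that step is deliberately omitted.
    """
    # Hypothetical sanity checks; sautil itself may enforce different rules.
    if not device_file:
        raise ValueError("device_file must be a non-empty path")
    if logical_drive_number < 0:
        raise ValueError("logical_drive_number must be non-negative")
    return ["sautil", device_file, "accept_media_xchg", str(logical_drive_number)]


# Hypothetical device file and logical drive number, for illustration only.
argv = accept_media_xchg_command("/dev/ciss5", 0)
print(shlex.join(argv))  # sautil /dev/ciss5 accept_media_xchg 0
```

Keeping the arguments in a list, rather than a single string, avoids shell-quoting problems if the command is later executed programmatically.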

To minimize the risk of data loss due to compromised fault tolerance, make frequent backups of all logical volumes.

Automatic Data Recovery

Automatic data recovery is an automatic background process that rebuilds data onto a spare or replacement physical disk when another physical disk in the array fails. This process is also called a rebuild.
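
For RAID 5, the reconstruction performed during a rebuild can be illustrated with XOR parity. The sketch below is a generic illustration of RAID 5 parity arithmetic, not the Smart Array controller's actual implementation; the function names and sample bytes are hypothetical. It does not cover RAID ADG, which stores additional parity information.

```python
from functools import reduce
from typing import List


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    if len({len(b) for b in blocks}) > 1:
        raise ValueError("all blocks in a stripe must have the same length")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing_block(surviving_blocks: List[bytes]) -> bytes:
    """Recover the block of a failed disk in one RAID 5 stripe.

    The parity block is the XOR of the data blocks, so XOR-ing every
    surviving block (data and parity) yields the missing one,
    whichever disk it was on.
    """
    return xor_blocks(surviving_blocks)


# One stripe across a hypothetical three-disk RAID 5 array.
d0 = b"\x12\x34"
d1 = b"\xab\xcd"
parity = xor_blocks([d0, d1])

# The disk holding d1 fails; rebuild its contents from d0 and the parity block.
recovered = rebuild_missing_block([d0, parity])
print(recovered == d1)  # True
```

Because every surviving block in a stripe must be read to rebuild the missing one, this also suggests one reason why the number of disks in a RAID 5 array affects rebuild time.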

If a disk in a fault-tolerant configuration is replaced while system power is off, a message is displayed during the next system startup. This message states that an automatic data recovery procedure has been initiated.

When automatic data recovery finishes, the Online LED of the replacement drive stops blinking and glows steadily.

Rebuilding takes approximately 15 minutes per gigabyte. The actual rebuild time depends on the following:

• The level of rebuild priority set for the logical drive (saconfig automatically sets the priority to high)

• The amount of I/O activity occurring during the rebuild operation

• The speed of the physical disk

• The number of disks in the array (for RAID 5 and RAID ADG only)

For example, the rebuild time when using 9 GB Wide-Ultra disk drives in a RAID 5 configuration varies from 10 minutes per gigabyte (for three drives) to 20 minutes per gigabyte (for 14 drives).
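
The per-gigabyte figures above translate into a rough estimate by multiplication. The helper below is a hypothetical sketch that assumes the rate applies to the full capacity of the disk being rebuilt. It cannot model rebuild priority, I/O load, or disk speed, so treat its output as an order-of-magnitude guide only.

```python
def estimate_rebuild_minutes(capacity_gb: float, minutes_per_gb: float = 15.0) -> float:
    """Rough rebuild-time estimate: capacity multiplied by a per-gigabyte rate.

    The default of 15 minutes per gigabyte is the approximate figure from
    the text. Assumption: the rate applies to the whole disk capacity.
    """
    if capacity_gb < 0 or minutes_per_gb <= 0:
        raise ValueError("capacity must be >= 0 and rate must be > 0")
    return capacity_gb * minutes_per_gb


# A 9 GB disk at the rates quoted in the text.
for label, rate in [("typical", 15), ("3-drive RAID 5", 10), ("14-drive RAID 5", 20)]:
    minutes = estimate_rebuild_minutes(9, rate)
    print(f"{label}: {minutes:.0f} min ({minutes / 60:.2f} h)")
```

For a 9 GB disk, the quoted rates give roughly 90 minutes (three drives) to 180 minutes (14 drives), compared with about 135 minutes at the typical 15 minutes per gigabyte.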

70 Physical Disk Installation and Replacement