HP Smart Array 6400 Series Controllers Support Guide, September 2007
Compromised Fault Tolerance
Compromised fault tolerance commonly occurs when more physical disks have failed than the
fault tolerance method can support. When fault tolerance fails, the logical volume also fails and
unrecoverable disk error messages are returned to the host. Data loss is likely to occur.
For example, one drive fails in an array configured with RAID 5 fault tolerance while another
drive in the same array is still being rebuilt. If the array has no online spare, the logical drive
fails.
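The rule described above (more unavailable disks than the fault tolerance method can absorb) can be sketched as a simple check. This is a minimal illustration, not controller logic: the function name and table are hypothetical, the sketch ignores online spares, and it covers only RAID levels whose limit is a fixed disk count.

```python
# Disk failures each fault tolerance method can absorb per array.
# RAID 1+0 is omitted because its limit depends on which mirrored pairs are hit.
TOLERATED_FAILURES = {"RAID 0": 0, "RAID 5": 1, "RAID ADG": 2}


def is_compromised(raid_level: str, failed: int, rebuilding: int = 0) -> bool:
    """Return True when unavailable disks exceed what the RAID level tolerates.

    A disk that is still rebuilding does not yet hold valid data, so it
    counts as unavailable alongside the failed disks. (Hypothetical helper;
    online spares are not modeled.)
    """
    unavailable = failed + rebuilding
    return unavailable > TOLERATED_FAILURES[raid_level]


# The RAID 5 example from the text: one disk fails while another is rebuilding.
print(is_compromised("RAID 5", failed=1, rebuilding=1))    # True
print(is_compromised("RAID ADG", failed=1, rebuilding=1))  # False
```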
Compromised fault tolerance can also be caused by non-disk problems, such as temporary power
loss to a storage system or a faulty cable. In such cases, the physical disks do not need to be
replaced. However, data can still be lost, especially if the system is busy at the time the problem
occurs.
Recovering from Fault Tolerance Failures
When fault tolerance has been compromised, inserting replacement disks does not improve the
condition of the logical drive. Instead, if your screen displays unrecoverable error messages,
follow these steps to recover data:
1. Power down the StorageWorks disk enclosure, and then power it back up. In some cases, a
marginal drive will work again for long enough to allow you to make copies of important
files.
2. Make copies of important data if possible.
3. Replace any failed disks.
4. After the failed disks have been replaced, fault tolerance might be compromised again. If
so, power the disk enclosure off and then back on.
5. If the power-cycling procedure did not recover your data, you must restore it from
backup media. First, run the sautil <device_file>
accept_media_xchg <logical_drive_number> command on the affected logical
drive to restore the logical drive’s configuration. See “The sautil <device_file>
accept_media_xchg <logical_drive_number> Command” (page 63).
6. Restore your data from backup media.
To minimize the risk of data loss due to compromised fault tolerance, make frequent backups
of all logical volumes.
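The command in step 5 can be assembled programmatically, for example inside a recovery script. The sketch below only builds and prints the command line using the syntax given in this guide; it does not run sautil. The helper name, the device file /dev/ciss5, and logical drive number 1 are hypothetical values chosen for illustration.

```python
import shlex
from typing import List


def accept_media_xchg_command(device_file: str, logical_drive: int) -> List[str]:
    """Build argv for: sautil <device_file> accept_media_xchg <logical_drive_number>.

    Hypothetical helper; it only formats the command and never executes it.
    """
    if logical_drive < 0:
        raise ValueError("logical drive number must be non-negative")
    return ["sautil", device_file, "accept_media_xchg", str(logical_drive)]


# Hypothetical device file and logical drive number, for illustration only.
argv = accept_media_xchg_command("/dev/ciss5", 1)
print(shlex.join(argv))  # sautil /dev/ciss5 accept_media_xchg 1
```

Keeping the arguments as a list rather than a single string avoids quoting problems if the command is later passed to a process launcher.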
Automatic Data Recovery
Automatic data recovery is a background process that rebuilds data onto a spare or
replacement physical disk when another physical disk in the array fails. This process is also
called a rebuild.
If a disk in a fault-tolerant configuration is replaced while system power is off, a message is
displayed during the next system startup. This message states that an automatic data recovery
procedure has been initiated.
When automatic data recovery finishes, the Online LED of the replacement drive stops blinking
and glows steadily.
Approximately 15 minutes is required to rebuild each gigabyte. The actual rebuild time depends
on the following:
• The level of rebuild priority set for the logical drive (saconfig automatically sets the priority
to high)
• The amount of I/O activity occurring during the rebuild operation
• The speed of the physical disk
• The number of disks in the array (for RAID 5 and RAID ADG only)
For example, the rebuild time when using 9 GB Wide-Ultra disk drives in a RAID 5 configuration
varies from 10 minutes per gigabyte (for 3 drives) to 20 minutes per gigabyte (for 14 drives).
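The per-gigabyte figures above translate into a rough total rebuild time by simple multiplication. The following sketch works through the 9 GB RAID 5 example; the function name is hypothetical, and the result is only an estimate because it ignores rebuild priority, I/O load, and disk speed.

```python
def rebuild_minutes(capacity_gb: float, minutes_per_gb: float = 15.0) -> float:
    """Estimate rebuild time as disk capacity times a per-gigabyte rate.

    The default of 15 minutes per gigabyte is the approximate figure quoted
    in the text. (Hypothetical helper, for estimation only.)
    """
    if capacity_gb < 0 or minutes_per_gb <= 0:
        raise ValueError("capacity must be >= 0 and rate must be > 0")
    return capacity_gb * minutes_per_gb


# 9 GB disk in RAID 5, using the rates quoted for 3-drive and 14-drive arrays.
for drives, rate in ((3, 10), (14, 20)):
    print(f"{drives} drives: {rebuild_minutes(9, rate):.0f} minutes")
# 3 drives: 90 minutes
# 14 drives: 180 minutes

print(f"approximate default: {rebuild_minutes(9):.0f} minutes")
# approximate default: 135 minutes
```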