HP A7143A RAID160 SA Controller Support Guide

Hard Drive Installation and Replacement
Appendix C
Compromised Fault Tolerance
Compromised fault tolerance commonly occurs when more physical disks
have failed than the fault-tolerance method can endure. In this case, the
logical volume fails, and unrecoverable disk error messages are
returned to the host. Data loss is likely to occur.
An example of this situation is an array configured with RAID 5
fault tolerance in which one drive fails while another drive in the same
array is still being rebuilt. If the array has no online spare, the logical
drive will fail.
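The rule described above can be illustrated with a short shell sketch. It is
not part of sautil or of any HP tool: the function names are hypothetical, and
the RAID levels listed are generic ones (0, 5, and 6) chosen for illustration,
not a statement of what this controller supports. The sketch assumes that a
drive still being rebuilt offers no redundancy yet, so it is counted together
with the failed drives.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only.
# Prints how many simultaneous drive losses a generic RAID level tolerates.
tolerated_failures() {
    case "$1" in
        0) echo 0 ;;   # striping only, no redundancy
        5) echo 1 ;;   # single distributed parity
        6) echo 2 ;;   # dual parity
        *) echo "unsupported RAID level: $1" >&2; return 1 ;;
    esac
}

# Succeeds (exit status 0) when fault tolerance is compromised.
# Usage: is_compromised <raid_level> <failed_count> <rebuilding_count>
is_compromised() {
    limit=$(tolerated_failures "$1") || return 2
    # A rebuilding drive holds incomplete data, so count it as unavailable.
    unavailable=$(( $2 + $3 ))
    [ "$unavailable" -gt "$limit" ]
}

# The RAID 5 example from the text: one failed drive, one still rebuilding.
if is_compromised 5 1 1; then
    echo "RAID 5: fault tolerance compromised"
fi
```

With a single failed drive and no rebuild in progress (is_compromised 5 1 0),
the check does not succeed, because RAID 5 can endure the loss of one drive.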
Compromised fault tolerance may also be caused by non-drive problems,
such as temporary power loss to a storage system or a faulty cable. In
such cases, the physical drives do not need to be replaced. However, data
may still have been lost, especially if the system was busy at the time
that the problem occurred.
Example C-1 Procedure to Attempt Recovery
When fault tolerance has been compromised, inserting replacement
drives does not improve the condition of the logical volume. Instead, if
your screen displays unrecoverable error messages, try the following
procedure to recover data.
1. Power down the StorageWorks disk enclosure, and then power it back
   up. In some cases, a marginal drive will work again for long enough to
   allow you to make copies of important files.
2. Make copies of important data, if possible.
3. Replace any failed disks.
4. After the failed disks have been replaced, the fault tolerance may again
   be compromised. If so, cycle the power again.
5. Run the sautil <device_file> accept_media_xchg
   <logical_drive_number> command on the affected logical drive. This
   will restore the logical drive’s configuration. Now restore your data from
   backup media (see “sautil <device_file> accept_media_xchg
   <logical_drive_number>” on page 129).
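As a minimal sketch of the last step of the procedure, the shell function
below assembles the sautil command line and prints it for review instead of
running it. Only the command form (sautil <device_file> accept_media_xchg
<logical_drive_number>) comes from this guide. The function name is
hypothetical, and the device file /dev/ciss5 and logical drive number 1 are
placeholder values to be replaced with the values for your system.

```shell
#!/bin/sh
# Hypothetical wrapper; only the sautil command form comes from this guide.
# Prints the command so that it can be checked before it is run by hand.
# Usage: accept_media_xchg_cmd <device_file> <logical_drive_number>
accept_media_xchg_cmd() {
    if [ "$#" -ne 2 ]; then
        echo "usage: accept_media_xchg_cmd <device_file> <logical_drive_number>" >&2
        return 1
    fi
    # Reject anything other than a non-negative integer drive number.
    case "$2" in
        ''|*[!0-9]*) echo "invalid logical drive number: $2" >&2; return 1 ;;
    esac
    echo "sautil $1 accept_media_xchg $2"
}

# Placeholder values, assumed for illustration only.
accept_media_xchg_cmd /dev/ciss5 1
```

Printing the command first is a deliberate choice: it gives you a chance to
confirm the device file and logical drive number before you run the command
and restore data from backup media.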
To minimize the risk of data loss due to compromised fault tolerance,
make frequent backups of all logical volumes.