HP A7143A RAID160 SA Controller Support Guide, February 2007
Hard Drive Installation and Replacement
Appendix C
Compromised Fault Tolerance
Compromised fault tolerance commonly occurs when more physical disks have failed than the fault-tolerance
method can endure. When this happens, the logical volume fails, and unrecoverable disk error messages are
returned to the host. Data loss is likely.
For example, in an array configured with RAID 5 fault tolerance, one drive may fail while another drive in the
same array is still being rebuilt. If the array has no online spare, the logical drive will fail.
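The rule above can be sketched as a small classifier. This is an illustration only, not controller behavior: the tolerance values are the general guaranteed minimums for each RAID level (with RAID 1 assumed to be a two-drive mirror), and the state names and function are hypothetical, not status strings reported by the RAID160 SA. A drive that is still being rebuilt does not yet hold valid data, so it is counted against the fault tolerance just like a failed drive.

```python
from enum import Enum

# Guaranteed number of member drives each level can lose without losing the
# logical drive. General RAID properties (assumed), not values read from the controller.
TOLERATED_FAILURES = {
    "RAID 0": 0,
    "RAID 1": 1,  # two-drive mirror
    "RAID 5": 1,
}


class LogicalDriveState(Enum):
    # Hypothetical state names for illustration only.
    OK = "ok"
    DEGRADED = "degraded"
    FAILED = "failed"


def logical_drive_state(raid_level: str, failed: int, rebuilding: int = 0) -> LogicalDriveState:
    """Classify a logical drive from counts of failed and still-rebuilding members."""
    if failed < 0 or rebuilding < 0:
        raise ValueError("drive counts must be non-negative")
    tolerated = TOLERATED_FAILURES[raid_level]
    # A member being rebuilt holds no valid data yet, so it counts as unavailable.
    unavailable = failed + rebuilding
    if unavailable == 0:
        return LogicalDriveState.OK
    if unavailable <= tolerated:
        return LogicalDriveState.DEGRADED
    # More members are unavailable than the level can endure: fault tolerance is compromised.
    return LogicalDriveState.FAILED
```

In the RAID 5 scenario described above, `logical_drive_state("RAID 5", failed=1, rebuilding=1)` returns `LogicalDriveState.FAILED`, while a single failed drive with no rebuild in progress returns `LogicalDriveState.DEGRADED`.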
Compromised fault tolerance can also be caused by problems unrelated to the drives, such as a temporary loss of
power to a storage system or a faulty cable. In such cases, the physical drives do not need to be replaced.
However, data may still have been lost, especially if the system was busy when the problem occurred.
Example C-1 Procedure to Attempt Recovery
When fault tolerance has been compromised, inserting replacement drives does not improve the condition of
the logical volume. Instead, if your screen displays unrecoverable error messages, try the following procedure
to recover data.
1. Power down the StorageWorks™ disk enclosure, and then power it back up. In some cases, a marginal drive
   will work again long enough for you to copy important files.
2. Copy important data, if possible.
3. Replace any failed disks.
4. After the failed disks have been replaced, fault tolerance may again be compromised. If so, cycle the power
   again.
5. Run the sautil <device_file> accept_media_xchg <logical_drive_number> command on the affected logical
   drive to restore the logical drive's configuration (see "sautil <device_file> accept_media_xchg
   <logical_drive_number>" on page 91). Then restore your data from backup media.
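Step 5 can be scripted as sketched below. This is a minimal sketch under stated assumptions: the only sautil syntax it relies on is the form quoted in step 5, the helper names are hypothetical, and the device file `/dev/ciss0` used in the usage example is a hypothetical placeholder for your controller's device file. The wrapper defaults to a dry run that only prints the command, so nothing is executed against a live controller by accident.

```python
import shlex
import subprocess


def accept_media_xchg_cmd(device_file: str, logical_drive_number: int) -> list[str]:
    """Build the argument list for: sautil <device_file> accept_media_xchg <logical_drive_number>."""
    if not device_file:
        raise ValueError("device_file must be a non-empty path")
    if logical_drive_number < 0:
        raise ValueError("logical_drive_number must be non-negative")
    return ["sautil", device_file, "accept_media_xchg", str(logical_drive_number)]


def accept_media_xchg(device_file: str, logical_drive_number: int, dry_run: bool = True) -> str:
    """Print the command (dry run) or execute it and return sautil's standard output.

    Intended for use only after the failed disks have been replaced (steps 3 and 4).
    """
    cmd = accept_media_xchg_cmd(device_file, logical_drive_number)
    if dry_run:
        line = shlex.join(cmd)
        print(line)
        return line
    # check=True raises subprocess.CalledProcessError if sautil exits with a non-zero status.
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `accept_media_xchg("/dev/ciss0", 2)` prints `sautil /dev/ciss0 accept_media_xchg 2` without running it; pass `dry_run=False` only once you have confirmed the device file and logical drive number. Restoring data from backup media remains a separate, manual step.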
To minimize the risk of data loss due to compromised fault tolerance, make frequent backups of all logical
volumes.