User guide

Recovering From Hard Drive Failure
During Automatic Data Recovery, if the online LED of the replacement drive
stops blinking and all other drives in the array are still online, the Automatic
Data Recovery process may have terminated abnormally because of a
non-correctable read error from another physical drive during the recovery process.
The background Auto-Reliability Monitoring process is meant to help prevent
this problem. Reboot the system; a POST message should confirm the
diagnosis. Retrying Automatic Data Recovery may help. If it does not, the
recommended course of action is to back up all data on the system, run a
surface analysis (using User Diagnostics), and restore the data.
During Automatic Data Recovery, if the online LED of the replacement drive
stops blinking and the replacement drive has failed (the amber failure LED is
illuminated or other LEDs go out), the replacement drive is producing
unrecoverable disk errors. In this case, remove the replacement drive and
install another one.
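The two cases above can be summarized as a small decision sketch. It is illustrative only: the function name, the `Action` values, and the boolean inputs are hypothetical labels for what an operator observes on the drive LEDs, not part of any controller utility or API.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical labels for the responses described in the text.
    REBOOT_AND_RETRY = "reboot, check the POST message, retry Automatic Data Recovery"
    SWAP_REPLACEMENT = "remove the replacement drive and install another one"
    NOT_COVERED = "combination not described in this section"


def diagnose_adr_stall(replacement_failed: bool, other_drives_online: bool) -> Action:
    """Suggest a response when the replacement drive's online LED stops blinking.

    replacement_failed: amber failure LED is lit or the other LEDs went out.
    other_drives_online: every other drive in the array is still online.
    """
    if replacement_failed:
        # The replacement drive itself is producing unrecoverable errors.
        return Action.SWAP_REPLACEMENT
    if other_drives_online:
        # Likely a non-correctable read error on another physical drive.
        return Action.REBOOT_AND_RETRY
    return Action.NOT_COVERED
```

For example, `diagnose_adr_stall(False, True)` corresponds to the first case: reboot, confirm the diagnosis from the POST message, and retry the recovery.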
If fault tolerance is compromised due to failure of multiple drives, the
logical drive will be in a failed condition and unrecoverable errors will be
returned to the host. Data loss is probable. Inserting replacement drives at
this point will not improve the condition of the logical drive. If this occurs, first
try turning the entire system off and on. In some cases, an intermittent drive
will appear to work again (perhaps long enough to make copies of
important files) after cycling power. If a 1779 POST message is displayed,
press F2 to re-enable the logical drives. Remember that data loss has likely
occurred and any data on the logical drive is suspect.
Fault tolerance may be compromised due to non-drive problems such as a
faulty cable, faulty storage system power supply, or a user accidentally turning
off an external storage system while the host system power was on. In such
cases, the physical drives do not need to be replaced. However, data
loss can still occur in this situation, especially if the system was busy at the
time the problem developed.
In cases of genuine drive failure, once copies of important data have been
made (if possible), replace all failed drives to prevent further drive
problems. After these (multiple) drives are replaced, fault
tolerance may again be compromised, power may need to be cycled, and the
1779 POST message may again be displayed. Press F2 to re-enable the logical
drives, recreate your partitions, and restore all data from backup.
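The order of operations after a loss of fault tolerance can be sketched as a checklist builder. This is a minimal sketch, assuming the operator already knows whether the 1779 POST message appeared and whether the drives have genuinely failed; the function and parameter names are hypothetical, and the step for correcting a non-drive fault is inferred from the text rather than stated in it.

```python
from typing import List


def recovery_steps(post_1779_shown: bool, drive_failure_confirmed: bool) -> List[str]:
    """Return the recovery steps from this section, in order (illustrative only)."""
    steps = ["Turn the entire system off and on."]
    if post_1779_shown:
        steps.append("Press F2 to re-enable the logical drives.")
    # An intermittent drive may work only briefly after a power cycle.
    steps.append("Copy important files if possible; treat all data as suspect.")
    if not drive_failure_confirmed:
        # Cable, power supply, or operator error: drives need no replacement.
        steps.append("Correct the non-drive fault; do not replace the drives.")
        return steps
    steps += [
        "Replace every failed drive.",
        "Cycle power if needed; press F2 if the 1779 POST message reappears.",
        "Recreate partitions.",
        "Restore all data from backup.",
    ]
    return steps
```

Calling `recovery_steps(True, True)` yields the full sequence for a genuine multiple-drive failure, ending with the restore from backup.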
Because of the risk that fault tolerance may be compromised at some point in
the future, make regular backups of all logical drives.