Users Guide

method is that while the array has a RAID puncture in it, uncorrectable errors will continue to be encountered whenever the

impacted data (if any) is accessed.

A RAID puncture can occur in the following three locations:

● In blank space that contains no data. That stripe will be inaccessible, but since there is no data in that location, it will have no

significant impact. Any attempts to write to a RAID punctured stripe by an OS will fail and data will be written to a different

location.

● In a stripe that contains data that isn't critical such as a README.TXT file. If the impacted data is not accessed, no errors

are generated during normal I/O. Attempts to perform a file system backup will fail to back up any files impacted by a RAID

puncture. Performing a Check Consistency or Patrol Read operation generates Sense code: 3/11/00 for the applicable

LBA and/or stripes.

● In data space that is accessed. In such a case, the lost data can cause a variety of errors. The errors can be minor errors

that do not adversely impact a production environment. The errors can also be more severe and can prevent the system

from booting to an operating system, or cause applications to fail.
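
The sense code mentioned above can be read as three hexadecimal fields. The following is a minimal sketch, assuming the value is written as sense key / additional sense code (ASC) / additional sense code qualifier (ASCQ), which is the usual SCSI convention. The function name decode_sense and the lookup tables are hypothetical helpers written for this illustration, and the tables list only a small subset of the values defined by the SCSI standard.

```python
# Subset of SCSI sense keys; the full list is defined by the SCSI standard.
SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
}

# Only the ASC/ASCQ pair mentioned in this section is listed here.
ADDITIONAL_SENSE = {
    (0x11, 0x00): "UNRECOVERED READ ERROR",
}


def decode_sense(code):
    """Split a 'key/asc/ascq' string into its three hexadecimal fields."""
    key_s, asc_s, ascq_s = code.strip().split("/")
    key, asc, ascq = int(key_s, 16), int(asc_s, 16), int(ascq_s, 16)
    return {
        "sense_key": SENSE_KEYS.get(key, "UNKNOWN"),
        "asc": asc,
        "ascq": ascq,
        "description": ADDITIONAL_SENSE.get((asc, ascq), "not in this table"),
    }


result = decode_sense("3/11/00")
print(result["sense_key"], "-", result["description"])
```

Under these assumptions, 3/11/00 decodes to a medium error with an unrecovered read error, which matches the behavior described for a punctured stripe: the data at that LBA cannot be read or corrected.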

An array that is RAID punctured will eventually have to be deleted and recreated to eliminate the RAID puncture. This procedure

causes all data to be erased. The data would then need to be recreated or restored from backup after the RAID puncture is

eliminated. The resolution for a RAID puncture can be scheduled for a time that is more advantageous to the needs of the business.

If the data within a RAID punctured stripe is accessed, errors will continue to be reported against the affected bad LBAs with

no possible correction available. Eventually (this could be minutes, days, weeks, months, and so on), the Bad Block Management

(BBM) Table will fill up causing one or more drives to become flagged as predictive failure. As seen in the figure, drive 0 will

typically be the drive that gets flagged as predictive failure due to the errors on drive 1 and drive 2 being propagated to it. Drive

0 may actually be working normally, and replacing drive 0 will only cause that replacement to eventually be flagged predictive

failure as well.

A Check Consistency performed after a RAID puncture is induced will not resolve the issue. This is why it is very important to

perform a Check Consistency on a regular basis. It becomes especially important prior to replacing drives, when possible. The

array must be in an optimal state to perform the Check Consistency.

A RAID array that contains a single data error in conjunction with an additional error event such as a hard drive failure causes

a RAID puncture when the failed or replacement drive is rebuilt into the array. As an example, an optimal RAID 5 array includes

three members: drive 0, drive 1 and drive 2. If drive 0 fails and is replaced, the data and parity remaining on drives 1 and 2 are

used to rebuild the missing information on to the replacement drive 0. However, if a data error exists on drive 1 when the rebuild

operation reaches that error, there is insufficient information within the stripe to rebuild the missing data in that stripe. Drive 0

has no data, drive 1 has bad data and drive 2 has good data as it is being rebuilt. There are multiple errors within that stripe.

Drive 0 and drive 1 do not contain valid data, so any data in that stripe cannot be recovered and is therefore lost. The result as

shown in Figure 24 is that RAID punctures (in stripes 1 and 2) are created during the rebuild. The errors are propagated to drive 0.

Figure 24. RAID punctures
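
The rebuild scenario described above can be illustrated with a small simulation. This is a simplified sketch, not the controller's actual algorithm: it assumes a three-drive RAID 5 array in which each stripe holds two data blocks and one XOR parity block, and it models an unreadable block as None. The function names xor_blocks and rebuild_drive0, the sample block contents, and the stripe count are all hypothetical. Real controllers rotate parity across drives, but because XOR is symmetric, the missing block is always the XOR of the two surviving blocks regardless of which one holds parity.

```python
def xor_blocks(a, b):
    """XOR two equal-length byte blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def rebuild_drive0(drive1, drive2):
    """Rebuild drive 0 stripe by stripe from the two surviving drives.

    A block of None models a data error (unreadable block). If either
    surviving block in a stripe is unreadable, the stripe cannot be
    reconstructed and is recorded as punctured.
    """
    rebuilt, punctured = [], []
    for stripe, (b1, b2) in enumerate(zip(drive1, drive2)):
        if b1 is None or b2 is None:
            rebuilt.append(None)      # nothing valid to write to drive 0
            punctured.append(stripe)  # data in this stripe is lost
        else:
            rebuilt.append(xor_blocks(b1, b2))
    return rebuilt, punctured


# Optimal array: drive 2 holds the XOR of drive 0 and drive 1 for each stripe.
drive0 = [b"AAAA", b"BBBB", b"CCCC"]
drive1 = [b"1111", b"2222", b"3333"]
drive2 = [xor_blocks(a, b) for a, b in zip(drive0, drive1)]

# Drive 0 fails and is replaced; drive 1 has a data error in stripe 1.
damaged_drive1 = [b"1111", None, b"3333"]

rebuilt, punctured = rebuild_drive0(damaged_drive1, drive2)
print("rebuilt drive 0:", rebuilt)
print("punctured stripes:", punctured)
```

In this sketch, stripes 0 and 2 are rebuilt correctly, while stripe 1 has two invalid members (drive 0 and drive 1) and cannot be recovered from drive 2 alone.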

Puncturing the array restores the redundancy and returns the array to an optimal state. This protects the array from

additional data loss in the event of additional errors or drive failures.

How to fix a RAID puncture

Issue:

How to fix RAID arrays that have been subjected to a puncture?

Solution: Complete the following steps to resolve the issue:
