Users Guide

The advantage of puncturing an array is that it keeps the system available in production until the redundancy of the array is restored. The data in
the affected stripe is lost whether the RAID puncture occurs or not. The primary disadvantage of this method is that while the array has a
RAID puncture in it, uncorrectable errors continue to be encountered whenever the impacted data (if any) is accessed.
A RAID puncture can occur in the following three locations:
● In blank space that contains no data. The stripe is inaccessible, but because there is no data in that location, there is no
significant impact. Any attempt by an operating system to write to a RAID punctured stripe fails, and the data is written to a different location.
● In a stripe that contains data that is not critical, such as a README.TXT file. If the impacted data is not accessed, no errors are
generated during normal I/O. Attempts to perform a file system backup fail to back up any files impacted by the RAID
puncture. Performing a Check Consistency or Patrol Read operation generates Sense code: 3/11/00 for the applicable LBA and/or
stripes.
● In data space that is accessed. In such a case, the lost data can cause a variety of errors. The errors can be minor errors that do not
adversely impact a production environment. The errors can also be more severe and can prevent the system from booting to an
operating system, or cause applications to fail.
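The sense code 3/11/00 mentioned above follows the usual SCSI key/ASC/ASCQ layout: sense key 3h is MEDIUM ERROR, and additional sense code 11h with qualifier 00h is UNRECOVERED READ ERROR. The following is a minimal Python sketch for splitting such a code out of a log entry. The helper name decode_sense and the two lookup tables are hypothetical and cover only the code discussed here; they are not part of any controller utility.

```python
# Hypothetical helper for illustration only; not part of any vendor tool.
# The tables cover just the one code discussed in this section.
SENSE_KEYS = {0x3: "MEDIUM ERROR"}
ADDITIONAL_SENSE = {(0x11, 0x00): "UNRECOVERED READ ERROR"}


def decode_sense(code: str) -> dict:
    """Split a 'key/ASC/ASCQ' string (hexadecimal fields) into its parts."""
    key, asc, ascq = (int(part, 16) for part in code.strip().split("/"))
    return {
        "sense_key": key,
        "asc": asc,
        "ascq": ascq,
        "sense_key_name": SENSE_KEYS.get(key, "unknown"),
        "description": ADDITIONAL_SENSE.get((asc, ascq), "unknown"),
    }


if __name__ == "__main__":
    print(decode_sense("3/11/00"))
```

A medium error of this kind reported against the same LBAs on every Check Consistency or Patrol Read is consistent with the punctured stripes described in this section.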
An array that is RAID punctured will eventually have to be deleted and recreated to eliminate the RAID puncture. This procedure causes all
data to be erased. The data would then need to be recreated or restored from backup after the RAID puncture is eliminated. The
resolution for a RAID puncture can be scheduled for a time that is more advantageous to the needs of the business.
If the data within a RAID punctured stripe is accessed, errors will continue to be reported against the affected bad LBAs with no possible
correction available. Eventually (this could be minutes, days, weeks, months, and so on), the Bad Block Management (BBM) Table will fill
up causing one or more drives to become flagged as predictive failure. As seen in the figure, drive 0 will typically be the drive that gets
flagged as predictive failure due to the errors on drive 1 and drive 2 being propagated to it. Drive 0 may actually be working normally, and
replacing drive 0 will only cause that replacement to eventually be flagged as predictive failure as well.
A Check Consistency performed after a RAID puncture is induced does not resolve the issue. This is why it is very important to perform a
Check Consistency on a regular basis, so that data errors are found and corrected while the array is still redundant. It becomes especially
important prior to replacing drives, when possible. The array must be in an optimal state to perform the Check Consistency.
A RAID puncture occurs when an array that contains a single data error also experiences an additional error event, such as a hard drive
failure, and the failed or replacement drive is rebuilt into the array. As an example, an optimal RAID 5 array includes three members:
drive 0, drive 1, and drive 2. If drive 0 fails and is replaced, the data and parity remaining on drives 1 and 2 are used to rebuild the missing
information onto the replacement drive 0. However, if a data error exists on drive 1, then when the rebuild operation reaches that error there is
insufficient information within the stripe to rebuild the missing data in that stripe. Drive 0 has no data because it is being rebuilt, drive 1 has
bad data, and drive 2 has good data. There are multiple errors within that stripe. Drive 0 and drive 1 do not contain valid data, so any data in that
stripe cannot be recovered and is therefore lost. The result, as shown in Figure 24, is that RAID punctures (in stripes 1 and 2) are created
during the rebuild. The errors are propagated to drive 0.
Figure 24. RAID punctures
Puncturing the array restores the redundancy and returns the array to an optimal state. This protects the array from
additional data loss in the event of additional errors or drive failures.
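The rebuild scenario described above can be sketched in a few lines of Python. This is a simplified model, not controller firmware: it assumes three drives, fixed parity placement, and 4-byte blocks, and the names xor_blocks, make_stripe, and rebuild_drive are hypothetical. It shows only why a stripe with a missing block and an unreadable block cannot be reconstructed from XOR parity, which is the condition that leads to a puncture.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together (RAID 5 parity arithmetic)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data_a, data_b):
    """Build a three-block stripe: two data blocks plus their parity.

    Real RAID 5 rotates parity across drives; it is fixed here for brevity.
    """
    return [data_a, data_b, xor_blocks([data_a, data_b])]


def rebuild_drive(stripes, failed, bad):
    """Rebuild the blocks of drive `failed`, one stripe at a time.

    stripes: list of stripes, each a list holding one block per drive
    bad:     set of (stripe_index, drive_index) blocks that cannot be read
    Returns (rebuilt_blocks, punctured_stripe_indexes).
    """
    rebuilt, punctured = [], []
    for s, stripe in enumerate(stripes):
        survivors = [d for d in range(len(stripe)) if d != failed]
        if any((s, d) in bad for d in survivors):
            # Two blocks are unavailable: single parity cannot recover both.
            punctured.append(s)
            rebuilt.append(None)
        else:
            rebuilt.append(xor_blocks([stripe[d] for d in survivors]))
    return rebuilt, punctured


if __name__ == "__main__":
    stripes = [
        make_stripe(b"AAAA", b"BBBB"),
        make_stripe(b"CCCC", b"DDDD"),
        make_stripe(b"EEEE", b"FFFF"),
    ]
    # Drive 0 has failed and drive 1 has a media error in stripe 1.
    rebuilt, punctured = rebuild_drive(stripes, failed=0, bad={(1, 1)})
    print("rebuilt drive 0:", rebuilt)
    print("punctured stripes:", punctured)
```

In this model, the stripes without a second error are rebuilt exactly, while the stripe with the bad block on drive 1 is reported as punctured. The stripe numbering here is illustrative and does not correspond to the figure.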
How to fix a RAID puncture
Issue:
How to fix RAID arrays that have been subjected to a puncture?
Solution: Complete the following steps to resolve the issue:
WARNING: Following these steps will result in the loss of all data on the array. Ensure that you are
prepared to restore from backup or other means prior to following these steps. Use caution so
that following these steps does not impact any other arrays.