Users Guide

method is that while the array has a RAID puncture in it, uncorrectable errors will continue to be encountered whenever the
impacted data (if any) is accessed.
A RAID puncture can occur in the following three locations:
In blank space that contains no data. That stripe will be inaccessible, but because there is no data in that location, it has no
significant impact. Any attempt by an OS to write to a RAID punctured stripe fails, and the data is written to a different
location.
In a stripe that contains data that isn't critical, such as a README.TXT file. If the impacted data is not accessed, no errors
are generated during normal I/O. Attempts to perform a file system backup will fail to back up any files impacted by a RAID
puncture. Performing a Check Consistency or Patrol Read operation generates Sense code: 3/11/00 for the applicable
LBAs and/or stripes.
In data space that is accessed. In such a case, the lost data can cause a variety of errors. The errors can be minor errors
that do not adversely impact a production environment. The errors can also be more severe and can prevent the system
from booting to an operating system, or cause applications to fail.
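The sense code 3/11/00 mentioned above follows the usual SCSI key/ASC/ASCQ layout: sense key 3h is MEDIUM ERROR, and ASC/ASCQ 11h/00h is UNRECOVERED READ ERROR. As a minimal sketch for reading such codes out of a log, assuming the fields are hexadecimal and separated by slashes, the following Python decodes the string. The function name parse_sense_code and the lookup tables are hypothetical helpers written for this illustration, and the tables are deliberately incomplete.

```python
# Illustrative decoder for "key/ASC/ASCQ" sense strings such as 3/11/00.
# parse_sense_code and both tables are hypothetical helpers for this sketch;
# only a few well-known SCSI values are listed.

SENSE_KEYS = {
    0x0: "NO SENSE",
    0x1: "RECOVERED ERROR",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
}

ADDITIONAL_SENSE = {
    (0x11, 0x00): "UNRECOVERED READ ERROR",
}


def parse_sense_code(text):
    """Split 'K/ASC/ASCQ' (hexadecimal fields) into numbers and descriptions."""
    key_s, asc_s, ascq_s = text.strip().split("/")
    key, asc, ascq = int(key_s, 16), int(asc_s, 16), int(ascq_s, 16)
    return {
        "key": key,
        "asc": asc,
        "ascq": ascq,
        "key_text": SENSE_KEYS.get(key, "unknown"),
        "asc_text": ADDITIONAL_SENSE.get((asc, ascq), "unknown"),
    }


info = parse_sense_code("3/11/00")
print(info["key_text"], "-", info["asc_text"])
# prints: MEDIUM ERROR - UNRECOVERED READ ERROR
```

An unrecovered read error reported for the same LBA on every Check Consistency or Patrol Read pass is consistent with the punctured-stripe behavior described above.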
An array that is RAID punctured will eventually have to be deleted and recreated to eliminate the RAID puncture. This procedure
causes all data to be erased. The data would then need to be recreated or restored from backup after the RAID puncture is
eliminated. The resolution for a RAID puncture can be scheduled for a time that better suits the needs of the business.
If the data within a RAID punctured stripe is accessed, errors will continue to be reported against the affected bad LBAs with
no possible correction available. Eventually (this could be minutes, days, weeks, months, and so on), the Bad Block Management
(BBM) Table will fill up causing one or more drives to become flagged as predictive failure. As seen in the figure, drive 0 will
typically be the drive that gets flagged as predictive failure due to the errors on drive 1 and drive 2 being propagated to it. Drive
0 may actually be working normally, and replacing drive 0 will only cause the replacement to eventually be flagged as predictive
failure as well.
A Check Consistency performed after a RAID puncture is induced will not resolve the issue. A Check Consistency can only correct
an error while the rest of the stripe is still valid, which is why it is very important to perform a Check Consistency on a regular
basis. It becomes especially important prior to replacing drives, when possible. The array must be in an optimal state to perform
the Check Consistency.
A RAID puncture is caused when a RAID array that contains a single data error also experiences an additional error event, such as
a hard drive failure, and the failed or replacement drive is then rebuilt into the array. As an example, an optimal RAID 5 array
includes three members: drive 0, drive 1, and drive 2. If drive 0 fails and is replaced, the data and parity remaining on drives 1
and 2 are used to rebuild the missing information onto the replacement drive 0. However, if a data error exists on drive 1, then
when the rebuild operation reaches that error, there is insufficient information within the stripe to rebuild the missing data in
that stripe. While drive 0 is being rebuilt, it has no data, drive 1 has bad data, and drive 2 has good data, so there are multiple
errors within that stripe. Drive 0 and drive 1 do not contain valid data, so any data in that stripe cannot be recovered and is
therefore lost. The result, as shown in Figure 24, is that RAID punctures (in stripes 1 and 2) are created during the rebuild. The
errors are propagated to drive 0.
Figure 24. RAID punctures
Puncturing the array restores the redundancy and returns the array to an optimal state. This protects the array from
additional data loss in the event of additional errors or drive failures.
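The three-drive RAID 5 rebuild described in this section can be sketched in a few lines of Python. This is a simplified model, not the controller's actual algorithm: it assumes byte-wise XOR parity, toy two-byte blocks, and a bad block represented as None. The names xor_blocks and rebuild_drive0 are hypothetical and exist only for this illustration.

```python
# Toy model of a 3-drive RAID 5 rebuild. In RAID 5 the XOR of all members of
# a stripe is zero, so any one missing block equals the XOR of the other two.
# xor_blocks and rebuild_drive0 are hypothetical names for this sketch.


def xor_blocks(a, b):
    """Byte-wise XOR of two equal-length blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def rebuild_drive0(drive1, drive2):
    """Rebuild drive 0 stripe by stripe from the two surviving drives.

    A block of None models an unreadable (bad) block. Returns the rebuilt
    blocks and the list of stripe indexes that could not be rebuilt.
    """
    rebuilt, punctured = [], []
    for stripe, (b1, b2) in enumerate(zip(drive1, drive2)):
        if b1 is None or b2 is None:
            # Two members of this stripe are invalid, so the missing block
            # cannot be computed and the stripe is marked as punctured.
            rebuilt.append(None)
            punctured.append(stripe)
        else:
            rebuilt.append(xor_blocks(b1, b2))
    return rebuilt, punctured


# Contents of the array while it was still optimal (4 stripes).
d1 = [b"\x01\x02", b"\x03\x04", b"\x05\x06", b"\x07\x08"]
d2 = [b"\x10\x20", b"\x30\x40", b"\x50\x60", b"\x70\x80"]
d0 = [xor_blocks(a, b) for a, b in zip(d1, d2)]

# Drive 0 fails and is replaced; drive 1 has bad blocks in stripes 1 and 2.
d1_damaged = [d1[0], None, None, d1[3]]
rebuilt, punctured = rebuild_drive0(d1_damaged, d2)

print(punctured)  # prints: [1, 2]
```

Stripes 0 and 3 are rebuilt exactly, while stripes 1 and 2 have no recoverable content on drive 0. Had a Check Consistency repaired the bad blocks on drive 1 while the array was still optimal, the same rebuild would have completed with no punctured stripes.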
How to fix a RAID puncture
Issue:
How to fix RAID arrays that have been subjected to a puncture?
Solution: Complete the following steps to resolve the issue: