Users Guide

1. Discard Preserved Cache, if it exists.
2. Clear foreign configurations, if any.
3. Delete the array.
4. Shift the position of the drives by one.
Move Disk 0 to slot 1, Disk 1 to slot 2, and Disk 2 to slot 0.
5. Recreate the array as desired.
6. Perform a Full Initialization of the array (not a Fast Initialization).
7. Perform a Check Consistency on the array.
If the Check Consistency completes without errors, you can safely assume that the array is now healthy and the
puncture is removed. Data can now be restored to the healthy array.
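Step 4 moves each drive one slot forward, and the last drive wraps around to slot 0. The Python sketch below only plans the moves. It assumes the array's drives occupy consecutive slots numbered from 0, and that arrays with more than three drives use the same one-slot shift. The guide itself only shows the three-drive case. The function name `rotated_slots` is hypothetical, and the code does not communicate with any RAID controller.

```python
def rotated_slots(drive_count: int) -> dict[int, int]:
    """Map each disk (by current slot number) to its new slot after a one-slot shift.

    Hypothetical helper: assumes disks sit in consecutive slots 0..n-1 and
    that the last disk wraps around to slot 0, as in the three-drive example.
    """
    if drive_count < 2:
        raise ValueError("shifting drives requires at least two drives")
    return {disk: (disk + 1) % drive_count for disk in range(drive_count)}


if __name__ == "__main__":
    # Print the move plan for a three-drive array.
    for disk, slot in rotated_slots(3).items():
        print(f"Move Disk {disk} to slot {slot}")
```

For three drives, the printed plan matches step 4: Disk 0 to slot 1, Disk 1 to slot 2, and Disk 2 to slot 0.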
Preventing problems before they happen and solving punctures after they occur
Dell's RAID controllers contain a number of features that prevent many types of problems and handle a variety of errors when they do occur.
The primary job of a RAID controller is to preserve the integrity of the data contained on its array(s). Even in more extreme cases of
damage (such as punctures), the array's data is often still available and the server can remain in production. Any maintenance plan
should include proactive maintenance of the RAID arrays. Dell's RAID controllers are highly reliable and very good at managing their arrays
without user intervention, but disregarding proper maintenance can cause even the most sophisticated technologies to experience problems
over time. Several practices help maintain the health of arrays and prevent most data errors, double faults, and punctures.
Routine, regular maintenance is highly recommended. Proactive maintenance can correct existing errors and prevent some errors from
occurring. Not all errors can be prevented, but proactive maintenance significantly mitigates most serious errors. For storage and RAID
subsystems, the recommended steps are:
- Update drivers and firmware on controllers, hard drives, backplanes, and other devices.
- Perform routine Check Consistency operations (Dell recommends every 30 days).
- Inspect cabling for signs of wear and damage, and ensure good connections.
- Review logs for indications of problems.
This does not have to be an in-depth technical review. A cursory look through the logs for obvious indications of potential
problems can be enough. Contact Dell Technical Support with any questions or concerns.
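The 30-day Check Consistency interval comes down to simple date arithmetic. This Python sketch assumes you record the completion date of each Check Consistency yourself. The helper names `next_check_due` and `check_is_overdue` are hypothetical, and the code does not query the RAID controller or its logs.

```python
from datetime import date, timedelta

# Interval the guide recommends between routine Check Consistency operations.
CHECK_INTERVAL = timedelta(days=30)


def next_check_due(last_check: date) -> date:
    """Return the date the next Check Consistency is due (hypothetical helper)."""
    return last_check + CHECK_INTERVAL


def check_is_overdue(last_check: date, today: date) -> bool:
    """Return True if more than 30 days have passed since the last Check Consistency."""
    return today > next_check_due(last_check)


if __name__ == "__main__":
    last = date(2024, 1, 15)  # illustrative completion date, not real data
    print("Next Check Consistency due:", next_check_due(last))
    print("Overdue on 2024-02-20:", check_is_overdue(last, date(2024, 2, 20)))
```

A check completed on 2024-01-15 is next due on 2024-02-14, so the sketch reports it as overdue on 2024-02-20.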
Troubleshooting thermal issues
Thermal issues can be caused by malfunctioning ambient temperature sensors, malfunctioning fans, dusty heat sinks, or malfunctioning
thermal sensors, among other causes.
To resolve the thermal issues:
1. Check the LCD and Embedded System Management (ESM) logs for any additional error messages to identify the faulty component.
2. Ensure that airflow to the system is not blocked. Placing the system in an enclosed area or blocking its air vents can cause it to
overheat. If the system is installed in a rack, ensure that the rack cooling system is working normally.
3. Check that the ambient temperature is within acceptable levels.
4. Check the internal system fans for obstructions and ensure that all fans are spinning properly. Replace any failing fan with a
known-good fan for testing.
5. Ensure that all the required shrouds and blanks are installed.
6. Check that all fans are functioning properly, that the heat sink is installed correctly, and that thermal grease is applied.
Input/Output errors while reseating SAS IOM/storage sled on hardware configurations
Reseating the SAS IOM/storage sled on the following hardware configurations, set up as Failover clusters with shared storage and
multipath enabled, results in I/O errors:
MX7000 chassis with compute nodes as cluster nodes and MX5016s sled for