Intel® RAID Basic Troubleshooting Guide Technical Summary Document Revision 2.
Revision History

Date          Revision Number   Modifications
April, 2008   1.0               Initial release.
June, 2009    2.0               Updated the RAID log extraction method and the detailed explanations of VD, PD, and BBU related RAID events.

Disclaimers

Information in this document is provided in connection with Intel® products. No license, express or implied, by estoppel or otherwise, to any intellectual property rights is granted by this document.
Table of Contents

1. Introduction
   1.1 Purpose of this Document
2. Drive State Definition
List of Figures

Figure 1. Troubleshooting Flow Chart
List of Tables

Table 1. Common Problems and Solutions
Table 2. Physical Drive Related Events and Messages
Table 3. BBU Related Events and Messages
1. Introduction

1.1 Purpose of this Document

This troubleshooting guide provides information on basic troubleshooting for issues related to Intel® SAS/SATA RAID Controllers. It is designed for use by knowledgeable system integrators and is not intended to address broader system-related failures. This guide provides a high-level review of troubleshooting options that can be used to identify and resolve RAID-related problems or failures.
2. Drive State Definition

2.1 Physical Drive (PD) State

The SAS Software Stack firmware defines the following states for physical disks connected to the controller.

- Unconfigured Good: A disk is accessible to the RAID controller but is not configured as part of a virtual disk. For example, a new drive inserted into a system.
- Online: A disk is accessible to the RAID controller and configured as part of a virtual disk.
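The two state definitions above can be expressed as a small classification sketch. This is an illustration only: the `PDState` enum and the `classify` function are hypothetical names, not part of any Intel tool, and only the two states defined above are modeled.

```python
from enum import Enum
from typing import Optional


class PDState(Enum):
    """Physical drive states defined above (hypothetical enum for illustration)."""
    UNCONFIGURED_GOOD = "Unconfigured Good"
    ONLINE = "Online"


def classify(accessible: bool, in_virtual_disk: bool) -> Optional[PDState]:
    """Map the two properties used in the definitions to a state.

    Returns None for a disk the controller cannot access, because that
    case is not covered by the two definitions modeled here.
    """
    if not accessible:
        return None
    return PDState.ONLINE if in_virtual_disk else PDState.UNCONFIGURED_GOOD
```

For example, a newly inserted drive that the controller can see but that belongs to no virtual disk would be classified as `PDState.UNCONFIGURED_GOOD`.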
3. Tips and Tricks

3.1 Setup Tips

- Check cables for proper connection. Verify that all the cable ends are properly seated and the pins are not bent.
- Verify that an approved cable is used. Cables must be speed compatible and meet signal integrity specifications.

Note: SATA cables are designed to connect directly from the RAID controller to the hard drive or drive enclosure.
4. Troubleshooting

You should never rely on a RAID subsystem as your only disaster protection. Always keep an independent backup of critical data in a separate physical location.

If there is an issue, gather as much information as possible and evaluate all options before shutting down, restarting the system, or taking any action that will change either the status of a physical or logical drive or the RAID configuration.
Listed below are some basic troubleshooting scenarios and guidance. For more information, refer to http://support.intel.com.

4.1.1 If there is an issue, follow these steps before any other actions are taken:

- Do not reboot the system during a drive rebuild or if a drive is offline until the issue is identified or all other troubleshooting efforts have been exhausted.
- Make sure a verified backup is available.
- Failed physical drive.
- Excessive number of hard drive grown defects or hard drive block redirection events.
- Unexpected sense code errors (such as drive medium errors).
- Data bus errors.
- Power interruptions or an unexpected reboot.
- Processor, power supply, drive enclosure, or hard drive thermal issues.
- If the virtual disk is degraded, verify the status of the physical drives from within the RAID management tool.
- If a drive has failed and a hot spare is present, determine whether the hot spare is online and a rebuild has started.
- If a drive has failed and a hot spare is not present, remove the failed drive and replace it with a drive of the same or larger capacity. Do not reuse a previously failed drive.
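The checks for a degraded virtual disk can be sketched as a small decision helper. This is a minimal illustration, not part of any RAID management tool: the `Drive` record, its field names, and both function names are hypothetical, and the inputs are assumed to have been read manually from the RAID management tool.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    """Hypothetical drive record; field names are illustrative only."""
    label: str
    capacity_gb: int
    previously_failed: bool = False


def is_valid_replacement(failed: Drive, candidate: Drive) -> bool:
    """A replacement must be the same or larger capacity and never previously failed."""
    return (not candidate.previously_failed
            and candidate.capacity_gb >= failed.capacity_gb)


def next_step(drive_failed: bool, hot_spare_present: bool, rebuild_started: bool) -> str:
    """Return the next check or action for a degraded virtual disk."""
    if not drive_failed:
        return "Verify physical drive status in the RAID management tool."
    if hot_spare_present:
        if rebuild_started:
            return "Rebuild in progress; do not reboot the system."
        return "Confirm the hot spare is online and that a rebuild has started."
    return "Replace the failed drive with one of the same or larger capacity."
```

Keeping the capacity and reuse rule in its own function makes it easy to check a candidate drive before it is inserted.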
- Provide the exact system configuration, including firmware and BIOS versions, system memory configuration, RAID configuration, and configuration of other adapters in the system.
- List the steps to reproduce the failure and include a history of the system, the simplest failure mode, and all troubleshooting completed.
- Provide a copy of all available logs.
4.3 Troubleshooting Questions and Answers

Table 1. Common Problems and Solutions

Problem: RAID controller not detected by the OS management utility or during POST
  Possible cause: RAID controller is not seated properly. Action: Reseat controller.
  Possible cause: Bad memory on controller. Action: Replace memory (if configurable).
  Possible cause: Bad controller. Action: Replace controller.

Problem: Virtual disk drive degraded
  Possible cause: Physical drive marked failed.
enclosure, and server board and update accordingly.

Problem: Grown defects and bad block redirection errors
  Possible cause: Failing hard disk drive. Action: Replace hard drive.

4.4 FAQ

Table 2. Physical Drive Related Events and Messages

Message: WARNING: Removed: PD 08(e1/s0)
Description: Check the cable, power connection, backplane, SATA/SAS port, and the hard drive to find out why the specific physical drive was removed.
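When scanning an exported log for removal events, a message such as `WARNING: Removed: PD 08(e1/s0)` can be split into fields with a regular expression. The sketch below rests on assumptions: it treats `e1` and `s0` as decimal enclosure and slot numbers, which this guide does not state, and the function name `parse_pd_removed` is hypothetical. Lines in any other layout return `None`.

```python
import re
from typing import Optional

# Assumed layout: "<SEVERITY>: Removed: PD <id>(e<enclosure>/s<slot>)".
# Reading "e"/"s" as enclosure/slot is an assumption, not taken from the guide.
PD_REMOVED = re.compile(
    r"^(?P<severity>[A-Z]+):\s*Removed:\s*PD\s+(?P<pd>[0-9A-Fa-f]+)"
    r"\(e(?P<enclosure>\d+)/s(?P<slot>\d+)\)"
)


def parse_pd_removed(line: str) -> Optional[dict]:
    """Return the fields of a 'Removed: PD' message, or None if it does not match."""
    match = PD_REMOVED.match(line.strip())
    if match is None:
        return None
    return {
        "severity": match.group("severity"),
        "pd": match.group("pd"),
        "enclosure": int(match.group("enclosure")),
        "slot": int(match.group("slot")),
    }
```

The parsed enclosure and slot values narrow down which cable, backplane port, and drive to inspect first.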
Table 3. BBU Related Events and Messages

Message: WARNING:BBU disabled; changing WB virtual disks to WT
Description: The BBU is not connected or is not fully charged. You can still use the Bad BBU mode under RAID Web Console 2 to enable Write Back mode on virtual disks. An unexpected power failure may cause data loss. Wait until the BBU is fully charged before rebooting the system.
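The behavior behind this message can be modeled in a few lines. This is a simplified sketch of the rule as described above, not the controller firmware's actual logic. The function name and parameters are hypothetical, with `"WB"` standing for Write Back and `"WT"` for Write Through.

```python
def effective_write_policy(requested: str, bbu_ready: bool, bad_bbu_mode: bool = False) -> str:
    """Model the cache policy a virtual disk ends up with.

    requested:    "WB" (Write Back) or "WT" (Write Through).
    bbu_ready:    True when the BBU is connected and fully charged.
    bad_bbu_mode: True when Write Back is kept even without a ready BBU,
                  accepting the risk of data loss on unexpected power failure.
    """
    if requested not in ("WB", "WT"):
        raise ValueError(f"unknown write policy: {requested!r}")
    if requested == "WB" and not bbu_ready and not bad_bbu_mode:
        return "WT"  # corresponds to "changing WB virtual disks to WT"
    return requested
```

Only the combination of Write Back, a BBU that is not ready, and no Bad BBU mode results in the fallback to Write Through.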
Appendix A: PD Related RAID Event Annotation

The following table lists the Intel® RAID Web Console 2 PD related event log messages.

Num  Type  Indication  Actions  Description
50   F     B           1,2,3,4  Background Initialization detected uncorrectable multiple medium errors (%s at %lx on %s)
51   C     A           2,3      Background Initialization failed on %s
60   F                          Consistency Check detected uncorrectable multiple medium errors (%s at %lx on
Type: W=Warning, C=Critical, F=Fatal, D=Dead

Indication / possible causes:
A) A specific physical drive failed.
B) This is likely a hard drive issue.
C) This is likely a cable connection / backplane / vibration issue.
D) The messages might appear before the full background initialization is completed, or if drives have errors.
E) The specific virtual disk failed due to medium errors.
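The annotation scheme in this appendix lends itself to a lookup table. The sketch below encodes only events 50 and 51, assuming the flattened table reads as number, type, indication, and actions in that order. The action codes are kept as plain numbers, and the names `PD_EVENTS` and `annotate` are hypothetical.

```python
SEVERITY = {"W": "Warning", "C": "Critical", "F": "Fatal", "D": "Dead"}

INDICATIONS = {
    "A": "A specific physical drive failed.",
    "B": "This is likely a hard drive issue.",
}

# event number -> (type, indication codes, action codes, message template)
PD_EVENTS = {
    50: ("F", ["B"], [1, 2, 3, 4],
         "Background Initialization detected uncorrectable multiple medium errors (%s at %lx on %s)"),
    51: ("C", ["A"], [2, 3],
         "Background Initialization failed on %s"),
}


def annotate(event_id: int) -> dict:
    """Expand an event number into severity, likely causes, and action codes."""
    if event_id not in PD_EVENTS:
        raise KeyError(f"event {event_id} is not in this sketch's table")
    type_code, indication_codes, actions, template = PD_EVENTS[event_id]
    return {
        "severity": SEVERITY[type_code],
        "indications": [INDICATIONS[code] for code in indication_codes],
        "actions": actions,
        "message": template,
    }
```

Calling `annotate(51)` yields a Critical event whose indicated cause is a failed physical drive, with action codes 2 and 3.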
Appendix B: VD Related RAID Event Annotation

The following table lists the Intel® RAID Web Console 2 VD related event log messages.

ID  Type  Indication  Actions  Description
61  C     A           N/A      Consistency Check failed on %s
62  F     A           N/A      Consistency Check completed with uncorrectable errors on %s
64  W     A, B        1,2,3,4  Consistency Check inconsistency logging disabled on %s (too many inconsistencies)
79  F                          Reconstructi
M) A global affinity hot spare is usually for a virtual disk in the same enclosure. This event may be logged if you plan to commission the global affinity hot spare in a different enclosure.
Appendix C: BBU Related RAID Event Annotation

The following table lists the Intel® RAID Web Console 2 BBU related event log messages.
Appendix D: Reference Documents

Refer to the following documents for additional information:

- Intel® RAID Controller Command Line Tool 2 User Guide, Version 1.0.
- Intel® RAID Controller SAS-SATA Logged Alert Decode, Version 1.0.