Reference Guide


www.dell.com | support.dell.com

Dell OpenManage™ Server

Administrator

Messages Reference Guide

Summary of content (132 pages)

PAGE 1
Dell OpenManage™ Server Administrator Messages Reference Guide
PAGE 2
Notes and Notices NOTE: A NOTE indicates important information that helps you make better use of your computer. NOTICE: A NOTICE indicates either potential damage to hardware or loss of data and tells you how to avoid the problem. ____________________ Information in this document is subject to change without notice. © 2003–2006 Dell Inc. All rights reserved. Reproduction in any manner whatsoever without the written permission of Dell Inc. is strictly forbidden.
PAGE 3
Contents
1 Introduction . . . 7
What’s New in this Release . . . 7
Messages Not Described in This Guide . . . 7
Understanding Event Messages . . . 8
Sample Event Message Text . . . 9
Viewing Alerts and Event Messages
PAGE 4
Pluggable Device Messages . . . 41
Battery Sensor Messages . . . 42
3 System Event Log Messages for IPMI Systems . . . 45
Temperature Sensor Events . . . 45
Voltage Sensor Events . . . 46
Fan Sensor Events . . . 47
PAGE 5
4 Storage Management Message Reference . . . 59
Alert Monitoring and Logging . . . 59
Alert Message Format with Substitution Variables . . . 59
Alert Message Change History . . . 62
Alert Descriptions and Corrective Actions
Index
PAGE 6
Contents
PAGE 7
Introduction Dell OpenManage™ Server Administrator produces event messages stored primarily in the operating system or Server Administrator event logs and sometimes in SNMP traps. This document describes the event messages created by Server Administrator version 5.1 or later and displayed in the Server Administrator Alert log. Server Administrator creates events in response to sensor status changes and other monitored parameters.
PAGE 8
Understanding Event Messages This section describes the various types of event messages generated by the Server Administrator. When an event occurs on your system, the Server Administrator sends information about one of the following event types to the systems management console: Table 1-1. Understanding Event Messages Icon Alert Severity Component Status OK/Normal An event that describes the successful operation of a unit.
PAGE 9
• Fan Enclosure Sensor — Monitors protective fan enclosures by detecting their removal from and insertion into the system, and by measuring how long a fan enclosure is absent from the chassis. This sensor monitors the chassis and any attached systems. • AC Power Cord Sensor — Monitors the presence of AC power for an AC power cord. • Hardware Log Sensor — Monitors the size of a hardware log. • Processor Sensor — Monitors the processor status in the system.
PAGE 10
The location of the event log file depends on the operating system you are using. • In the Microsoft® Windows® 2000 Advanced Server and Windows Server™ 2003 operating systems, messages are logged to the system event log and optionally to a Unicode text file, dcsys32.log (viewable using Notepad), that is located in the install_path\omsa\log directory. The default install_path is C:\Program Files\Dell\SysMgt.
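The path rule above can be sketched in Python. This is a minimal illustration, not part of Server Administrator: the function names `omsa_log_path` and `read_log_lines` are hypothetical, and the `utf-16` encoding is an assumption about what "Unicode text file" means on these Windows versions, so adjust it if the file decodes incorrectly.

```python
from pathlib import PureWindowsPath

# Default install_path as stated in the guide.
DEFAULT_INSTALL_PATH = r"C:\Program Files\Dell\SysMgt"


def omsa_log_path(install_path: str = DEFAULT_INSTALL_PATH) -> PureWindowsPath:
    """Build install_path\\omsa\\log\\dcsys32.log (hypothetical helper)."""
    return PureWindowsPath(install_path) / "omsa" / "log" / "dcsys32.log"


def read_log_lines(path, encoding: str = "utf-16"):
    """Return the non-blank lines of the text log.

    The encoding default is an assumption, not something the guide states.
    """
    with open(path, encoding=encoding) as handle:
        return [line.rstrip("\r\n") for line in handle if line.strip()]
```

On a managed Windows system, `read_log_lines(str(omsa_log_path()))` would return the logged messages, provided the optional text log is enabled.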
PAGE 11
...
PAGE 12
Understanding the Event Description Table 1-2 lists in alphabetical order each line item that may appear in the event description. Table 1-2.
PAGE 13
Table 1-2.
PAGE 14
Table 1-2.
PAGE 15
Event Message Reference The following tables list, in numerical order, each event ID and its corresponding description, along with its severity and cause. NOTE: For corrective actions, see the appropriate documentation. Miscellaneous Messages Miscellaneous messages in Table 2-1 indicate that certain alert systems are up and working. Table 2-1. Miscellaneous Messages Event ID Description Severity Cause 0000 Log was cleared Information User cleared the log from Server Administrator.
PAGE 16
Table 2-1. Miscellaneous Messages (continued) Event ID Description Severity Cause 1005 SMBIOS data is absent Warning The system does not contain the required systems management BIOS version 2.2 or higher, or the BIOS is corrupted. 1006 Automatic System Recovery (ASR) action was performed Error This message is generated when an automatic system recovery action is performed due to a hung operating system. The action performed and the time of action are provided.
PAGE 17
Temperature Sensor Messages Temperature sensors listed in Table 2-2 help protect critical components by alerting the systems management console when temperatures become too high inside a chassis. The temperature sensor messages use additional variables: sensor location, chassis location, previous state, and temperature sensor value or state. Table 2-2.
PAGE 18
Table 2-2. Temperature Sensor Messages (continued) Event ID Description Severity Cause 1052 Information A temperature sensor on the backplane board, system board, or drive carrier in the specified system returned to a valid range after crossing a failure threshold. The sensor location, chassis location, previous state, and temperature sensor value are provided. Warning A temperature sensor on the backplane board, system board, or drive carrier in the specified system exceeded its warning threshold.
PAGE 19
Table 2-2. Temperature Sensor Messages (continued) Event ID Description Severity Cause 1054 Error A temperature sensor on the backplane board, system board, or drive carrier in the specified system exceeded its failure threshold. The sensor location, chassis location, previous state, and temperature sensor value are provided. Error A temperature sensor on the backplane board, system board, or drive carrier in the specified system detected an error from which it cannot recover.
PAGE 20
Cooling Device Messages Cooling device sensors listed in Table 2-3 monitor how well a fan is functioning. Cooling device messages provide status and warning information for fans in a particular chassis. Table 2-3. Cooling Device Messages Event ID Description Severity Cause 1100 Information A fan sensor in the specified system is not functioning. The sensor location, chassis location, previous state, and fan sensor value are provided.
PAGE 21
Table 2-3. Cooling Device Messages (continued) Event ID Description Severity Cause 1104 Error A fan sensor in the specified system detected the failure of one or more fans. The sensor location, chassis location, previous state, and fan sensor value are provided. Error A fan sensor detected an error from which it cannot recover. The sensor location, chassis location, previous state, and fan sensor value are provided.
PAGE 22
Table 2-4. Voltage Sensor Messages (continued) Event ID Description Severity Cause 1151 Information A voltage sensor in the specified system could not obtain a reading. The sensor location, chassis location, previous state, and a nominal voltage sensor value are provided. Information A voltage sensor in the specified system returned to a valid range after crossing a failure threshold. The sensor location, chassis location, previous state, and voltage sensor value are provided.
PAGE 23
Table 2-4. Voltage Sensor Messages (continued) Event ID Description Severity Cause 1154 Error A voltage sensor in the specified system exceeded its failure threshold. The sensor location, chassis location, previous state, and voltage sensor value are provided. Error A voltage sensor in the specified system detected an error from which it cannot recover. The sensor location, chassis location, previous state, and voltage sensor value are provided.
PAGE 24
Current Sensor Messages Current sensors listed in Table 2-5 measure the amount of current (in amperes) that is traversing critical components. Current sensor messages provide status and warning information for current sensors in a particular chassis. Table 2-5. Current Sensor Messages Event ID Description Severity Cause 1200 Information A current sensor on the power supply for the specified system failed. The sensor location, chassis location, previous state, and current sensor value are provided.
PAGE 25
Table 2-5. Current Sensor Messages (continued) Event ID Description Severity Cause 1202 Information A current sensor on the power supply for the specified system returned to a valid range after crossing a failure threshold. The sensor location, chassis location, previous state, and current sensor value are provided. Warning A current sensor on the power supply for the specified system exceeded its warning threshold.
PAGE 26
Table 2-5. Current Sensor Messages (continued) Event ID Description Severity Cause 1204 Error A current sensor on the power supply for the specified system exceeded its failure threshold. The sensor location, chassis location, previous state, and current sensor value are provided. Error A current sensor in the specified system detected an error from which it cannot recover. The sensor location, chassis location, previous state, and current sensor value are provided.
PAGE 27
Chassis Intrusion Messages
Chassis intrusion messages listed in Table 2-6 are a security measure. Chassis intrusion means that someone is opening the cover to a system’s chassis. Alerts are sent to prevent unauthorized removal of parts from a chassis.
Table 2-6. Chassis Intrusion Messages
Event ID: 1250
Description: Chassis intrusion sensor has failed Sensor location:
Severity: Information
Cause: A chassis intrusion sensor in the specified system failed.
PAGE 28
Table 2-6. Chassis Intrusion Messages (continued) Event ID Description Severity Cause 1253 Warning A chassis intrusion sensor in the specified system detected that a system cover is currently being opened and the system is operating. The sensor location, chassis location, previous state, and chassis intrusion state are provided. Error A chassis intrusion sensor in the specified system detected that the system cover was opened while the system was operating.
PAGE 29
The number of devices required for full redundancy is provided as part of the message, when applicable, for the redundancy unit and the platform. For details on redundancy computation, see the respective platform documentation. Table 2-7. Redundancy Unit Messages Event ID Description Severity Cause 1300 Redundancy sensor has failed Information A redundancy sensor in the specified system failed.
PAGE 30
Table 2-7. Redundancy Unit Messages (continued) Event ID Description Severity Cause 1304 Redundancy regained Information A redundancy sensor in the specified system detected that a “lost” redundancy device has been reconnected or replaced; full redundancy is in effect. The redundancy unit location, chassis location, previous redundancy state, and the number of devices required for full redundancy are provided.
PAGE 31
Power Supply Messages Power supply sensors monitor how well a power supply is functioning. Power supply messages listed in Table 2-8 provide status and warning information for power supplies present in a particular chassis. Table 2-8. Power Supply Messages Event ID Description Severity Cause 1350 Information A power supply sensor in the specified system failed. The sensor location, chassis location, previous state, and additional power supply status information are provided.
PAGE 32
Table 2-8. Power Supply Messages (continued) Event ID Description Severity Cause 1352 Information A power supply has been reconnected or replaced. The sensor location, chassis location, previous state, and additional power supply status information are provided. Warning A power supply sensor reading in the specified system exceeded a user-definable warning threshold. The sensor location, chassis location, previous state, and additional power supply status information are provided.
PAGE 33
Table 2-8. Power Supply Messages (continued)
Event ID: 1354
Description: Power supply detected a failure Sensor location: Chassis location:
Severity: Error
Cause: A power supply has been disconnected or has failed. The sensor location, chassis location, previous state, and additional power supply status information are provided.
PAGE 34
Memory Device Messages Memory device messages listed in Table 2-9 provide status and warning information for memory modules present in a particular system. Memory devices determine health status by monitoring the ECC memory correction rate and the type of memory events that have occurred. NOTE: A critical status does not always indicate a system failure or loss of data. In some instances, the system has exceeded the ECC correction rate.
PAGE 35
Fan Enclosure Messages Some systems are equipped with a protective enclosure for fans. Fan enclosure messages listed in Table 2-10 monitor whether foreign objects are present in an enclosure and how long a fan enclosure is missing from a chassis. Table 2-10. Fan Enclosure Messages Event ID Description Severity Cause 1450 Information The fan enclosure sensor in the specified system failed. The sensor location and chassis location are provided.
PAGE 36
Table 2-10. Fan Enclosure Messages (continued) Event ID Description Severity Cause 1454 Error A fan enclosure has been removed from the specified system for a user-definable length of time. The sensor location and chassis location are provided. Error A fan enclosure sensor in the specified system detected an error from which it cannot recover. The sensor location and chassis location are provided.
PAGE 37
Table 2-11. AC Power Cord Messages (continued) Event ID Description Severity Cause 1502 Information An AC power cord that did not have AC power has had the power restored. The sensor location and chassis location information are provided. Warning An AC power cord has lost its power, but there is sufficient redundancy to classify this as a warning. The sensor location and chassis location information are provided.
PAGE 38
Table 2-12. Hardware Log Sensor Messages Event ID Description Severity Cause 1550 Information A hardware log sensor in the specified system is disabled. The log type information is provided. Information A hardware log sensor in the specified system could not obtain a reading. The log type information is provided. Information The hardware log on the specified system is no longer near or at its capacity, usually as the result of clearing the log. The log type information is provided.
PAGE 39
Processor Sensor Messages Processor sensors monitor how well a processor is functioning. Processor messages listed in Table 2-13 provide status and warning information for processors in a particular chassis. Table 2-13. Processor Sensor Messages Event ID Description Severity Cause 1600 Information A processor sensor in the specified system is not functioning. The sensor location, chassis location, previous state and processor sensor status are provided.
PAGE 40
Table 2-13. Processor Sensor Messages (continued) Event ID Description Severity Cause 1603 Warning A processor sensor in the specified system is in a throttled state. The sensor location, chassis location, previous state and processor sensor status are provided. Error A processor sensor in the specified system is disabled, has a configuration error, or experienced a thermal trip. The sensor location, chassis location, previous state and processor sensor status are provided.
PAGE 41
Pluggable Device Messages The pluggable device messages listed in Table 2-14 provide status and error information when some devices, such as memory cards, are added or removed. Table 2-14. Pluggable Device Messages Event ID Description Severity Cause 1650 Information A pluggable device event message of unknown type was received. The device location, chassis location, and additional event details, if available, are provided. Information A device was added in the specified system.
PAGE 42
Battery Sensor Messages Battery sensors monitor how well a battery is functioning. Battery messages listed in Table 2-15 provide status and warning information for batteries in a particular chassis. Table 2-15. Battery Sensor Messages Event ID Description Severity Cause 1700 Information A battery sensor in the specified system is not functioning. The sensor location, chassis location, previous state, and battery sensor status are provided.
PAGE 43
Table 2-15. Battery Sensor Messages (continued) Event ID Description Severity Cause 1704 Error A battery sensor in the specified system detected that a battery has failed. The sensor location, chassis location, previous state, and battery sensor status are provided. Error A battery sensor in the specified system detected that a battery has failed. The sensor location, chassis location, previous state, and battery sensor status are provided.
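The event IDs shown in Tables 2-1 through 2-15 appear to be grouped in blocks of 50 per sensor type. The sketch below maps an ID to its message category on that assumption. The block boundaries and the `categorize` helper are illustrative and not stated in the guide, and memory device IDs are left out because this excerpt does not show them.

```python
# Lowest event ID of each block, taken from the IDs visible in the tables.
# The 50-wide block size is an assumption inferred from those IDs.
CATEGORY_BY_BASE = {
    1050: "Temperature sensor",
    1100: "Cooling device",
    1150: "Voltage sensor",
    1200: "Current sensor",
    1250: "Chassis intrusion",
    1300: "Redundancy unit",
    1350: "Power supply",
    1450: "Fan enclosure",
    1500: "AC power cord",
    1550: "Hardware log sensor",
    1600: "Processor sensor",
    1650: "Pluggable device",
    1700: "Battery sensor",
}


def categorize(event_id: int) -> str:
    """Return the message category for an event ID (hypothetical helper)."""
    if event_id < 1050:
        # Table 2-1 lists 0000, 1005, and 1006 as miscellaneous messages.
        return "Miscellaneous"
    base = event_id - (event_id % 50)
    return CATEGORY_BY_BASE.get(base, "Unknown")
```

For example, `categorize(1354)` yields the power supply category, while an ID outside the listed blocks, such as a storage alert in the 2000 range, yields `"Unknown"`.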
PAGE 44
Event Message Reference
PAGE 45
System Event Log Messages for IPMI Systems The following tables list the system event log (SEL) messages, their severity, and cause. NOTE: For corrective actions, see the appropriate documentation. Temperature Sensor Events The temperature sensor event messages help protect critical components by alerting the systems management console when the temperature rises inside the chassis.
PAGE 46
Voltage Sensor Events The voltage sensor event messages monitor the number of volts across critical components. These messages provide status and warning information for voltage sensors for a particular chassis. Table 3-2. Voltage Sensor Events Event Message Severity voltage Critical sensor detected a failure where is the entity that this sensor is monitoring. Cause The voltage of the monitored device has exceeded the critical threshold.
PAGE 47
Fan Sensor Events The cooling device sensors monitor how well a fan is functioning. These messages provide status, warning, and failure messages for fans in a particular chassis. Table 3-3. Fan Sensor Events Event Message Severity Fan Critical sensor detected a failure where is the entity that this sensor is monitoring. For example "BMC Back Fan" or "BMC Front Fan.
PAGE 48
Processor Status Events The processor status messages monitor the functionality of the processors in a system. These messages provide processor health and warning information for a system. Table 3-4. Processor Status Events Event Message Severity Cause status processor sensor IERR, where is the processor that generated the event. For example, PROC for a single processor system and PROC # for a multiprocessor system.
PAGE 49
Power Supply Events The power supply sensors monitor the functionality of the power supplies. These messages provide status and warning information for power supplies for a particular system. Table 3-5. Power Supply Events Event Message Severity Cause power supply sensor removed. Critical This event is generated when the power supply sensor is removed. power supply sensor AC recovered.
PAGE 50
Memory ECC Events The memory ECC event messages monitor the memory modules in a system. These messages monitor the ECC memory correction rate and the type of memory events that occurred. Table 3-6. Memory ECC Events Event Message Severity Cause ECC error correction detected on Bank # DIMM [A/B]. Information This event is generated when there is a memory error correction on a particular Dual Inline Memory Module (DIMM). ECC uncorrectable error detected on Bank # [DIMM].
PAGE 51
Memory Events The memory modules can be configured in different ways in particular systems. These messages monitor the status, warning, and configuration information about the memory modules in the system. Table 3-8. Memory Events Event Message Severity Cause Memory RAID redundancy degraded. Information This event is generated when there is a memory failure in a RAID-configured memory configuration. Memory RAID redundancy lost.
PAGE 52
Drive Events
The drive event messages monitor the health of the drives in a system. These events are generated when there is a fault in the drives indicated.
Table 3-10. Drive Events
Event Message: Drive asserted fault state.
Severity: Critical
Cause: This event is generated when the specified drive in the array is faulty.
Event Message: Drive de-asserted fault state.
Severity: Information
Cause: This event is generated when the specified drive recovers from a faulty condition.
PAGE 53
Table 3-10. Drive Events (continued)
Event Message: Drive in failed array was deasserted
Severity: Informational
Cause: This event is generated when the drive is removed from the failed array.
Event Message: Drive rebuild in progress was asserted
Severity: Informational
Cause: This event is generated when the drive is rebuilding.
Event Message: Drive rebuild aborted was asserted
Severity: Warning
Cause: This event is generated when the drive rebuilding process is aborted.
PAGE 54
BIOS Generated System Events
The BIOS generated messages monitor the health and functionality of the chipsets, I/O channels, and other BIOS-related functions. These system events are generated by the BIOS.
Table 3-12. BIOS Generated System Events
Event Message: System Event I/O channel chk.
Severity: Critical
Cause: This event is generated when a critical interrupt is generated in the I/O Channel.
Event Message: System Event PCI Parity Err.
PAGE 55
Table 3-12. BIOS Generated System Events (continued) Event Message Severity Cause Memory Removed Information This event is generated when memory is removed from the system. Critical This event is generated when memory configuration is incorrect for the system. Information This event is generated when memory redundancy is regained. Warning This event is generated when correctable ECC errors have increased from a normal rate.
PAGE 56
Table 3-12. BIOS Generated System Events (continued) Event Message Severity Cause Hdwr version err Information This event is generated when the earlier mismatch between the BMC firmware and the processor is corrected. Critical This event is generated when there is a mismatch between the BMC firmware and the processor in use or vice versa. Information This event is generated when an earlier hardware mismatch is corrected.
PAGE 57
R2 Generated System Events
Table 3-13. R2 Generated Events
Description: System Event: OS stop event OS graceful shutdown detected
Severity: Information
Cause: The OS was shut down or restarted normally.
Description: OEM Event data record (after OS graceful shutdown/restart event)
Severity: Information
Cause: Comment string accompanying an OS shutdown/restart.
Description: System Event: OS stop event runtime critical stop
Severity: Critical
Cause: The OS encountered a critical error and was stopped abnormally.
PAGE 58
Entity Presence Events The entity presence messages are used for detecting different hardware devices. Table 3-16. Entity Presence Events Description Severity Cause Information This event is generated when the device was detected. Critical This event is generated when the device was not detected.
PAGE 59
Storage Management Message Reference The Dell OpenManage™ Server Administrator Storage Management’s alert or event management features let you monitor the health of storage resources such as controllers, enclosures, physical disks, and virtual disks. Alert Monitoring and Logging The Storage Management Service performs alert monitoring and logging. By default, the Storage Management Service starts when the managed system starts up.
PAGE 60
For other alerts, the alert message text is constructed from information passed directly from the controller (or another storage component) to the Alert Log. In these cases, the variable information is represented with a % (percent sign) in the Storage Management documentation. An example of such an alert is shown for alert 2334 in Table 4-1. Table 4-1.
PAGE 61
Table 4-2. Message Format with Variables for Each Storage Object (continued) Storage Object Message Variables A, B, C and X, Y, Z in the following examples are variables representing the storage object name or number. Virtual Disk Message Format: Virtual Disk X (Name) Controller A (Name) Message Format: Virtual Disk X Controller A Example: 2057 Virtual disk degraded: Virtual Disk 11 (Virtual Disk 11) Controller 1 (PERC 5/E Adapter) NOTE: The virtual disk and controller names are not always displayed.
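The virtual disk message format above can be parsed with a regular expression. The following is a sketch only: the pattern and the `parse_virtual_disk_alert` helper are hypothetical, it assumes the X and A variables are numeric and the event ID has four digits (as in the 2057 example), and it treats both parenthesized names as optional because the note says they are not always displayed.

```python
import re
from typing import Optional

# Pattern for "<ID> <description>: Virtual Disk X (Name) Controller A (Name)".
# Numeric X and A and a four-digit ID are assumptions based on the example.
ALERT_PATTERN = re.compile(
    r"^(?P<event_id>\d{4})\s+(?P<description>.+?):\s+"
    r"Virtual Disk (?P<vdisk>\d+)(?: \((?P<vdisk_name>[^)]*)\))?\s+"
    r"Controller (?P<controller>\d+)(?: \((?P<controller_name>[^)]*)\))?$"
)


def parse_virtual_disk_alert(text: str) -> Optional[dict]:
    """Split a virtual disk alert into its fields, or return None."""
    match = ALERT_PATTERN.match(text.strip())
    return match.groupdict() if match else None
```

Applied to the example from the table, the helper returns the event ID `2057`, the description `Virtual disk degraded`, and the disk and controller numbers with their names. When the names are absent, the two name fields come back as `None`.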
PAGE 62
Table 4-2. Message Format with Variables for Each Storage Object (continued) Storage Object Message Variables A, B, C and X, Y, Z in the following examples are variables representing the storage object name or number.
PAGE 63
Table 4-3. Alert Message Change History (continued)
Storage Management 2.1
Documentation Changes: Documentation updated to indicate clear alert status. Reference to SNMP trap variables removed.
Comments: Starting with Dell OpenManage 5.0, Array Manager is no longer an installable option. If you have an Array Manager installation and wish to see how the Array Manager events
PAGE 64
Table 4-4. Storage Management Messages (continued) Event Description ID Severity 2049 Warning / Cause: A physical disk has been removed Non-critical from the disk group. This alert can also be caused by loose or defective cables or by problems with the enclosure. Physical disk removed Cause and Action Clear SNMP Event Trap Number Numbers 2052 903 2158 903 Warning / Cause: A physical disk has reported an error None Non-critical condition and may be degraded.
PAGE 65
Table 4-4. Storage Management Messages (continued)
Event ID: 2053
Description: Virtual disk created
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 1201
Event ID: 2054
Description: Virtual disk deleted
Severity: Warning / Non-critical
Cause: A virtual disk has been deleted. "Performing a Reset Configuration" may detect that a virtual disk has been deleted and generate this alert.
PAGE 66
Table 4-4. Storage Management Messages (continued) Event Description ID 2057 Severity Cause and Action Clear SNMP Event Trap Number Numbers Virtual disk degraded Warning / Cause 1: This alert message occurs when a None Non-critical physical disk included in a redundant virtual disk fails. Because the virtual disk is redundant (uses mirrored or parity information) and only one physical disk has failed, the virtual disk can be rebuilt.
PAGE 67
Table 4-4. Storage Management Messages (continued) Event Description ID Severity 2064 Ok / Normal Cause: This alert is for informational purposes. 2091 Virtual disk rebuild started Cause and Action Clear SNMP Event Trap Number Numbers 1201 Action: None 2065 Physical disk rebuild Ok / Normal Cause: This alert is for informational purposes.
PAGE 68
Table 4-4. Storage Management Messages (continued)
Event ID: 2076
Description: Virtual disk check consistency failed
Severity: Critical / Failure / Error
Cause: A physical disk included in the virtual disk failed or there is an error in the parity information. A failed physical disk can cause errors in parity information.
Action: Replace the failed physical disk.
Clear Event Number: None
SNMP Trap Numbers: 1204
PAGE 69
Table 4-4. Storage Management Messages (continued)
Event ID: 2082
Description: Virtual disk rebuild failed
Severity: Critical / Failure / Error
Cause: A physical disk included in the virtual disk has failed or is corrupt. A user may also have cancelled the rebuild.
Action: Replace the failed or corrupt disk. You can identify a disk that has failed by locating the disk that has a red “X” for its status. Restart the virtual disk rebuild.
Clear Event Number: None
SNMP Trap Numbers: 1204
PAGE 70
Table 4-4. Storage Management Messages (continued)
Event ID: 2094
Description: Predictive Failure reported.
Severity: Warning / Non-critical
Cause: The physical disk is predicted to fail. Many physical disks contain Self Monitoring Analysis and Reporting Technology (SMART). When enabled, SMART monitors the health of the disk based on indications such as the number of write operations that have been performed on the disk.
Action: Replace the physical disk.
Clear Event Number: None
PAGE 71
Table 4-4. Storage Management Messages (continued) Event Description ID Severity 2095 Warning / Cause: A physical disk has failed, is corrupt, Non-critical or is otherwise experiencing a problem. SCSI sense data. Cause and Action Clear SNMP Event Trap Number Numbers None 903 Ok / Normal Cause: A user has assigned a physical disk as a None global hot spare. This alert is for informational purposes. 901 Action: Replace the physical disk.
PAGE 72
Table 4-4. Storage Management Messages (continued)
Event ID: 2100
Description: Temperature exceeded the maximum warning threshold
Severity: Warning / Non-critical
Cause: The physical disk enclosure is too hot. A variety of factors can cause the excessive temperature. For example, a fan may have failed, the thermostat may be set too high, or the room temperature may be too hot.
Action: Check for factors that may cause overheating.
Clear Event Number: 2353
SNMP Trap Numbers: 1053
PAGE 73
Table 4-4. Storage Management Messages (continued) Event Description ID Severity SNMP Clear Event Trap Number Numbers Cause and Action 2104 Controller battery is Ok / Normal Cause: This alert is for informational purposes.
PAGE 74
Table 4-4. Storage Management Messages (continued)
Event ID: 2108
Description: Smart warning
Severity: Warning / Non-critical
Cause: A disk has received a SMART alert (predictive failure). The disk is likely to fail in the near future.
Action: Replace the disk that has received the SMART alert. If the physical disk is a member of a non-redundant virtual disk, then back up the data before replacing the disk.
Clear Event Number: None
PAGE 75
Table 4-4. Storage Management Messages (continued)
Event ID: 2109
Description: SMART warning temperature
Severity: Warning / Non-critical
Cause: A disk has reached an unacceptable temperature and received a SMART alert (predictive failure). The disk is likely to fail in the near future.
Action 1: Determine why the physical disk has reached an unacceptable temperature. A variety of factors can cause the excessive temperature.
Clear Event Number: None
SNMP Trap Numbers: 903
PAGE 76
Table 4-4. Storage Management Messages (continued)
Event ID: 2110
Description: SMART warning degraded
Severity: Warning / Non-critical
Cause: A disk is degraded and has received a SMART alert (predictive failure). The disk is likely to fail in the near future.
Action: Replace the disk that has received the SMART alert. If the physical disk is a member of a non-redundant virtual disk, then back up the data before replacing the disk.
Clear Event Number: None
SNMP Trap Numbers: 903
PAGE 77
Table 4-4. Storage Management Messages (continued) Event Description ID 2115 Severity Cause and Action Clear SNMP Event Trap Number Numbers A consistency check Ok / Normal Cause: This alert is for informational purposes. Clear on a virtual disk has The check consistency operation on a virtual event been resumed disk has resumed processing after being paused by a user. 1201 Action: None 2116 A virtual disk and its Ok / Normal Cause: This alert is for informational purposes.
PAGE 78
Table 4-4. Storage Management Messages (continued) Event Description ID Severity 2121 Ok / Normal Cause: This alert is for informational purposes. Clear A device that was previously in an error state event has returned to a normal state. Device returned to normal Cause and Action Clear SNMP Event Trap Number Numbers For example, if an enclosure became too hot and subsequently cooled down, then you may receive this alert.
PAGE 79
Table 4-4. Storage Management Messages (continued)
Event ID: 2123
Severity: Warning / Non-critical
Cause: A virtual disk or an enclosure has lost data redundancy. In the case of a virtual disk, one or more physical disks included in the virtual disk have failed. Due to the failed physical disk or disks, the virtual disk is no longer maintaining redundant (mirrored or parity) data. The failure of an additional physical disk will result in lost data.
Clear Event Number: 2124
PAGE 80
Table 4-4. Storage Management Messages (continued)
Event ID: 2126
Description: SCSI sense sector reassign
Severity: Warning / Non-critical
Cause: A sector of the physical disk is corrupted and data cannot be maintained on this portion of the disk. This alert is for informational purposes.
NOTICE: Any data residing on the corrupt portion of the disk may be lost and you may need to restore your data from backup.
Clear Event Number: None
SNMP Trap Numbers: 903
PAGE 81
Table 4-4. Storage Management Messages (continued)
Event ID: 2131
Description: Firmware version mismatch
Severity: Warning / Non-critical
Cause: The firmware on the controller is not a supported version.
Action: Install a supported version of the firmware. If you do not have a supported version of the firmware available, it can be downloaded from the Dell support site at support.dell.com.
Clear Event Number: None
SNMP Trap Numbers: 753
PAGE 82
Table 4-4. Storage Management Messages (continued)
Event ID: 2137
Severity: Warning / Non-critical
Cause: The controller is unable to communicate with an enclosure. There are several reasons why communication may be lost. For example, there may be a bad or loose cable. An unusual amount of I/O may also interrupt communication with the enclosure. In addition, communication loss may be caused by software, hardware, or firmware problems, bad or failed power supplies, and enclosure shutdown.
Clear Event Number: 2162
PAGE 83
Table 4-4. Storage Management Messages (continued)
Event ID: 2141
Severity: Ok / Normal
Cause: This alert is for informational purposes. Portions of the physical disk were formerly inaccessible. The disk space from these dead segments has been recovered and is now usable. Any data residing on these dead segments has been lost.
Clear Event Number: None
PAGE 84
Table 4-4. Storage Management Messages (continued) Event Description ID Severity 2148 Warning / Cause: A portion of a physical disk is Non-critical damaged. Bad block medium error Cause and Action Clear SNMP Event Trap Number Numbers None 753 None 753 None 753 Ok / Normal Cause: This alert is for informational purposes. None A user has changed the enclosure asset tag. 851 Action: See the Dell OpenManage Server Administrator Storage Management online help for more information.
PAGE 85
Table 4-4. Storage Management Messages (continued)
Event ID: 2155
Description: Minimum temperature probe warning threshold value changed
Severity: Ok / Normal
Cause: This alert is for informational purposes. A user has changed the value for the minimum temperature probe warning threshold.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 1051

Event ID: 2156
Description: Controller alarm has
Severity: Ok / Normal
Cause: This alert is for informational purposes.
PAGE 86
Table 4-4. Storage Management Messages (continued)
Event ID: 2164
Description: See the Readme file for a list of validated controller driver versions
Severity: Ok / Normal
Cause: This alert is for informational purposes. Storage Management is unable to determine whether the system has the minimum required versions of the RAID controller drivers.
Action: See the Readme file for driver and firmware requirements.
Clear Event Number: None
SNMP Trap Numbers: 101
PAGE 87
Table 4-4. Storage Management Messages (continued)
Event ID: 2167
Description: The current kernel version and the non-RAID SCSI driver version are older than the minimum required levels. See readme.txt for a list of validated kernel and driver versions.
Severity: Warning / Non-critical
Cause: The version of the kernel and the driver do not meet the minimum requirements.
Clear Event Number: None

Event ID: 2168

Event ID: 2169
PAGE 88
Table 4-4. Storage Management Messages (continued)
Event ID: 2171
Description: The controller battery temperature is above normal.
Severity: Warning / Non-critical
Cause: The battery may be recharging, the room temperature may be too hot, or the fan in the system may be degraded or failed.
Action: If this alert was generated due to a battery recharge, the situation will correct when the recharge is complete.
Clear Event Number: 2172
SNMP Trap Numbers: 1153
PAGE 89
Table 4-4. Storage Management Messages (continued)
Event ID: 2176
Description: The controller battery Learn cycle has started.
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Clear Event Number: 2177

Event ID: 2177
Description: The controller battery Learn cycle has completed.
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: Clear event
SNMP Trap Numbers: 1151

Event ID: 2178
Description: The controller battery Learn cycle has timed out.
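The Clear Event Number column pairs an alert with the event that clears it (for example, 2176 is cleared by 2177). The following is a minimal sketch, not part of Storage Management, of how a log consumer might track which alerts are still outstanding. The `CLEARED_BY` table holds only the pairs visible in these Table 4-4 excerpts, and the function name `active_alerts` is hypothetical. Events whose Clear Event Number is None are ignored by this sketch.

```python
# Clear Event Number pairs as they appear in the Table 4-4 excerpts:
# raised event ID -> event ID that clears it.
CLEARED_BY = {
    2123: 2124,
    2137: 2162,
    2171: 2172,
    2176: 2177,
    2287: 2288,
    2292: 2162,
}


def active_alerts(event_ids):
    """Return the event IDs still outstanding after replaying event_ids in order."""
    active = []
    for event_id in event_ids:
        # A clear event removes every outstanding alert that names it.
        cleared = [a for a in active if CLEARED_BY.get(a) == event_id]
        if cleared:
            active = [a for a in active if a not in cleared]
        elif event_id in CLEARED_BY:
            # Only alerts that have a known clear event are tracked here.
            active.append(event_id)
    return active


if __name__ == "__main__":
    # Learn cycle starts, battery runs hot, Learn cycle completes.
    print(active_alerts([2176, 2171, 2177]))
```

Note that one clear event can close several alerts: in the excerpts, both 2137 and 2292 list 2162 as their Clear Event Number.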
PAGE 90
Table 4-4. Storage Management Messages (continued)
Event ID: 2182
Severity: Critical / Failure / Error
Cause: The controller and attached enclosures are not cabled correctly.
Clear Event Number: None
SNMP Trap Numbers: 754

Description: The controller cache has been discarded.
Severity: Warning / Non-critical
Cause: The controller has flushed the cache and any data in the cache has been lost.
Clear Event Number: None
PAGE 91
Table 4-4. Storage Management Messages (continued)
Event ID: 2191
Description: Multiple enclosures are attached to the controller. This is an unsupported configuration.
Severity: Critical / Failure / Error
Cause: Many enclosures are attached to the controller port. When the enclosure limit is exceeded, the controller loses contact with all enclosures attached to the port.
Clear Event Number: None

Event ID: 2192
PAGE 92
Table 4-4. Storage Management Messages (continued)
Event ID: 2201
Description: A global hot spare failed.
Severity: Warning / Non-critical
Cause: The controller is unable to communicate with a disk that is assigned as a global hot spare. The disk may have failed or has been removed. There may also be a bad or loose cable.
Action: Check if the disk is healthy and that it has not been removed. Check the cables.
Clear Event Number: None
SNMP Trap Numbers: 903
PAGE 93
Table 4-4. Storage Management Messages (continued)
Event ID: 2205
Description: A dedicated hot spare has been automatically unassigned.
Severity: Warning / Non-critical
Cause: The hot spare is no longer required because the virtual disk it was assigned to has been deleted.
Action: None.
Clear Event Number: None
SNMP Trap Numbers: 903

Event ID: 2206
Description: The only hot spare available is a SATA disk. SATA disks cannot replace SAS disks.
Severity: Warning / Non-critical
PAGE 94
Table 4-4. Storage Management Messages (continued)
Event ID: 2214
Description: Battery charge in progress
Severity: OK/Normal
Cause: This alert is for informational purposes.
Action: None.
Clear Event Number: None
SNMP Trap Numbers: 1151

Event ID: 2215
Description: Battery charge process interrupted
Severity: OK/Normal
Cause: This alert is for informational purposes.
Action: None.
Clear Event Number: None
SNMP Trap Numbers: 1151

Event ID: 2232
Description: The controller alarm is silenced.
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Clear Event Number: None
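The Severity column uses three labels in these excerpts (Ok / Normal, Warning / Non-critical, Critical / Failure / Error), and the spelling varies slightly between rows ("OK/Normal" versus "Ok / Normal"). The sketch below shows one way a script could normalize the labels before comparing them. The numeric ranks and the name `severity_rank` are assumptions made for illustration and are not defined by the guide.

```python
# Hypothetical ranks: higher means more severe. The guide names the
# severities but does not assign numbers to them.
SEVERITY_RANK = {
    "ok/normal": 0,
    "warning/non-critical": 1,
    "critical/failure/error": 2,
}


def severity_rank(label):
    """Map a Severity label to a rank, ignoring spacing and letter case."""
    key = "".join(label.split()).lower()
    return SEVERITY_RANK[key]


if __name__ == "__main__":
    labels = ["OK/Normal", "Warning / Non-critical", "Ok / Normal"]
    # Pick the most severe label seen in a batch of alerts.
    print(max(labels, key=severity_rank))
```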
PAGE 95
Table 4-4. Storage Management Messages (continued)
Event ID: 2245
Description: A virtual disk blink has ceased.
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Clear Event Number: None

Event ID: 2246
Description: The controller battery is degraded.
Severity: Warning / Non-critical
Cause: The controller battery charge is weak.
Action: As the charge weakens, the charger should automatically recharge the battery.
Clear Event Number: None
PAGE 96
Table 4-4. Storage Management Messages (continued)
Event ID: 2262
Description: SMART thermal shutdown is enabled.
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 101

Event ID: 2263
Description: SMART thermal shutdown is disabled.
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Clear Event Number: None
SNMP Trap Numbers: 101

Description: A device is missing.
Severity: Warning / Non-critical
Cause: The controller cannot communicate with a device. The device may be removed.
PAGE 97
Table 4-4. Storage Management Messages (continued)
Event ID: 2268
Description: %1, Storage Management has lost communication with the controller. An immediate reboot is strongly recommended to avoid further problems. If the reboot does not restore communication, then contact technical support for more information.
Severity: Critical / Failure / Error
Cause: Storage Management has lost communication with a controller.
Clear Event Number: None

Event ID: 2269

Event ID: 2270
PAGE 98
Table 4-4. Storage Management Messages (continued)
Event ID: 2273
Description: Bad media.
Severity: Critical / Failure / Error
Cause: A source (array) disk in a redundant virtual disk has a bad disk block. The algorithm that maintains redundant data has created a similar bad block on the target redundant disk to maintain consistency in disk block addressing. Data has been lost.
Action: Restore from backup.
Clear Event Number: None
SNMP Trap Numbers: 904
PAGE 99
Table 4-4. Storage Management Messages (continued)
Event ID: 2280
Description: A disk media error has been corrected.
Severity: Ok / Normal
Cause: A disk media error was detected while the controller was completing a background task. A bad disk block was identified. The disk block has been remapped.
Clear Event Number: None
SNMP Trap Numbers: 1201

Description: Virtual disk has inconsistent data.
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Clear Event Number: None
SNMP Trap Numbers: 1201

Description: Hot spare SMART polling failed.
PAGE 100
Table 4-4. Storage Management Messages (continued)
Event ID: 2286
Description: A Learn cycle start is pending while the battery charges.
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None
Clear Event Number: None
SNMP Trap Numbers: 1151

Event ID: 2287
Description: The Patrol Read is paused.
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Clear Event Number: 2288
SNMP Trap Numbers: 751

Event ID: 2288
Description: The patrol read has resumed.
Severity: Ok / Normal
Cause: This alert is for informational purposes.
PAGE 101
Table 4-4. Storage Management Messages (continued)
Event ID: 2292
Description: Communication with the enclosure has been lost.
Severity: Critical / Failure / Error
Cause: The controller has lost communication with an EMM. The cables may be loose or defective.
Action: Make sure the cables are attached securely. Reboot the system.
Clear Event Number: 2162
SNMP Trap Numbers: 854

Event ID: 2293
Description: The EMM has failed.
PAGE 102
Table 4-4. Storage Management Messages (continued)
Event ID: 2298
Description: There is a bad sensor on an enclosure.
Severity: Warning / Non-critical
Cause: The enclosure has a bad sensor. The enclosure sensors monitor the fan speeds, temperature probes, etc.
Clear Event Number: None
SNMP Trap Numbers: 853

Cause: There is a problem with a physical connection or PHY. The %1 indicates a substitution variable.
Clear Event Number: None
PAGE 103
Table 4-4. Storage Management Messages (continued)
Event ID: 2301
Severity: Critical / Failure / Error
Cause: The enclosure or an enclosure component is in a Failed or Degraded state.
Clear Event Number: None
SNMP Trap Numbers: 854

Description: The enclosure is not responding.
Severity: Critical / Failure / Error
Cause: The enclosure or an enclosure component is in a Failed or Degraded state.
Clear Event Number: None
SNMP Trap Numbers: 854

Event ID: 2303
Description: The enclosure
Severity: Ok / Normal
Cause: This alert is for informational purposes.
PAGE 104
Table 4-4. Storage Management Messages (continued)
Event ID: 2307
Severity: Critical / Failure / Error
Cause: The bad block table is used for remapping bad disk blocks. This table fills, as bad disk blocks are remapped. When the table is full, bad disk blocks can no longer be remapped and disk errors can no longer be corrected. At this point, data loss can occur. The %1 indicates a substitution variable.
Clear Event Number: None
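The cause text for event 2307 describes a table that fills as bad blocks are remapped and that can accept no further remappings once full. The toy model below illustrates that behavior only. The class name, the capacity value, and the block addresses are all hypothetical, and it makes no claim about how any controller actually stores its bad block table.

```python
class BadBlockTable:
    """Toy model of a fixed-capacity table that remaps bad disk blocks."""

    def __init__(self, capacity):
        self.capacity = capacity  # hypothetical table size
        self.remapped = {}        # bad block address -> replacement address

    def is_full(self):
        return len(self.remapped) >= self.capacity

    def remap(self, bad_block, spare_block):
        """Record a remapping; return False when the table has no room left."""
        if bad_block in self.remapped:
            return True  # already remapped, nothing new to store
        if self.is_full():
            return False  # table full: the block can no longer be remapped
        self.remapped[bad_block] = spare_block
        return True


if __name__ == "__main__":
    table = BadBlockTable(capacity=2)
    for bad, spare in [(10, 1000), (11, 1001), (12, 1002)]:
        print(bad, table.remap(bad, spare))
```

In this model the third call returns False, which corresponds to the point at which the guide says disk errors can no longer be corrected.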
PAGE 105
Table 4-4. Storage Management Messages (continued)
Event ID: 2311
Description: The firmware on the EMMs is not the same version. EMM0 %1 EMM1 %2
Severity: Warning / Non-critical
Cause: The firmware on the EMM modules is not the same version. It is required that both modules have the same version of the firmware. This alert may be caused if you attempt to insert an EMM module that has a different firmware version than an existing module.
Clear Event Number: None
PAGE 106
Table 4-4. Storage Management Messages (continued)
Event ID: 2316
Severity: Critical / Failure / Error
Cause: A diagnostics test failed. The %1 indicates a substitution variable. The text for this substitution variable is generated by the utility that ran the diagnostics and is displayed with the alert in the Alert Log. This text can vary depending on the situation.
Clear Event Number: None
PAGE 107
Table 4-4. Storage Management Messages (continued)
Event ID: 2321
Description: Single-bit ECC error. The DIMM is critically degraded. There will be no further reporting.
Severity: Critical / Failure / Error
Cause: The DIMM is malfunctioning. Data loss or data corruption is imminent. The DIMM must be replaced immediately. No further alerts will be generated.
Clear Event Number: None

Event ID: 2322
Description: The DC power supply is switched off.
Cause: The power supply unit is switched off.
PAGE 108
Table 4-4. Storage Management Messages (continued)
Event ID: 2327
Description: The NVRAM has corrupted data. The controller is reinitializing the NVRAM.
Severity: Warning / Non-critical
Cause: The NVRAM has corrupted data. This may occur after a power surge, a battery failure, or for other reasons. The controller is reinitializing the NVRAM.
Action: None.
Clear Event Number: None
SNMP Trap Numbers: 753

Event ID: 2328
Description: The NVRAM has corrupt data.
PAGE 109
Table 4-4. Storage Management Messages (continued)
Event ID: 2331
Description: A bad disk block has been reassigned.
Severity: Warning / Non-critical
Cause: The disk has a bad block. Data has been readdressed to another disk block and no data loss has occurred.
Clear Event Number: None
SNMP Trap Numbers: 903

Event ID: 2332
Description: A controller hot plug has been detected.
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Clear Event Number: None
PAGE 110
Table 4-4. Storage Management Messages (continued)
Event ID: 2335
Severity: Warning / Non-critical
Cause: The %1 indicates a substitution variable. The text for this substitution variable is generated by the controller and is displayed with the alert in the Alert Log. This text is from events in the controller event log that were generated while Storage Management was not running. This text can vary depending on the situation.
Clear Event Number: None
PAGE 111
Table 4-4. Storage Management Messages (continued)
Event ID: 2340
Description: The BGI completed with uncorrectable errors.
Severity: Critical / Failure / Error
Cause: The BGI task encountered errors that cannot be corrected. The virtual disk contains physical disks that have unusable disk space or disk errors that cannot be corrected.
Action: Replace the physical disk that contains the disk errors.
Clear Event Number: None
SNMP Trap Numbers: 1204
PAGE 112
Table 4-4. Storage Management Messages (continued)
Event ID: 2345
Description: The virtual disk initialization failed.
Severity: Critical / Failure / Error
Cause: The controller cannot communicate with the attached devices. A disk may be removed or contain errors. The cables may also be loose or defective.
Action: Check the health of attached devices.
Clear Event Number: None
SNMP Trap Numbers: 1204
PAGE 113
Table 4-4. Storage Management Messages (continued)
Event ID: 2349
Description: A bad disk block could not be reassigned during a write operation.
Severity: Critical / Failure / Error
Cause: A write operation could not complete because the disk contains bad disk blocks that could not be reassigned. Data loss may have occurred and data redundancy may also be lost.
Action: Replace the disk.
Clear Event Number: None
SNMP Trap Numbers: 904
PAGE 114
Table 4-4. Storage Management Messages (continued)
Event ID: 2355
Description: Enclosure firmware download failed.
Severity: Warning / Non-critical
Cause: The system was unable to download firmware to the enclosure. The controller may have lost communication with the enclosure. There may have been problems with the data transfer or the download media may be corrupt.
Action: Attempt to download the enclosure firmware again.
Clear Event Number: None
SNMP Trap Numbers: 853
PAGE 115
Table 4-4. Storage Management Messages (continued)
Event ID: 2357
Description: SAS expander error: %1
Severity: Critical / Failure / Error
Cause: The %1 indicates a substitution variable. The text for this substitution variable is generated by the firmware and is displayed with the alert in the Alert Log. This text can vary depending on the situation.
Action: There may be a problem with the enclosure.
Clear Event Number: None
SNMP Trap Numbers: 754
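Several descriptions in this table carry substitution variables such as %1 and %2, which are replaced by text generated at run time. As an illustration only, the sketch below fills such placeholders in a message template. The function name `render_alert` and the sample values are hypothetical, and this is not a description of how Storage Management performs the substitution internally.

```python
import re


def render_alert(template, *values):
    """Replace %1, %2, ... in template with the matching positional value."""

    def substitute(match):
        index = int(match.group(1)) - 1
        # Leave the placeholder in place when no value was supplied for it.
        if 0 <= index < len(values):
            return str(values[index])
        return match.group(0)

    return re.sub(r"%(\d+)", substitute, template)


if __name__ == "__main__":
    # The values passed here are made-up examples, not real firmware output.
    print(render_alert("SAS expander error: %1", "example text"))
    print(render_alert("EMM0 %1 EMM1 %2", "1.01", "1.03"))
```

Leaving unmatched placeholders untouched keeps a partially filled message recognizable instead of raising an error.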
PAGE 116
Table 4-4. Storage Management Messages (continued)
Event ID: 2362
Description: Physical disk(s) have been removed from a virtual disk. The virtual disk will be in Failed state during the next system reboot.
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None.
Clear Event Number: None
SNMP Trap Numbers: 751

Event ID: 2363
Description: A virtual disk and all of its member
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None.
Clear Event Number: None
PAGE 117
Table 4-4. Storage Management Messages (continued)
Event ID: 2368
Description: The SCSI Enclosure Processor (SEP) has been rebooted as part of the firmware download operation and will be unavailable until the operation completes.
Severity: Ok / Normal
Cause: This alert is for informational purposes.
Action: None.
Clear Event Number: None
PAGE 118
Storage Management Message Reference
PAGE 119
Index
Symbols
%1, Storage Management has lost communication with this RAID controller and attached storage. An immediate reboot is strongly recommended to avoid further problems. If the reboot does not restore communication, there may be a hardware failure.
1052, 18
1254, 28
PAGE 120
Index
1503, 37    2053, 65    2095, 71
1504, 37    2054, 65    2098, 71
1505, 37    2055, 65    2099, 71
1550, 38    2056, 65    2100, 72
1551, 38    2057, 66    2101, 72
1552, 38    2058, 66    2102, 72
1553, 38    2059, 66    2103, 72
1554, 38    2061, 66    2104, 73
1555, 38    2062, 66    2105, 73
1600, 39    2063, 66    2106, 73
1601, 39    2064, 67    2107, 73
1602, 39    2065, 67    2108, 74
1603, 40    2067, 67    2109, 75
1604, 40    2070, 67    2110, 76
1605, 40    2074, 67    2111, 76
1650, 41    2076, 68    2112, 76
1651, 41    2077, 68
PAGE 121
2130, 80    2164, 86    2202, 92
2131, 81    2165, 86    2203, 92
2132, 81    2166, 86    2204, 92
2135, 81    2167, 87    2205, 93
2136, 81    2168, 87    2206, 93
2137, 82    2169, 87    2207, 93
2138, 82    2170, 87    2211, 93
2139, 82    2171, 88    2212, 93
2140, 82    2173, 88    2213, 93
2141, 83    2174, 88    2214, 94
2142, 83    2175, 88    2215, 94
2143, 83    2176, 89    2232, 94
2144, 83    2177, 89    2233, 94
2145, 83    2178, 89    2234, 94
2146, 83    2179, 89    2235, 94
2147, 83    2180, 89    2237, 94
2148, 84    2181, 89    2238, 94
21
PAGE 122
Index
2254, 95    2288, 100    2319, 106
2255, 95    2289, 100    2320, 106
2259, 95    2290, 100    2321, 107
2260, 95    2291, 100    2322, 107
2261, 95    2292, 101    2323, 107
2262, 96    2293, 101    2324, 107
2263, 96    2294, 101    2325, 107
2264, 96    2295, 101    2326, 107
2265, 96    2296, 101    2327, 108
2266, 96    2297, 101    2328, 108
2267, 96    2298, 102    2329, 108
2268, 97    2299, 102    2330, 108
2269, 97    2300, 102    2331, 109
2270, 97    2301, 103    2332, 109
2271, 97    2302, 103    2333, 109
2272, 97    2303, 103
PAGE 123
2349, 113
2350, 113
2351, 113
2352, 113
2353, 113
2354, 113
2355, 114
2356, 114
2357, 115
2358, 115
2359, 115
2360, 115
2361, 115
2362, 116
2363, 116
2364, 116
A consistency check on a virtual disk has been resumed, 77
A controller hot plug has been detected., 109
A dedicated hot spare failed., 92
A dedicated hot spare has been automatically unassigned., 93
A Learn cycle start is pending while the battery charges., 100
A mirrored virtual disk has been unmirrored, 77
A physical disk is incompatible.
PAGE 124
Index
A virtual disk and all of its member physical disks have been removed while the system was shut down. This removal was discovered during system start-up., 116
A virtual disk and its mirror have been split, 77
A virtual disk blink has been initiated., 94
A virtual disk blink has ceased., 95
A virtual disk is permanently degraded.
An EMM has been inserted., 101
An enclosure blink operation has initiated., 95
An enclosure temperature sensor differential has been detected., 109
PAGE 125
chassis intrusion messages, 27
Chassis intrusion returned to normal, 27
chassis intrusion sensor, 8
Chassis intrusion sensor detected a non-recoverable value, 28, 49
Chassis intrusion sensor has failed, 27
Chassis intrusion sensor value unknown, 27, 49
Communication regained, 85
Communication timeout, 82
Communication with the enclosure has been lost.
cooling device messages, 20
current sensor, 8
Driver version mismatch, 81
drives messages, 52
PAGE 126
Index
fan enclosure messages, 35
Fan enclosure removed from system, 35
Fan enclosure removed from system for an extended amount of time, 36
fan enclosure sensor, 9
Fan enclosure sensor detected a non-recoverable value, 36
Fan enclosure sensor has failed, 35
Fan enclosure sensor value unknown, 35
fan sensor, 8
Fan sensor detected a failure
G
Global hot spare assigned, 71
Global hot spare unassigned, 71
H
hardware log sensor, 9
M
Maximum temperature probe warning threshold value changed, 84
PAGE 127
messages (continued)
    fan enclosure, 35
    fan sensor, 47
    hardware log sensor, 51
    intrusion, 53
    memory device, 34
    memory ecc, 50
    memory modules, 51
    miscellaneous, 15
    pluggable device, 41, 54
    power supply, 31, 49
    processor sensor, 39
    processor status, 48
    r2 generated system, 57
    redundancy unit, 28
    storage management, 63
    temperature sensor, 17, 45
    voltage sensor, 21, 46
Minimum temperature probe warning threshold value changed, 85
Multi-bit ECC error., 100
Multiple enclosures are attached to the controller.
PAGE 128
Index
R
r2 generated system messages, 57
Rebuild completed with errors, 85
Rebuild not possible as SAS/SATA is not supported in the same virtual disk.
SCSI sense data, 71
SCSI sense sector reassign, 80
PAGE 129
Temperature sensor detected a failure value, 19
Temperature sensor detected a non-recoverable value, 19
Temperature sensor detected a warning value, 18
The Check Consistency made corrections and completed., 111
The Check Consistency rate has changed., 94
The Clear operation has cancelled., 95
The controller battery Learn cycle has timed out., 89
The controller battery Learn cycle will start in % days., 89
The controller battery needs to be replaced.
PAGE 130
Index
The current kernel version and the non-RAID SCSI driver version are older than the minimum required levels. See the Readme file for a list of validated kernel and driver versions., 87
The DC power supply is switched off., 107
The dedicated hot spare is too small., 98
The EMM has failed., 101
The enclosure cannot support both SAS and SATA physical disks. Physical disks may be disabled., 103
The enclosure has a hardware error., 103
The enclosure is not responding., 103
The enclosure is unstable.
PAGE 131
The rebuild failed due to errors on the source physical disk., 112
The rebuild failed due to errors on the target physical disk., 112
The SCSI Enclosure Processor (SEP) has been rebooted as part of the firmware download operation and will be unavailable until the operation completes., 117
U
understanding event description, 12
Unsupported configuration detected. The SCSI rate of the enclosure management modules (EMMs) is not the same.
PAGE 132
Index
Voltage sensor detected a warning value, 22
Voltage Sensor Events, 46
Voltage sensor has failed, 21, 47
voltage sensor messages, 21, 46
Voltage sensor returned to a normal value, 22
Voltage sensor value unknown, 22, 47