System Event Log Troubleshooting Guide for PCSD Platforms Based on Intel® Xeon® Processor E5 4600/2600/2400/1600/1400 Product Families
Intel order number G90620-003
Revision 1.
Revision History
January 2013, Revision 1.0: Initial release.
September 2013, Revision 1.1: Added MIC Thermal Margin sensors C4 through C7. Added MIC Status sensors A2, A3, A6, and A7. Added voltage sensors EA, EB, EC, ED, and EF. Corrected typographical errors. Made corrections to Firmware Update Status table.
Disclaimers
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT.
Table of Contents
1. Introduction
1.1 Purpose
1.2 Industry Standard
5.2.1 Threshold-based Temperature Sensors
5.2.2 Thermal Margin Sensors
5.2.3 Processor Thermal Control Sensors
5.2.
9.2.1 System Firmware Progress (Formerly Post Error) – Next Steps
10. Chassis Subsystem
10.1 Physical Security
10.1.
13.5.1 Node Manager Alert Threshold Exceeded – Next Steps
14. Microsoft Windows* Records
14.1 Boot up Event Records
14.
List of Tables
Table 1: SEL Record Format
Table 2: Event Request Message Event Data Field Contents
Table 3: OEM SEL Record (Type C0h-DFh)
Table 39: Thermal Margin Sensors Event Triggers – Description
Table 40: Thermal Margin Sensors – Next Steps
Table 41: Processor Thermal Control Sensors Typical Characteristics
Table 79: SMI Timeout Sensor Typical Characteristics
Table 80: System Event Log Cleared Sensor Typical Characteristics
Table 81: System Event – PEF Action Sensor Typical Characteristics
1. Introduction
The server management hardware that is part of the Intel® Server Boards and Intel® Server Platforms serves as a vital part of the overall server management strategy.
1.2 Industry Standard
1.2.1 Intelligent Platform Management Interface (IPMI)
The key characteristic of the Intelligent Platform Management Interface (IPMI) is that the inventory, monitoring, logging, and recovery control functions are available independently of the main processors, BIOS, and operating system.
board; it sends alerts and logs events when certain parameters exceed their preset thresholds, indicating a potential failure of the system. The administrator can also communicate with the BMC remotely to take corrective action, such as resetting or power cycling the system to get a hung OS running again.
2. Basic Decoding of a SEL Record
The System Event Log (SEL) record format is defined in the IPMI Specification. The following section provides a basic definition for each of the fields in a SEL record. For more details, see the IPMI Specification. The definitions for the standard SEL record can be found in Table 1.
Byte | Field | Description
Byte 1:
  [7:1] – 7-bit I²C Slave Address, or 7-bit system software ID
  [0] 0b = ID is IPMB Slave Address
      1b = System software ID
Software ID values:
  0001h – BIOS POST for POST errors, RAS Configuration/State, Timestamp Synch, OS Boot events
  0033h – BIOS SMI Handler
  0020h – BMC Firmware
  002Ch – ME Firmware
  0041h –
Byte | Field | Description
  02h-0Ch = Discrete
  6Fh = Sensor-Specific
  70h-7Fh = OEM
14 | Event Data 1 (ED1) | Per Table 2
15 | Event Data 2 (ED2) | Per Table 2
16 | Event Data 3 (ED3) | Per Table 2
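The Generator ID and Event Type fields described above are bit-packed, so a short sketch may help when reading raw records. The function and dictionary names below are hypothetical and are not part of any Intel utility; the bit positions and ID values come from the tables in this guide, and the direction bit is the [7] field shown in the sensor characteristics tables.

```python
# Software ID values listed in this guide (Generator ID field).
GENERATOR_NAMES = {
    0x0001: "BIOS POST",
    0x0033: "BIOS SMI Handler",
    0x0020: "BMC Firmware",
    0x002C: "ME Firmware",
}


def decode_generator_id(gid):
    """Split the first Generator ID byte into its [0] flag and [7:1] ID."""
    low_byte = gid & 0xFF
    return {
        # [0]: 0b = IPMB slave address, 1b = system software ID
        "is_software_id": bool(low_byte & 0x01),
        # [7:1]: 7-bit slave address or 7-bit software ID
        "id_7bit": low_byte >> 1,
        "name": GENERATOR_NAMES.get(gid, "unknown"),
    }


def decode_event_dir_type(byte13):
    """Split byte 13 into the direction bit [7] and event type [6:0]."""
    event_type = byte13 & 0x7F
    if event_type == 0x01:
        event_class = "Threshold"
    elif 0x02 <= event_type <= 0x0C:
        event_class = "Discrete"
    elif event_type == 0x6F:
        event_class = "Sensor-Specific"
    elif 0x70 <= event_type <= 0x7F:
        event_class = "OEM"
    else:
        event_class = "Unknown"
    return {
        "deassertion": bool(byte13 & 0x80),  # 0b = assertion, 1b = deassertion
        "event_type": event_type,
        "event_class": event_class,
    }
```

For example, `decode_generator_id(0x0020)` reports an IPMB slave address rather than a software ID, because bit 0 of 20h is clear.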
Table 2: Event Request Message Event Data Field Contents
Sensor Class | Event Data
Threshold
  Event Data 1
    [7:6] – 00b = Unspecified Event Data 2
            01b = Trigger reading in Event Data 2
            10b = OEM code in Event Data 2
            11b = Sensor-specific event extension code in Event Data 2
    [5:4] – 00b = Unspecified Event Data 3
            01b = Trigger threshold value in Ev
Sensor Class | Event Data
            11b = Reserved
    [5:4] – 00b = Unspecified Event Data 3
            01b = Reserved
            10b = OEM code in Event Data 3
            11b = Reserved
    [3:0] – Offset from Event/Reading Type Code
  Event Data 2
    [7:4] – Optional OEM code bits or offset from “Severity” Event/Reading Type Code (0Fh if unspecified).
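As an illustration of the Event Data 1 layout for threshold-class sensors, the following sketch splits the byte into its three bit fields. The function name is hypothetical, and only the codes visible in Table 2 of this guide are mapped; any other code is reported as "see Table 2" rather than guessed.

```python
# Meaning of Event Data 1 bits [7:6] for threshold sensors (from Table 2).
ED2_CONTENTS = {
    0b00: "unspecified",
    0b01: "trigger reading",
    0b10: "OEM code",
    0b11: "sensor-specific event extension code",
}

# Meaning of Event Data 1 bits [5:4]; only the codes shown in this guide.
ED3_CONTENTS = {
    0b00: "unspecified",
    0b01: "trigger threshold value",
}


def decode_threshold_ed1(ed1):
    """Decode Event Data 1 of a threshold-class event."""
    return {
        "ed2_contents": ED2_CONTENTS[(ed1 >> 6) & 0x3],
        "ed3_contents": ED3_CONTENTS.get((ed1 >> 4) & 0x3, "see Table 2"),
        # [3:0]: offset from the Event/Reading Type Code
        "offset": ed1 & 0x0F,
    }
```

With a made-up value of 57h, bits [7:6] and [5:4] are both 01b, so Event Data 2 holds the trigger reading, Event Data 3 holds the trigger threshold, and the event offset is 7h.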
Byte | Field | Description
11-16 | OEM Defined | OEM Defined. This is defined according to the manufacturer identified by the Manufacturer ID field.

Table 4: OEM SEL Record (Type E0h-FFh)
Byte | Field | Description
1-2 | Record ID (RID) | ID used for SEL Record access.
2.2 Notes on SEL Logs and Collecting SEL Information
Whenever you capture the SEL, always collect both the human-readable text version and the hex version. Because some of the data is OEM-specific, some utilities cannot decode the information correctly.
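One reason to keep the hex version is that it can always be decoded by hand or with a few lines of script. The sketch below is a minimal example, assuming the hex capture gives each record as 16 space-separated bytes and that the fields follow the standard IPMI system event record layout (multi-byte fields stored least-significant byte first). The function name, field names, and sample bytes are hypothetical; the format a particular utility prints may differ.

```python
import struct

# Record ID (2), Record Type (1), Timestamp (4), Generator ID (2),
# then seven single-byte fields: 16 bytes in total, little-endian.
_SEL_FORMAT = "<HBIH7B"
_SEL_FIELDS = (
    "record_id", "record_type", "timestamp", "generator_id",
    "evm_rev", "sensor_type", "sensor_number", "event_dir_type",
    "event_data_1", "event_data_2", "event_data_3",
)


def parse_sel_record(hex_text):
    """Parse one 16-byte SEL record given as a hex string."""
    raw = bytes.fromhex(hex_text)
    if len(raw) != struct.calcsize(_SEL_FORMAT):
        raise ValueError("a SEL record must be exactly 16 bytes")
    return dict(zip(_SEL_FIELDS, struct.unpack(_SEL_FORMAT, raw)))


# Illustrative bytes only, not taken from a real log.
sample = "1A 01 02 57 49 6A 4E 01 00 04 00 00 00 00 00 00"
record = parse_sel_record(sample)
print(f"RID={record['record_id']:04X}h GID={record['generator_id']:04X}h")
```

Keeping the raw integers in a dictionary leaves OEM-specific bytes untouched, so they can still be passed to support staff exactly as captured.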
RID (Record ID) = 011Ah
RT (Record Type) = 02h = system event record
TS (Timestamp) = 4E6A4957h
GID (Generator ID) = 0001h = BIOS POST
ER (Event Message Revision) = 04 = IPMI v2.
RT (Record Type) = 02h = system event record
TS (Timestamp) = 502E9B0Ah
GID (Generator ID) = 0033h = BIOS SMI Handler
ER (Event Message Revision) = 04 = IPMI v2.
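The TS values in the examples above can be turned into calendar dates with a short script. This is a sketch that assumes the timestamp counts seconds since 00:00:00 on January 1, 1970, as IPMI defines it, and that the BMC clock was set to UTC; if the BMC keeps local time, the result is offset accordingly. The function name is hypothetical.

```python
from datetime import datetime, timezone


def sel_timestamp_to_utc(ts_hex):
    """Convert a SEL timestamp such as '4E6A4957h' to a UTC datetime."""
    # Drop the trailing 'h' suffix used in this guide before parsing.
    seconds = int(ts_hex.rstrip("hH"), 16)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


for ts in ("4E6A4957h", "502E9B0Ah"):
    print(ts, "->", sel_timestamp_to_utc(ts).isoformat())
```

Comparing the converted time against the system clock is a quick way to spot a BMC whose time was never synchronized.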
3. Sensor Cross Reference List
This section contains a cross reference to help find details on any specific SEL entry.
3.1 BMC owned Sensors (GID = 0020h)
The following table can be used to find the details of sensors owned by the BMC.
Sensor Number | Sensor Name | Details Section | Next Steps
0Ah | BMC Watchdog (BMC Watchdog) | BMC Watchdog Sensor | BMC Watchdog Sensor – Next Steps
0Bh | Voltage Regulator Watchdog (VR Watchdog) | Voltage Regulator Watchdog Timer Sensor | Voltage Regulator Watchdog Timer Sensor – Next Steps
0Ch | Fan Redundancy (Fan Redundancy) | Fan Presence and Redu
Sensor Number | Sensor Name | Details Section | Next Steps
18h | PCI Riser 4 Temperature (PCI Riser 4 Temp) | Threshold-based Temperature Sensors | Table 37: Temperature Sensors – Next Steps
19h | Baseboard +1.05V Processor3 Vccp (BB +1.05Vccp P3) | Threshold-based Voltage Sensors | Table 13: Threshold-based Voltage Sensors – Next Steps
1Ah | Baseboard +1.
Sensor Number | Sensor Name | Details Section | Next Steps
2Ch | PCI Riser 2 Temperature (PCI Riser 2 Temp) | Threshold-based Temperature Sensors | Table 37: Temperature Sensors – Next Steps
2Dh | SAS Module Temperature (SAS Mod Temp) | Threshold-based Temperature Sensors | Table 37: Temperature Sensors – Next Steps
2Eh | Exit Air Temperature (Exit Air Tem
Sensor Number | Sensor Name | Details Section | Next Steps
5Dh | Power Supply 2 Temperature (PS2 Temperature) | Power Supply Temperature Sensors | Table 27: Power Supply Temperature Sensor – Event Trigger Offset – Next Steps
60h-68h | Hard Disk Drive 15-23 Status (HDD 15-23 Status) | Hard Disk Drive Monitoring Sensor | Table 90: Hard Disk Drive Monitoring
Sensor Number | Sensor Name | Details Section | Next Steps
7Eh | Processor 3 ERR2 Timeout (P3 ERR2) | Processor ERR2 Timeout Sensor | Processor ERR2 Timeout – Next Steps
7Fh | Processor 4 ERR2 Timeout (P4 ERR2) | Processor ERR2 Timeout Sensor | Processor ERR2 Timeout – Next Steps
80h | Catastrophic Error (CATERR) | Catastrophic Error Sensor | Table 50: Catas
Sensor Number | Sensor Name | Details Section | Next Steps
94h | Processor 1 Memory VRD Hot 0-1 (P1 Mem01 VRD Hot) | Discrete Thermal Sensors | Table 45: Discrete Thermal Sensors – Next Steps
95h | Processor 1 Memory VRD Hot 2-3 (P1 Mem23 VRD Hot) | Discrete Thermal Sensors | Table 45: Discrete Thermal Sensors – Next Steps
96h | Processor 2 Memory VRD Hot
Sensor Number | Sensor Name | Details Section | Next Steps
A5h | Power Supply 2 Fan Tachometer 2 (PS2 Fan Tach 2)
A6h | Intel Xeon Phi Coprocessor Status 3 (MIC 3 Status) | Intel Xeon Phi Coprocessor (MIC) Status Sensors | Intel Xeon Phi Coprocessor (MIC) Status Sensors Next Steps
A7h | Intel Xeon Phi Coprocessor Status 4 (MIC 4 Status) | Intel Xeon Phi Coprocessor (MIC) Status Sensors | Intel Xeon Phi Coprocessor (MIC) Status Sensors Next Steps
B0h | Processor 1 DIMM Aggregate T
Sensor Number | Sensor Name | Details Section | Next Steps
B7h | Processor 4 DIMM Aggregate Thermal Margin 2 (P4 DIMM Thrm Mrgn2) | Thermal Margin Sensors | Table 40: Thermal Margin Sensors – Next Steps
B8h | Node Auto-Shutdown Sensor (Auto Shutdown) | Node Auto Shutdown Sensor | Node Auto Shutdown Sensor – Next Steps
BAh-BFh | Fan Tachometer Sensors (Chas
Sensor Number | Sensor Name | Details Section | Next Steps
D3h | Baseboard +5V Stand-by (BB +5.0V STBY) | Threshold-based Voltage Sensors | Table 13: Threshold-based Voltage Sensors – Next Steps
D4h | Baseboard +3.3V Auxiliary (BB +3.3V AUX) | Threshold-based Voltage Sensors | Table 13: Threshold-based Voltage Sensors – Next Steps
D6h | Baseboard +1.
Sensor Number | Sensor Name | Details Section | Next Steps
E4h | Baseboard +1.35V P1 Low Voltage Memory AB VDDQ (BB +1.35 P1LV AB) | Threshold-based Voltage Sensors | Table 13: Threshold-based Voltage Sensors – Next Steps
E5h | Baseboard +1.35V P1 Low Voltage Memory CD VDDQ (BB +1.
3.2 BIOS POST owned Sensors (GID = 0001h)
The following table can be used to find the details of sensors owned by BIOS POST.
Sensor Number | Sensor Name | Details Section | Next Steps
05h | PCI Express* Correctable Error | PCI Express* Correctable Errors | PCI Express* Correctable Error Sensor – Next Steps
06h | Intel® Quick Path Interface Correctable Error | QPI Correctable Error Sensor | QPI Correctable Error Sensor – Next Steps
07h | Intel Quick Path Interface Fatal Error | QP
3.5 Microsoft* OS owned Events (GID = 0041h)
The following table can be used to find the details of records that are owned by the Microsoft* Operating System (OS).
4. Power Subsystems
The BMC monitors the power subsystem, including power supplies, select onboard voltages, and related sensors.
4.1 Threshold-based Voltage Sensors
The BMC monitors the main voltage sources in the system, including the baseboard, memory, and processors, using IPMI-compliant analog/threshold sensors.
Table 12: Threshold-based Voltage Sensors Event Triggers – Description
Event Trigger (Hex) | Event Trigger (Description) | Assertion Severity | Deassert Severity | Description
00h | Lower non-critical going low | Degraded | OK | The voltage has dropped below its lower non-critical threshold.
Sensor Number | Sensor Name | Next Steps
D0h | Baseboard +12V (BB +12.0V) | +12V is supplied by the power supplies. +12V is used by SATA drives, fans, and PCI cards. In addition, it is used to generate various processor voltages.
  1. Ensure all cables are connected correctly.
  2. Check connections on the fans and HDDs.
  3.
D1h |
D2h |
D3h |
Sensor Number | Sensor Name | Next Steps
D4h | Baseboard +3.3V Auxiliary (BB +3.3V AUX) | +3.3V AUX is supplied by the main board. +3.3V AUX is used by the BMC, clock chips, PCI-E Slot, on-board NIC, Intel® C600 series Chipset, and ICH.
  1. Ensure all cables are connected correctly.
  2. If the issue remains, replace the board.
  3.
D6h |
D7h |
D8h |
D9h |
Sensor Number | Sensor Name | Next Steps
Baseboard +1.5V P2 Memory AB VDDQ (BB +1.5 P2MEM AB) | This 1.5V line is supplied by the main board. This 1.5V line is used by processor 2 memory slots A and B.
  1. Ensure all cables are connected correctly.
  2. Check that the DIMMs are seated properly.
  3. Cross test the DIMMs.
Sensor Number | Sensor Name | Next Steps
E5h | Baseboard +1.35V P1 Low Voltage Memory CD VDDQ (BB +1.35 P1LV CD) | This 1.35V line is supplied by the main board. This 1.35V line is used by processor 1 memory slots C and D.
  1. Ensure all cables are connected correctly.
  2. Check that the DIMMs are seated properly.
  3. Cross test the DIMMs.
E6h |
E7h |
EAh |
EBh |
Sensor Number | Sensor Name | Next Steps
ECh | Baseboard +0.9V (BB 0.9V Core IB) | +0.9V Core IB is supplied by the main board on specific platforms. +0.9V Core IB is used by the on-board Infiniband* controller on those specific platforms.
  1. Ensure all cables are connected correctly.
  2. If the issue remains, replace the board.
  3.
EDh |
EEh |
EFh |
4.2
If the SystemPowerGood signal has not asserted by the time the VR Watchdog Timer expires, the FW powers down the system, logs a SEL entry, and emits a beep code (1-5-1-2). This failure is termed a VR Watchdog Timeout.
Table 14: Voltage Regulator Watchdog Timer Sensor Typical Characteristics
Byte
4.2.
Table 15: Power Unit Status Sensors Typical Characteristics
Byte | Field | Description
11 | Sensor Type | 09h = Power Unit
12 | Sensor Number | 01h
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
   |  | [6:0] Event Type = 6Fh (Sensor Specific)
14 | Event Data 1 | [7:6] – 00b = Unspecified Event Data 2
   |  | [5:4] – 00b = Unsp
Sensor Specific Offset (Hex) | Sensor Specific Offset (Description) | Description | Next Steps
05h | Soft Power Control Failure | Asserted if the system fails to power on due to the following power control sources: Chassis Control command, PEF action, BMC Watchdog Timer, Power State Retention
06h | Power Unit Failure | Power subsystem experienced a failure.
4.3.2
Byte | Field | Description
   | Event Type | 0b = Assertion Event, 1b = Deassertion Event
   |  | [6:0] Event Type = 0Bh (Generic Discrete)
14 | Event Data 1 | [7:6] – 00b = Unspecified Event Data 2
   |  | [5:4] – 00b = Unspecified Event Data 3
   |  | [3:0] – Event Trigger Offset as described in Table 18
15 | Event Data 2 | Not used
16 | Event Data 3 | Not used

Table 18: Power Unit Redundanc
This sensor is used only to trigger SEL entries that indicate node or power auto shutdown assertion or deassertion.
Table 19: Node Auto Shutdown Sensor Typical Characteristics
Byte
4.3.3.
Table 20: Power Supply Status Sensors Typical Characteristics
Byte | Field | Description
11 | Sensor Type | 08h = Power Supply
12 | Sensor Number | 50h = Power Supply 1 Status
   |  | 51h = Power Supply 2 Status
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
   |  | [6:0] Event Type = 6Fh (Sensor Specific)
14 | Event Data 1 | [
Sensor Specific Offset Hex 02h Description ED2 ED3 Check the data in ED2 and ED3 for more details.
4.4.2 Power Supply Power In Sensors
These sensors log an event when a power supply in the system exceeds its AC power-in threshold.
4.4.3 Power Supply Current Out % Sensors
PMBus*-compliant power supplies may monitor the current output of the main 12V voltage rail and report the current usage as a percentage of the maximum power output for that rail.
4.4.4 Power Supply Temperature Sensors
The BMC monitors one or two power supply temperature sensors for each installed PMBus*-compliant power supply.
4.4.5 Power Supply Fan Tachometer Sensors
The BMC polls each installed power supply using the PMBus* fan status commands to check for failure conditions for the power supply fans.
Table 28: Power Supply Fan Tachometer Sensors Typical Characteristics
Byte
4.4.5.
5. Cooling Subsystem
5.1 Fan Sensors
There are three types of fan sensors that can be present on Intel® Server Systems: speed, presence, and redundancy. The last two are present only in systems with hot-swap redundant fans.
5.1.1 Fan Tachometer Sensors
Fan tachometer sensors monitor the RPM signal on the relevant fan headers on the platform.
Table 30: Fan Tachometer Sensor – Event Trigger Offset – Next Steps
Event Trigger Offset (Hex) | Event Trigger Offset (Description) | Assertion Severity | Deassert Severity | Description
00h | Lower non-critical going low | Degraded | OK | The fan speed has dropped below its lower non-critical threshold.
Byte | Field | Description
14 | Event Data 1 | [7:6] – 00b = Unspecified Event Data 2
   |  | [5:4] – 00b = Unspecified Event Data 3
   |  | [3:0] – Event Trigger Offset as described in Table 32
15 | Event Data 2 | Not used
16 | Event Data 3 | Not used

The following table describes the severity of each of the event triggers for both assertion and deassertion.
Byte | Field | Description
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
   |  | [6:0] Event Type = 0Bh (Generic Discrete)
14 | Event Data 1 | [7:6] – 00b = Unspecified Event Data 2
   |  | [5:4] – 00b = Unspecified Event Data 3
   |  | [3:0] – Event Trigger Offset as described in Table 34
15 | Event Data 2 | Not used
16 | Event Dat
5.2 Temperature Sensors
A variety of temperature sensors can be implemented on Intel® Server Systems. They are split into several types, each with its own set of events that can be logged.
Byte | Field | Description
16 | Event Data 3 | Threshold value that triggered event

Table 36: Temperature Sensors Event Triggers – Description
Event Trigger (Hex) | Event Trigger (Description) | Assertion Severity | Deassert Severity | Description
00h | Lower non-critical going low | Degraded | OK | The temperature has dropped below its lower non-critical threshold.
Sensor Number | Sensor Name
23h | Baseboard Temperature 2
24h | Baseboard Temperature 3
25h | Baseboard Temperature 4
26h | I/O Mod Temp
27h | PCI Riser 1 Temp
28h | IO Riser Temp
2Ch | PCI Riser 2 Temp
2Dh | SAS Mod Temp
2Eh | Exit Air Temp
2Fh | LAN NIC Temp
5.2.
Byte | Field | Description
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
   |  | [6:0] Event Type = 01h (Threshold)
14 | Event Data 1 | [7:6] – 01b = Trigger reading in Event Data 2
   |  | [5:4] – 01b = Trigger threshold in Event Data 3
   |  | [3:0] – Event Triggers as described in Table 39
15 | Event Data 2 | Reading that trigger
Sensor Number | Sensor Name | Next Steps
B3h | P2 DIMM Thrm Mrgn2
B4h | P3 DIMM Thrm Mrgn1
B5h | P3 DIMM Thrm Mrgn2
B6h | P4 DIMM Thrm Mrgn1
B7h | P4 DIMM Thrm Mrgn2
C8h | Agg Therm Mrgn 1
C9h | Agg Therm Mrgn 2
CAh | Agg Therm Mrgn 3
CBh | Agg Therm Mrgn 4
CCh | Agg Therm Mrgn 5
CDh | Agg Therm Mrgn 6
CEh | Agg Therm Mrgn 7
CFh | Agg Therm Mrgn 8
Next Steps: 4.
5.2.3
Byte | Field | Description
   |  | 7Bh = Processor 4 Thermal Control %
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
   |  | [6:0] Event Type = 01h (Threshold)
14 | Event Data 1 | [7:6] – 01b = Trigger reading in Event Data 2
   |  | [5:4] – 01b = Trigger threshold in Event Data 3
   |  | [3:0] – Event Triggers as described in Table 42
15
Xeon® processor E5-4600/2600/2400/1600 product families, this requires significant BMC FW calculations to derive the sensor value. Intel® Xeon® processor E5-4600/2600/2400/1600 v2 product families are the follow-on processors to Intel® Xeon® processor E5-4600/2600/2400/1600 product families.
Byte | Field | Description
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
   |  | [6:0] Event Type = See Table 45
14 | Event Data 1 | [7:6] – 00b = Unspecified Event Data 2
   |  | [5:4] – 00b = Unspecified Event Data 3
   |  | [3:0] – Event Trigger Offset as described in Table 45
15 | Event Data 2 | Not used
16 | Event Data 3 | Not u
Sensor Number | Sensor Name
99h | P3 Mem23 VRD Hot | Processor 3 Memory 2/3 voltage regulator overheated
9Ah | P4 Mem01 VRD Hot | Processor 4 Memory 0/1 voltage regulator overheated
9Bh | P4 Mem23 VRD Hot | Processor 4 Memory 2/3 voltage regulator overheated
5.2.
Byte | Field | Description
  0-3 = CPU1-4
  [4:3] – Channel
    0-3 = Channel A, B, C, D for CPU1
          Channel E, F, G, H for CPU2
          Channel J, K, L, M for CPU3
          Channel N, P, R, T for CPU4
  [2:0] – DIMM
    0-2 = DIMM 1-3 on Channel

5.2.6.1 DIMM Thermal Trip Sensors – Next Steps
1. Check for clear and unobstructed airflow into and out of the chassis.
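The CPU, channel, and DIMM encoding described above can be unpacked with a few shifts and masks. The sketch below is illustrative only: the function name is hypothetical, and because the bit range for the CPU field is not visible in the table fragment above, the code assumes it occupies the remaining bits [7:5]. Confirm the field position against the full table before relying on the result.

```python
# Channel letters per CPU, as listed in the table above.
CHANNEL_LETTERS = ("ABCD", "EFGH", "JKLM", "NPRT")


def decode_dimm_location(event_data):
    """Return (cpu_name, channel_letter, dimm_number) for an event data byte."""
    cpu = (event_data >> 5) & 0x7      # ASSUMPTION: CPU index is in bits [7:5]
    channel = (event_data >> 3) & 0x3  # [4:3]: channel index 0-3
    dimm = event_data & 0x7            # [2:0]: DIMM index 0-2
    if cpu > 3 or dimm > 2:
        raise ValueError("value outside the ranges documented in this guide")
    return f"CPU{cpu + 1}", CHANNEL_LETTERS[cpu][channel], dimm + 1


cpu_name, channel_letter, dimm_number = decode_dimm_location(0x31)
print(cpu_name, "Channel", channel_letter, "DIMM", dimm_number)
```

Rejecting out-of-range values instead of masking them makes it obvious when the byte being decoded is not the one this layout applies to.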
6. Processor Subsystem
Intel® servers report multiple processor-centric sensors in the SEL.
6.1 Processor Status Sensor
The BMC provides an IPMI sensor of type processor for monitoring status information for each processor slot.
Table 48: Processor Status Sensors – Next Steps
Event Trigger Offset | Next Steps
Internal error (IERR) | 1. 2.
1h, Thermal trip | This event normally only happens due to failures of the thermal solution:
  1. Verify the heatsink is properly attached and has thermal grease.
  2. If the system has a heatsink fan, ensure the fan is spinning.
  3.
Byte | Field | Description
12 | Sensor Number | 80h
13 | Event Direction and Event Type | [7] Event direction: 0b = Assertion Event, 1b = Deassertion Event
   |  | [6:0] Event Type = 03h (Digital Discrete)
14 | Event Data 1 | [7:6] – 10b = OEM code in Event Data 2
   |  | [5:4] – 10b = OEM code in Event Data 3
   |  | [3:0] – Event Trigger Offset = 1h (State Asserted)
15 | Event Data 2
6.3 CPU Missing Sensor
The CPU Missing sensor is a discrete sensor that reports that a processor is not installed. The most common cause of this event is a processor populated in the incorrect socket.
Table 51: CPU Missing Sensor Typical Characteristics
Byte
6.3.
The QPI Error sensors are reported by the BIOS SMI Handler to the BMC, so the Generator ID will be 33h.
6.4.1 QPI Link Width Reduced Sensor
BIOS POST has reduced the QPI Link Width because of an error condition seen during initialization.
Table 52: QPI Link Width Reduced Sensor Typical Characteristics
Byte
6.4.1.
6.4.2 QPI Correctable Error Sensor

The system detected an error and corrected it. This is an informational event.

Table 53: QPI Correctable Error Sensor Typical Characteristics
Table 54: QPI Fatal Error Sensor Typical Characteristics
The QPI Fatal Error #2 is a continuation of QPI Fatal Error.

Table 55: QPI Fatal #2 Error Sensor Typical Characteristics
6.5 Processor ERR2 Timeout Sensor

The BMC supports an ERR2 Timeout Sensor (one per CPU) that asserts if a CPU's ERR2 signal has been asserted for longer than a fixed time period (> 90 seconds).
6.5.1 Processor ERR2 Timeout – Next Steps

1. Check the SEL for any other events around the time of the failure.
2. Take note of all IPMI activity that was occurring around the time of the failure. Capture a System BMC Debug Log as soon as you can after experiencing this failure.
6.6.1 Processor MSID Mismatch Sensor – Next Steps

Verify the processor is supported by your baseboard. Check your board's Technical Product Specification (TPS).
7. Memory Subsystem

Intel® servers report memory errors, status, and configuration in the SEL.

7.1 Memory RAS Configuration Status

A Memory RAS Configuration Status event is logged after an AC power-on, but only if a RAS Mode is currently configured and only if that RAS Mode is successfully initiated.
Byte  Field  Description
13  Event Direction and Event Type
    [7] Event direction
        0b = Assertion Event
        1b = Deassertion Event
    [6:0] Event Type = 09h (Digital Discrete)
14  Event Data 1
    [7:6] – 10b = OEM code in Event Data 2
    [5:4] – 10b = OEM code in Event Data 3
    [3:0] – Event Trigger Offset as described in Table 59
15  Event Data 2  RAS Configuration Error
7.2 Memory RAS Mode Select

Memory RAS Mode Select events are logged to record changes in RAS Mode.
Rank Sparing Mode protects memory data by reserving a “Spare Rank” on each channel that has memory installed on it. If a Correctable Error Threshold event occurs, the data from the failing rank is copied to the Spare Rank on the same channel, and the failing DIMM is disabled.
Table 63: Correctable and Uncorrectable ECC Error Sensor Typical Characteristics

Byte  Field  Description
8-9  Generator ID  0033h = BIOS SMI Handler
11  Sensor Type  0Ch = Memory
12  Sensor Number  02h
13  Event Direction and Event Type
    [7] Event direction
        0b = Assertion Event
        1b = Deassertion Event
    [6:0] Event Type = 6Fh (Sensor Specific)
14  Event Data 1
Table 64: Correctable and Uncorrectable ECC Error Sensor Event Trigger Offset – Next Steps

Event Trigger Offset (Hex)  Description  Description / Next Steps
01h  Uncorrectable ECC Error
     An uncorrectable (multi-bit) ECC error has occurred.
00h  Correctable ECC Error threshold reached
Table 65: Address Parity Error Sensor Typical Characteristics

Byte  Field  Description
8-9  Generator ID  0033h = BIOS SMI Handler
11  Sensor Type  0Ch = Memory
12  Sensor Number  13h
13  Event Direction and Event Type
    [7] Event direction
        0b = Assertion Event
        1b = Deassertion Event
    [6:0] Event Type = 6Fh (Sensor Specific)
14  Event Data 1
    [7:6] – 10b =
Byte  Field  Description
    Channel J, K, L, M for CPU3
    Channel N, P, R, T for CPU4
    [2:0] – DIMM Slot ID (if valid) of the specific DIMM that was involved in the transaction that led to the parity error. This value is indeterminate and should be ignored if ED2 Bit [3] is 0b.
        0-2 = DIMM 1-3 on Channel
        All other values are reserved.
8. PCI Express* and Legacy PCI Subsystem

The PCI Express* (PCIe) Specification defines standard error types under the Advanced Error Reporting (AER) capabilities. The BIOS logs AER events into the SEL.

The Legacy PCI Specification error types are PERR and SERR. These errors are supported and logged into the SEL.
Byte  Field  Description
        4h = PCI PERR
        5h = PCI SERR
15  Event Data 2  PCI Bus number
16  Event Data 3
    [7:3] – PCI Device number
    [2:0] – PCI Function number

8.1.1.1 Legacy PCI Error Sensor – Next Steps

1. Decode the bus, device, and function to identify the card.
2. If this is an add-in card:
   a. Verify the card is inserted properly.
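Decoding the bus, device, and function from Event Data 2 and Event Data 3 follows directly from the bit layout given in the table above. The Python sketch below shows one way to do it; the function names and the `bus:device.function` output format are illustrative choices, not something this guide prescribes.

```python
def decode_pci_location(ed2: int, ed3: int) -> tuple:
    """Return (bus, device, function) from SEL Event Data 2 and Event Data 3."""
    bus = ed2 & 0xFF             # Event Data 2: PCI Bus number
    device = (ed3 >> 3) & 0x1F   # Event Data 3 [7:3]: PCI Device number
    function = ed3 & 0x07        # Event Data 3 [2:0]: PCI Function number
    return bus, device, function


def format_bdf(ed2: int, ed3: int) -> str:
    """Format the location as bus:device.function in hexadecimal."""
    bus, device, function = decode_pci_location(ed2, ed3)
    return f"{bus:02x}:{device:02x}.{function}"
```

For example, Event Data 2 = 02h and Event Data 3 = 19h decode to bus 2, device 3, function 1, which `format_bdf` renders as `02:03.1`. That string can then be matched against the operating system's PCI device listing to identify the card.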
Byte  Field  Description
14  Event Data 1
    [7:6] – 10b = OEM code in Event Data 2
    [5:4] – 10b = OEM code in Event Data 3
    [3:0] – Event Trigger
        0h = Data Link Layer Protocol Error
        1h = Surprise Link Down Error
        2h = Completer Abort
        3h = Unsupported Request
        4h = Poisoned TLP
        5h = Flow Control Protocol
        6h = Completion Timeout
        7h = Receiver Bu
Table 69: PCI Express* Correctable Error Sensor Typical Characteristics
8.1.3.1 PCI Express* Correctable Error Sensor – Next Steps

This is an informational event only. Correctable errors are acceptable and normal at a low rate of occurrence. If the error continues:

1. Decode the bus, device, and function to identify the card.
2. If this is an add-in card:
   a. Verify the card is inserted properly.
   b.
9. System BIOS Events

Several events are owned by the system BIOS. These events can occur during Power On Self Test (POST) or when coming out of a sleep state. Not all of these events signify errors. Some events are described in other chapters of this document (for example, memory events).
The timestamp clock synchronization is run, and the events are logged, by BIOS POST every time the system boots. In addition, during shutdown from some operating systems, the BIOS SMI Handler is called to run the timestamp clock synchronization and log the events.
9.2 System Firmware Progress (Formerly Post Error)

The BIOS logs any POST errors to the SEL. The 2-byte POST code is logged in the ED2 and ED3 bytes of the SEL entry. This event is logged every time a POST error is displayed. Even though this event indicates an error, it may not be a fatal error.
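Recombining ED2 and ED3 into the four-digit code used in Table 72 is a single shift and OR. The Python sketch below is illustrative only. It assumes that ED2 carries the low byte and ED3 the high byte of the POST code; the text above does not state the byte order, so the order is exposed as a parameter and should be checked against the byte table for this sensor. The function name is hypothetical.

```python
def post_error_code(ed2: int, ed3: int, ed2_is_low_byte: bool = True) -> str:
    """Combine ED2 and ED3 into a 4-digit hexadecimal POST error code.

    Assumption: ED2 is the low byte and ED3 the high byte. Pass
    ed2_is_low_byte=False if the platform uses the opposite order.
    """
    low, high = (ed2, ed3) if ed2_is_low_byte else (ed3, ed2)
    return f"{((high & 0xFF) << 8) | (low & 0xFF):04X}"
```

Under that assumption, ED2 = 12h and ED3 = 00h give `0012`, which Table 72 lists as "System RTC date/time not set".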
Table 72: POST Error Codes

Error Code  Error Message  Response
0012  System RTC date/time not set  Major
0048  Password check failed  Major
0140  PCI component encountered a PERR error  Major
0141  PCI resource conflict  Major
0146  PCI out of resources error  Major
0191  Processor core/thread count mismatch detected  Fatal
0192  Processor cache s
Error Code  Error Message  Response
8190  Watchdog timer failed on last boot  Major
8198  OS boot watchdog timer failure  Major
8300  Baseboard management controller failed self test  Major
8305  Hot-Swap Controller failure  Major
83A0  Management Engine (ME) failed self test  Major
83A1  Management Engine (ME) Failed to respond.
Error Code  Error Message  Response
8534  DIMM_G3 failed test/initialization  Major
8535  DIMM_H1 failed test/initialization  Major
8536  DIMM_H2 failed test/initialization  Major
8537  DIMM_H3 failed test/initialization  Major
8538  DIMM_J1 failed test/initialization  Major
8539  DIMM_J2 failed test/initialization  Major
853A  DIMM_J3 failed test
Error Code  Error Message  Response
8572  DIMM_G1 encountered a Serial Presence Detection (SPD) failure  Major
8573  DIMM_G2 encountered a Serial Presence Detection (SPD) failure  Major
8574  DIMM_G3 encountered a Serial Presence Detection (SPD) failure  Major
8575  DIMM_H1 encountered a Serial Presence Detection (SPD) failure  Major
8576  DIMM_H2 e
Error Code  Error Message  Response
8605  BIOS Settings are corrupted  Major
8606  NVRAM variable space was corrupted and has been reinitialized  Major
92A3  Serial port component was not detected  Major
92A9  Serial port component encountered a resource conflict error  Major
A000  TPM device not detected.
10. Chassis Subsystem

The BMC monitors several aspects of the chassis. In addition to logging when the power and reset buttons are pressed, the BMC monitors chassis intrusion (if a chassis intrusion switch is included in the chassis) and the network connections, logging an event whenever the physical network link is lost.
Byte  Field  Description
15  Event Data 2  Not used
16  Event Data 3  Not used

Table 74: Physical Security Sensor Event Trigger Offset – Next Steps

Event Trigger Offset (Hex)  Description  Description / Next Steps
00h  Chassis intrusion
     Description: Somebody has opened the chassis (or the chassis intrusion sensor is not connected).
     Next Steps:
     1.
     2.
     3.
Table 75: FP (NMI) Interrupt Sensor Typical Characteristics
10.3 Button Sensor

The BMC logs when the front panel power and reset buttons are pressed. These events are purely informational and do not indicate errors.
11. Miscellaneous Events

The miscellaneous events section addresses sensors not easily grouped with other sensor types.

11.1 IPMI Watchdog

PCSD server systems support an IPMI watchdog timer, which can check whether the OS is still responsive. The timer is disabled by default and has to be enabled manually.
It then requires an IPMI-aware utility in the operating system that resets the timer before it expires. If the timer does expire, the BMC can take action if it is configured to do so (reset, power down, power cycle, or generate a critical interrupt).

Byte  Field  Description
        02h  Power down
        03h  Power cycle
        08h  Timer interrupt
Event Trigger Offset (Hex)  Description
01h  Hard reset
02h  Power down
03h  Power cycle
08h  Timer interrupt
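When scanning a SEL dump, the watchdog offsets listed above can be turned into readable action names with a small lookup. The Python sketch below is illustrative; it assumes that, as for the other sensors in this guide, the Event Trigger Offset sits in bits [3:0] of Event Data 1, and the names `WATCHDOG_ACTIONS` and `describe_watchdog_event` are hypothetical.

```python
# Offsets and descriptions copied from the watchdog table above.
WATCHDOG_ACTIONS = {
    0x01: "Hard reset",
    0x02: "Power down",
    0x03: "Power cycle",
    0x08: "Timer interrupt",
}


def describe_watchdog_event(event_data1: int) -> str:
    """Map an IPMI Watchdog Event Data 1 byte to the action it reports."""
    offset = event_data1 & 0x0F  # assumption: offset is in bits [3:0]
    return WATCHDOG_ACTIONS.get(offset, f"Unlisted offset {offset:02X}h")
```

Offsets that are not in the table are reported as unlisted instead of being guessed.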
11.2 SMI Timeout

SMI stands for system management interrupt. It is an interrupt generated so that the processor can service server management events (typically memory or PCI errors, or other forms of critical interrupts) and log them to the SEL. If this interrupt times out, the system is frozen.
11.3 System Event Log Cleared

The BMC logs a SEL clear event. This is only ever the first event in the SEL. The event is caused either by a manual SEL clear using selview or another IPMI-aware utility, or by a clear performed in the factory as one of the last steps of the manufacturing process. This is an informational event only.
Table 81: System Event – PEF Action Sensor Typical Characteristics
11.5 BMC Watchdog Sensor

The BMC supports an IPMI sensor to report that a BMC reset has occurred due to an action taken by the BMC Watchdog feature. A SEL event is logged whenever either the BMC FW stack is reset or the BMC CPU itself is reset.

Table 82: BMC Watchdog Sensor Typical Characteristics
11.6 BMC FW Health Sensor

The BMC tracks the health of each of its IPMI sensors and reports failures by providing a “BMC FW Health” sensor of the IPMI 2.0 sensor type Management Subsystem Health with support for the Sensor Failure offset. Only assertions are logged into the SEL for the Sensor Failure offset.
11.7 Firmware Update Status Sensor

The BMC FW supports a single Firmware Update Status sensor. This sensor is used to generate SEL events related to updates of embedded firmware on the platform, including updates to the BMC, BIOS, and ME FW. This sensor is an event-only sensor that is not readable. Event generation is only enabled for assertion events.
11.8 Add-In Module Presence Sensor

Some server boards provide dedicated slots for add-in modules/boards (for example, SAS, IO, and PCIe-riser). For these boards the BMC provides an individual presence sensor to indicate whether the module/board is installed.

Table 85: Add-In Module Presence Sensor Typical Characteristics
11.9 Intel® Xeon Phi™ Coprocessor Management Sensors

The Intel® Xeon® Processor E5 4600/2600/2400/1600 Product Families BMC supports limited manageability of the Intel® Xeon Phi™ Coprocessor adapter as described in this section.
12. Hot-Swap Controller Backplane Events

All new PCSD Platforms Based on Intel® Xeon® Processor E5 4600/2600/2400/1600 Product Families backplanes follow a hybrid architecture, in which the IPMI functionality previously supported in the HSC is integrated into the BMC FW.
Table 88: HSC Backplane Temperature Sensor – Event Trigger Offset – Next Steps

Hex  Event Trigger Description  Assertion Severity  Deassert Severity  Description  Next Steps
00h  Lower non-critical going low
     Assertion Severity: Degraded
     Deassert Severity: OK
     Description: The temperature has dropped below its lower non-critical threshold.
     Next Steps:
     1.
Byte  Field  Description
16  Event Data 3  Not used

Table 90: Hard Disk Drive Monitoring Sensor – Event Trigger Offset – Next Steps

Event Trigger  Description
00h  Drive Presence
01h  Drive Fault
07h  Rebuild/Remap in progress

Next Steps: If during normal operation the state changes unexpectedly, ensure that the drive was seated properly
Byte  Field  Description
15  Event Data 2  Not used
16  Event Data 3  Not used

12.3.1 HSC Health Sensor – Next Steps

Ensure that all connections to the HSC are well seated. Cross test with another HSC. If the issue remains with the HSC, replace the HSC; otherwise, start cross testing all interconnections.
13. Manageability Engine (ME) Events

The Manageability Engine controls the PECI interface and also contains the Node Manager functionality.

13.1 ME Firmware Health Event

This sensor is used in Platform Event messages to the BMC containing health information including, but not limited to, firmware upgrade and application errors.
Table 93: ME Firmware Health Event Sensor – Next Steps

ED2  ED3  Description  Next Steps
00h  Recovery GPIO forced. Recovery Image loaded due to recovery MGPIO pin asserted. Pin number is configurable in factory presets. Default recovery pin is MGPIO1.
     Next Steps: Deassert MGPIO1 and reset the Intel® ME.
01h  Image execution failed.
13.2 Node Manager Exception Event

A Node Manager Exception Event is sent each time the maintained policy power limit is exceeded for longer than the Correction Time Limit.
13.3 Node Manager Health Event

A Node Manager Health Event message provides a runtime error indication about Intel® Intelligent Power Node Manager’s health.
Byte  Field  Description
        If Error type = 11
        If Error type = 12
        Otherwise set to 0.

13.3.1 Node Manager Health Event – Next Steps

A misconfigured policy can occur if the max/min power consumption of the platform exceeds the values in the policy due to hardware reconfiguration.
13.4 Node Manager Operational Capabilities Change

This message provides a runtime error indication about Intel® Intelligent Power Node Manager’s operational capabilities. This applies to all domains. Assertion and deassertion of these events are supported.
13.4.1 Node Manager Operational Capabilities Change – Next Steps

Policy Interface available indicates that Intel® Intelligent Power Node Manager can respond to requests on the external interface to query and set Intel® Intelligent Power Node Manager policies. This is generally available as soon as the microcontroller is initialized.
13.5 Node Manager Alert Threshold Exceeded

A Policy Correction Time Exceeded Event is sent each time the maintained policy power limit is exceeded for longer than the Correction Time Limit.
13.5.1 Node Manager Alert Threshold Exceeded – Next Steps

The first occurrence of an unacknowledged event is retransmitted no faster than every 300 milliseconds. The first occurrence of a Threshold exceeded event assertion/deassertion is retransmitted no faster than every 300 milliseconds. Next steps depend on the policy that was set.
14. Microsoft Windows* Records

With Microsoft Windows Server 2003* R2 and later versions, an Intelligent Platform Management Interface (IPMI) driver was added. This added the capability of logging some OS events to the SEL. The driver can write multiple records to the SEL for the following events:

    Boot-up
    Shutdown
    Bug Check / Blue Screen
Table 99: Boot up OEM Event Record Typical Characteristics

Byte  Field  Description
1-2  Record ID  ID used for SEL Record access
3  Record Type  [7:0] – DCh = OEM timestamped, bytes 8-16 OEM defined
4-7  Timestamp  Time when the event was logged. LS byte first.
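The first seven bytes of an OEM timestamped record can be unpacked in one step because the multi-byte fields are stored least significant byte first. The Python sketch below is a minimal illustration, assuming the record is available as 16 raw bytes. The function name is hypothetical, the C0h to DFh range for OEM timestamped records is taken from the IPMI specification, and converting the timestamp as UTC seconds since 1970 is an assumption about how the BMC clock was set.

```python
import struct
from datetime import datetime, timezone

# IPMI defines record types C0h-DFh as OEM timestamped records.
OEM_TIMESTAMPED = range(0xC0, 0xE0)


def parse_timestamped_header(record: bytes) -> dict:
    """Decode bytes 1-7 (1-based) of a 16-byte SEL record."""
    if len(record) != 16:
        raise ValueError("expected a 16-byte SEL record")
    # "<HBI": little-endian 2-byte Record ID, 1-byte Record Type,
    # 4-byte Timestamp (LS byte first, as stated in the table).
    record_id, record_type, seconds = struct.unpack_from("<HBI", record, 0)
    return {
        "record_id": record_id,
        "record_type": record_type,
        "oem_timestamped": record_type in OEM_TIMESTAMPED,
        "timestamp_seconds": seconds,
        # Assumption: the BMC clock counts seconds since 1970-01-01 UTC.
        "timestamp": datetime.fromtimestamp(seconds, tz=timezone.utc),
    }
```

The OEM-defined bytes 8-16 are deliberately left untouched here, since their layout differs between the boot-up, shutdown, and bug check records.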
14.2 Shutdown Event Records

When the system shuts down from the Microsoft Windows* OS, multiple events can be logged. The first is an OS Stop/Shutdown Event Record; this can be followed by a shutdown reason code OEM record, and then zero or more shutdown comment OEM records. These are all informational-only records.
Byte  Field  Description
6 7 8 9 10  IPMI Manufacturer ID  0137h (311d) = IANA enterprise number for Microsoft
11  Record ID  Sequential number reflecting the order in which the records are read. The numbers start at 1 for the first entry in the SEL and continue sequentially to n, the number of entries in the SEL.
Byte  Field  Description
12-15  Shutdown Comment  Shutdown Comment from the registry (LSB first): HKLM/Software/Microsoft/Windows/CurrentVersion/Reliability/shutdown/Comment
16  Reserved  00h

14.3 Bug Check / Blue Screen Event Records

When the system experiences a bug check (blue screen), multiple records are written to the event log.
Table 104: Bug Check / Blue Screen code OEM Event Record Typical Characteristics

Byte  Field  Description
1-2  Record ID  ID used for SEL Record access
3  Record Type  [7:0] – DEh = OEM timestamped, bytes 8-16 OEM defined
4-7  Timestamp  Time when the event was logged. LS byte first.
15. Linux* Kernel Panic Records

The Open IPMI driver supports the ability to put semi-custom and custom events in the system event log if a panic occurs. If you enable the “Generate a panic event to all BMCs on a panic” option, you will get one event on a panic in a standard IPMI event format.
Table 106: Linux* Kernel Panic String Extended Record Characteristics

Byte  Field  Description
1-2  Record ID  ID used for SEL Record access
3  Record Type  [7:0] – F0h = OEM non-timestamped, bytes 4-16 OEM defined
4  Slave Address  The slave address of the card saving the panic
5  Sequence Number  A sequence number (starting at zero)
6 … 16  K
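Because each extended record carries only a short fragment of the panic message, the fragments have to be ordered by sequence number and joined. The Python sketch below illustrates that step under stated assumptions: bytes 6 through 16 are taken to hold the string fragment (as described in the Linux kernel IPMI driver documentation), unused trailing bytes are assumed to be zero, and all records are assumed to come from the same slave address. The function name is hypothetical.

```python
def assemble_panic_string(records) -> str:
    """Rebuild a kernel panic string from 16-byte F0h OEM SEL records."""
    chunks = {}
    for rec in records:
        # Byte 3 (index 2) is the Record Type; skip anything that is not F0h.
        if len(rec) != 16 or rec[2] != 0xF0:
            continue
        sequence = rec[4]            # byte 5: sequence number, starting at zero
        chunks[sequence] = rec[5:]   # bytes 6-16: assumed string fragment
    data = b"".join(chunks[seq] for seq in sorted(chunks))
    # Assumption: unused bytes are zero, so stop at the first NUL.
    return data.split(b"\x00", 1)[0].decode("ascii", errors="replace")
```

Sorting by sequence number means the records can be passed in any order, for example in the order a SEL dump tool happens to list them. If several cards logged panic strings, the records would first need to be grouped by the slave address in byte 4.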