SEL Troubleshooting Guide

System Event Log Troubleshooting Guide for PCSD Platforms Based on Intel® Xeon® Processor E5 4600/2600/2400/1600/1400 Product Families
Intel order number G90620-003 Revision 1.2
Table of Contents
1. Introduction
1.1 Purpose
1.2 Industry Standard
1.2.1 Intelligent Platform Management Interface (IPMI)
1.2.2 Baseboard Management Controller (BMC)
1.2.3 Intel® Intelligent Power Node Manager Version 2.0
2. Basic Decoding of a SEL Record
2.1 Default Values in the SEL Records
2.2 Notes on SEL Logs and Collecting SEL Information
2.2.1 Examples of Decoding BIOS Timestamp Events
2.2.2 Example of Decoding a PCI Express* Correctable Error Event
2.2.3 Example of Decoding a Power Supply Predictive Failure Event
3. Sensor Cross Reference List
3.1 BMC owned Sensors (GID = 0020h)
3.2 BIOS POST owned Sensors (GID = 0001h)
3.3 BIOS SMI Handler owned Sensors (GID = 0033h)
3.4 Node Manager / ME Firmware owned Sensors (GID = 002Ch or 602Ch)
3.5 Microsoft* OS owned Events (GID = 0041)
3.6 Linux* Kernel Panic Events (GID = 0021)
4. Power Subsystems
4.1 Threshold-based Voltage Sensors
4.2 Voltage Regulator Watchdog Timer Sensor
4.2.1 Voltage Regulator Watchdog Timer Sensor Next Steps
4.3 Power Unit
4.3.1 Power Unit Status Sensor
4.3.2 Power Unit Redundancy Sensor
4.3.3 Node Auto Shutdown Sensor
4.4 Power Supply
4.4.1 Power Supply Status Sensors
4.4.2 Power Supply Power In Sensors
4.4.3 Power Supply Current Out % Sensors
4.4.4 Power Supply Temperature Sensors
4.4.5 Power Supply Fan Tachometer Sensors
5. Cooling Subsystem
5.1 Fan Sensors
5.1.1 Fan Tachometer Sensors
5.1.2 Fan Presence and Redundancy Sensors
5.2 Temperature Sensors