SEL Troubleshooting Guide

Memory Subsystem System Event Log Troubleshooting Guide for Intel® S5500/S3420 Series Server Boards
54 Intel order number G74211-002 Revision 1.1
Byte: 16
Field: Event Data 3
Description:
[7:5] Indicates the Processor Socket to which the DDR3 DIMM having the ECC error is attached:
000b = Processor Socket 1
001b = Processor Socket 2
All other values are reserved.
[4:3] Indicates the processor Memory Channel to which the failing DDR3 DIMM is attached:
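00b = Channel A (Processor Socket 1) or Channel D (Processor Socket 2)
01b = Channel B (Processor Socket 1) or Channel E (Processor Socket 2)
10b = Channel C (Processor Socket 1) or Channel F (Processor Socket 2)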
11b is reserved.
[2:0] Indicates the DIMM Socket on the channel to which the failing DDR3 DIMM is attached:
000b = DIMM Socket 1
001b = DIMM Socket 2
All other values are reserved.
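
The bit fields above can be unpacked with a few shifts and masks. The following Python sketch is illustrative only: the names decode_event_data_3 and DimmLocation are hypothetical and do not come from any Intel utility. It assumes the caller has already isolated byte 16 of the SEL record as an integer, and it rejects the values the table marks as reserved.

```python
from typing import NamedTuple


class DimmLocation(NamedTuple):
    """Hypothetical container for a decoded DIMM location."""
    processor_socket: int  # 1 or 2
    channel: str           # "A" through "F"
    dimm_socket: int       # 1 or 2


# Channel letters for each processor socket, indexed by the [4:3] field value.
_CHANNELS = {1: "ABC", 2: "DEF"}


def decode_event_data_3(value: int) -> DimmLocation:
    """Decode Event Data 3 (SEL byte 16) into socket, channel, and DIMM."""
    if not 0 <= value <= 0xFF:
        raise ValueError("Event Data 3 must be a single byte")

    socket_bits = (value >> 5) & 0b111   # bits [7:5]
    channel_bits = (value >> 3) & 0b11   # bits [4:3]
    dimm_bits = value & 0b111            # bits [2:0]

    # Values the table marks as reserved are rejected rather than guessed.
    if socket_bits > 0b001:
        raise ValueError(f"reserved processor socket value {socket_bits:03b}b")
    if channel_bits == 0b11:
        raise ValueError("reserved memory channel value 11b")
    if dimm_bits > 0b001:
        raise ValueError(f"reserved DIMM socket value {dimm_bits:03b}b")

    socket = socket_bits + 1
    return DimmLocation(socket, _CHANNELS[socket][channel_bits], dimm_bits + 1)
```

For example, the value 29h is 001 01 001 in binary, so it decodes to Processor Socket 2, Channel E, DIMM Socket 2.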
Table 61: Correctable and Uncorrectable ECC Error Sensor Event Trigger Offset Next Steps
Event Trigger Offset (Hex): 01h
Event Trigger Offset (Description): Uncorrectable ECC Error
Description:
An uncorrectable (multi-bit) ECC error has occurred. This is a
fatal error that typically leads to an OS crash (unless memory
has been configured in a RAS mode). The system generates a
CATERR# (catastrophic error) and an MCE (Machine Check
Exception).
Although the error may be due to a failing DRAM chip on the DIMM,
it can also be caused by improper seating or poor contact
between the socket and the DIMM, or by bent pins in the processor
socket.
Next Steps:
1. If needed, decode the DIMM location from the hex version of the SEL.
2. Verify that the DIMM is seated properly.
3. Examine the gold fingers on the edge of the DIMM to verify that the contacts are clean.
4. Inspect the processor socket to which this DIMM is connected for bent pins;
if any are found, replace the board.
5. Consider replacing the DIMM as a preventive measure. For multiple
occurrences, replace the DIMM.
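
For step 1, the bytes of interest first have to be picked out of the raw record. The Python sketch below assumes the hex version of the SEL entry is available as a plain string of 16 hex bytes laid out as in the standard IPMI SEL event record, with Event Data 1 in byte 14 and Event Data 3 in byte 16. The function name extract_ecc_fields and the sample record are hypothetical; the output format of a particular SEL viewer may differ and may need reformatting first.

```python
from typing import Tuple


def extract_ecc_fields(sel_hex: str) -> Tuple[int, int]:
    """Return (event trigger offset, Event Data 3) from a 16-byte SEL record.

    Assumes sel_hex holds exactly 16 hex bytes, optionally separated by
    whitespace, in standard IPMI SEL event record order.
    """
    record = bytes.fromhex(sel_hex)
    if len(record) != 16:
        raise ValueError(f"expected 16 bytes, got {len(record)}")

    event_data_1 = record[13]  # byte 14 (1-based); bits [3:0] hold the offset
    event_data_3 = record[15]  # byte 16 (1-based); DIMM location bit fields

    return event_data_1 & 0x0F, event_data_3


# Made-up placeholder record for illustration only: every byte is zero except
# byte 14 (offset 01h, Uncorrectable ECC) and byte 16 (29h).
SAMPLE_RECORD = "00 " * 13 + "01 00 29"

offset, event_data_3 = extract_ecc_fields(SAMPLE_RECORD)
print(f"offset={offset:02X}h event_data_3={event_data_3:02X}h")
```

An offset of 01h corresponds to the Uncorrectable ECC row of Table 61, and the returned Event Data 3 byte can then be split into its socket, channel, and DIMM bit fields.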