SEL Troubleshooting Guide

System Event Log Troubleshooting Guide for Intel® S5500/S3420 Series Server Boards
Memory Subsystem
Revision 1.1, Intel order number G74211-002
Event Trigger Offset (Hex): 00h
Event Trigger Offset (Description): Correctable ECC Error threshold reached

Description:
There have been too many (10 or more) correctable ECC errors for this particular DIMM since last boot. This event in itself does not pose any direct problems as the ECC errors are still being corrected. Depending on the RAS configuration of the memory, the IMC may take the affected DIMM offline.

Next Steps:
Even though this event doesn't immediately lead to problems, it can indicate that one of the DIMMs is slowly failing. If this error occurs more than once:
1. If needed, decode DIMM location from hex version of SEL.
2. Verify the DIMM is seated properly.
3. Examine gold fingers on edge of the DIMM to verify contacts are clean.
4. Inspect the processor socket this DIMM is connected to for bent pins, and if found, replace the board.
5. Consider replacing the DIMM as a preventative measure. For multiple occurrences, replace the DIMM.
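The threshold behavior described above (10 or more correctable ECC errors for one DIMM since last boot) can be illustrated with a small sketch. This is not the firmware's implementation: the class name, method names, and DIMM labels are hypothetical, and the only value taken from this guide is the threshold of 10.

```python
from collections import Counter

# "10 or more" correctable errors per DIMM since last boot, per the description above.
CORRECTABLE_ECC_THRESHOLD = 10


class CorrectableEccTracker:
    """Hypothetical per-boot counter; the real counting is done by platform firmware."""

    def __init__(self, threshold=CORRECTABLE_ECC_THRESHOLD):
        self.threshold = threshold
        self.counts = Counter()

    def record_error(self, dimm):
        """Count one corrected error for a DIMM.

        Returns True only on the error that first reaches the threshold,
        which is the point where a threshold event would be logged.
        """
        self.counts[dimm] += 1
        return self.counts[dimm] == self.threshold

    def reset(self):
        """Counts are kept per boot, so clear them on reboot."""
        self.counts.clear()
```

In this sketch each DIMM is counted independently, so errors on one DIMM never push another DIMM over the threshold.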
7.2.2 Memory Address Parity Error
Address Parity errors are detected in the memory addressing hardware. Because they affect the addressing of memory contents, they can lead to the same kinds of failures as ECC errors. They are logged as a distinct type of error because they affect memory addressing rather than memory contents, but otherwise they are treated exactly the same as Uncorrectable ECC Errors. Address Parity errors are logged to the BMC SEL, with Event Data that identifies the failing address by channel and DIMM to the extent that it is possible to do so.
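As a rough sketch, the identifying values in Table 62 can be used to recognize an Address Parity event once the record fields have been extracted. The function names are hypothetical, and the sketch assumes the caller has already pulled the individual fields out of the SEL record (byte order and record parsing are not covered here).

```python
# Identifying values from Table 62.
GENERATOR_ID_BIOS_SMI = 0x0033       # BIOS SMI Handler
SENSOR_TYPE_MEMORY = 0x0C            # Memory
SENSOR_NUMBER_ADDRESS_PARITY = 0x14
EVENT_TYPE_SENSOR_SPECIFIC = 0x6F


def decode_event_dir_type(byte13):
    """Split byte 13 into (direction, event type).

    Bit 7 is the event direction (0 = Assertion, 1 = Deassertion);
    bits 6:0 are the event type code.
    """
    direction = "Deassertion" if byte13 & 0x80 else "Assertion"
    event_type = byte13 & 0x7F
    return direction, event_type


def is_address_parity_event(generator_id, sensor_type, sensor_number, byte13):
    """Return True if the already-extracted fields match the Table 62 values."""
    _, event_type = decode_event_dir_type(byte13)
    return (
        generator_id == GENERATOR_ID_BIOS_SMI
        and sensor_type == SENSOR_TYPE_MEMORY
        and sensor_number == SENSOR_NUMBER_ADDRESS_PARITY
        and event_type == EVENT_TYPE_SENSOR_SPECIFIC
    )
```

Masking off bit 7 before comparing the event type means that assertion and deassertion records for the same sensor are both recognized.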
Table 62: Address Parity Error Sensor Typical Characteristics

Byte    Field                  Description
8, 9    Generator ID           0033h = BIOS SMI Handler
11      Sensor Type            0Ch = Memory
12      Sensor Number          14h
13      Event Direction and    [7] Event direction
        Event Type                 0b = Assertion Event
                                   1b = Deassertion Event
                               [6:0] Event Type = 6Fh (Sensor Specific)