SEL Troubleshooting Guide
Memory Subsystem
System Event Log Troubleshooting Guide for PCSD Platforms Based on Intel® Xeon® Processor E5 4600/2600/2400/1600/1400 Product Families
Intel order number G90620-003 Revision 1.2
Byte  Field         Description
----  ------------  ---------------------------------------------
16    Event Data 3  Location
                    [7:5] = Socket ID
                            0-3 = CPU1-4
                    [4:3] = Channel
                            0-3 = Channel A, B, C, D for CPU1
                                  Channel E, F, G, H for CPU2
                                  Channel J, K, L, M for CPU3
                                  Channel N, P, R, T for CPU4
                    [2:0] = DIMM
                            0-2 = DIMM 1-3 on Channel
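The bit layout of Event Data 3 can be decoded with a few shifts and masks. The following Python sketch is illustrative only: the function names decode_event_data3 and format_location are hypothetical and are not part of any Intel tool, and the code assumes the Event Data 3 byte has already been extracted from the SEL record as an integer.

```python
# Channel letters per socket ID, as listed in the Event Data 3 table.
CHANNEL_LETTERS = {
    0: "ABCD",  # CPU1
    1: "EFGH",  # CPU2
    2: "JKLM",  # CPU3
    3: "NPRT",  # CPU4
}


def decode_event_data3(value):
    """Decode the Event Data 3 location byte into (cpu, channel, dimm).

    Hypothetical helper for illustration; returns the 1-based CPU number,
    the channel letter, and the 1-based DIMM number on that channel.
    """
    if not 0 <= value <= 0xFF:
        raise ValueError("Event Data 3 must be a single byte")
    socket = (value >> 5) & 0x07   # bits [7:5] = Socket ID
    channel = (value >> 3) & 0x03  # bits [4:3] = Channel
    dimm = value & 0x07            # bits [2:0] = DIMM
    if socket not in CHANNEL_LETTERS:
        raise ValueError(f"socket ID {socket} is outside the documented range 0-3")
    if dimm > 2:
        raise ValueError(f"DIMM index {dimm} is outside the documented range 0-2")
    return socket + 1, CHANNEL_LETTERS[socket][channel], dimm + 1


def format_location(value):
    """Render the decoded location as a readable string."""
    cpu, channel, dimm = decode_event_data3(value)
    return f"CPU{cpu} Channel {channel} DIMM {dimm}"


if __name__ == "__main__":
    # 0x29 = 001 01 001b: socket 1, channel 1, DIMM index 1
    print(format_location(0x29))
```

Values outside the documented ranges (socket ID above 3 or DIMM index above 2) are rejected rather than guessed at, since the table does not define them.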
7.4.1 Sparing Redundancy State Sensor – Next Steps
This event is accompanied by memory errors that indicate the source of the issue. Troubleshoot accordingly, most likely by replacing the
affected DIMM.
For boards with DIMM Fault LEDs, the appropriate Fault LED is lit to indicate the failing DIMM, that is, the DIMM whose error triggered
the Sparing action.
7.5 ECC and Address Parity
1. Memory data errors are logged as correctable or uncorrectable.
2. Uncorrectable errors are fatal.
3. Memory addresses are protected with parity bits. An address parity error is logged when detected, and it is a fatal error.
7.5.1 Memory Correctable and Uncorrectable ECC Error
ECC errors are divided into Uncorrectable ECC Errors and Correctable ECC Errors. A “Correctable ECC Error” event actually represents a
threshold overflow: more correctable errors than the threshold allows were detected at the memory controller level for a given DIMM
within a given timeframe. In both cases, the error can be narrowed down to particular DIMM(s). The BIOS SMI error handler uses this
information to log the data to the BMC SEL and identify the failing DIMM.
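The threshold-overflow behavior described above can be pictured with a small model. The Python sketch below is not the BIOS or memory controller implementation; it only illustrates the idea of counting correctable errors per DIMM within a timeframe and reporting once the count exceeds a threshold. The class name CorrectableErrorCounter, the sliding-window approach, and the threshold and window values are all assumptions made for illustration.

```python
from collections import defaultdict, deque


class CorrectableErrorCounter:
    """Illustrative model: count correctable ECC errors per DIMM in a time window.

    The threshold and window length are hypothetical parameters; the real
    values and counting scheme are platform-specific.
    """

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # DIMM label -> error timestamps

    def record(self, dimm, timestamp):
        """Record one correctable error; return True on threshold overflow."""
        events = self._events[dimm]
        events.append(timestamp)
        # Drop errors that fall outside the window ending at this timestamp.
        while events and timestamp - events[0] >= self.window_seconds:
            events.popleft()
        if len(events) > self.threshold:
            events.clear()  # start a fresh count after reporting the overflow
            return True
        return False


if __name__ == "__main__":
    counter = CorrectableErrorCounter(threshold=3, window_seconds=10)
    for t in range(5):
        if counter.record("CPU1 Channel A DIMM 1", t):
            print(f"threshold overflow at t={t}: log Correctable ECC Error")
```

In this model, individual correctable errors below the threshold produce no event, which mirrors why a single "Correctable ECC Error" entry in the SEL stands for many corrected errors on the same DIMM.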