HP Integrity rx3600 Server User Service Guide
The zx2 chip doubles memory carrier error correction from 4 bytes to 8 bytes of a 128-byte cache
line during cache line misses initiated by processor cache controllers and by Direct Memory Access
(DMA) operations initiated by I/O devices. This feature is called double DRAM sparing. Two of the
72 DRAMs in any DIMM quad can fail without any loss of server performance.
You must replace DIMMs or memory carriers when a threshold is reached for multiple double-byte
errors from one or more DIMMs on the same board. You must also replace the DIMMs when any
uncorrectable memory error (more than 2 bytes) occurs, or when no quad of like memory DIMMs is
loaded in rank 0 of side 0. All other DIMM errors are corrected by zx2 and reported to the Page
Deallocation Table (PDT) and the diagnostic LED panel.
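The replacement rules above can be sketched as a small decision function. This is an illustrative sketch only, not HP firmware or diagnostic code: the names `Action`, `service_action`, and all three parameters are hypothetical, and the actual double-byte error threshold is not stated in this guide, so it is passed in as a flag.

```python
from enum import Enum


class Action(Enum):
    REPLACE = "replace DIMMs or memory carrier"
    CORRECTED = "corrected by zx2; reported to the PDT and diagnostic LED panel"


def service_action(bytes_in_error: int,
                   double_byte_threshold_reached: bool = False,
                   rank0_quad_loaded: bool = True) -> Action:
    """Map a memory condition to the service action described in the text.

    All parameter names are hypothetical; the threshold value itself is
    not given in the guide, so the caller supplies it as a boolean.
    """
    # No quad of like DIMMs loaded in rank 0 of side 0.
    if not rank0_quad_loaded:
        return Action.REPLACE
    # Uncorrectable error: more than 2 bytes.
    if bytes_in_error > 2:
        return Action.REPLACE
    # Threshold reached for multiple double-byte errors on the same board.
    if double_byte_threshold_reached:
        return Action.REPLACE
    # Everything else is corrected by zx2 and only reported.
    return Action.CORRECTED
```

For example, `service_action(1)` returns `Action.CORRECTED`, while `service_action(3)` returns `Action.REPLACE`.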
Memory Error Messages
• Diagnostic LEDs light only when an error is isolated to a specific DIMM.
• Configuration errors, such as no DIMMs installed, cause diagnostic LEDs to light for all DIMMs
not installed.
• No diagnostic LEDs light for single-byte errors that are corrected in both zx2 caches and
memory DIMMs during corrected platform error (CPE) events. Diagnostic messages are reported
for CPE events when thresholds are exceeded for both single-byte and double-byte errors. All
fatal memory errors cause global MCA events.
• PDT logs for all double-byte errors are permanent. Single-byte errors are initially logged as
transient errors. If the server logs two single-byte errors within 24 hours, they are upgraded
to permanent in the PDT.
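The PDT logging rule in the last bullet can be modeled as a small state machine. The following is a minimal sketch under stated assumptions, not the actual PDT implementation: the class `PdtEntry`, its fields, and the `record` method are hypothetical, one entry is assumed to track one memory location, and "within 24 hours" is assumed to include exactly 24 hours.

```python
from datetime import datetime, timedelta

# Assumption: the 24-hour window is inclusive.
UPGRADE_WINDOW = timedelta(hours=24)


class PdtEntry:
    """Hypothetical model of one PDT entry; not the real PDT format."""

    def __init__(self):
        self.status = None            # None, "transient", or "permanent"
        self.last_single_byte = None  # time of the previous single-byte error

    def record(self, bytes_in_error: int, when: datetime) -> str:
        if bytes_in_error == 2:
            # Double-byte errors are always logged as permanent.
            self.status = "permanent"
        elif bytes_in_error == 1 and self.status != "permanent":
            previous = self.last_single_byte
            if previous is not None and when - previous <= UPGRADE_WINDOW:
                # Second single-byte error within 24 hours: upgrade.
                self.status = "permanent"
            else:
                self.status = "transient"
            self.last_single_byte = when
        return self.status
```

In this sketch, two single-byte errors logged 23 hours apart leave the entry permanent, while two logged 25 hours apart leave it transient.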
Table 54 lists the memory subsystem events that light the diagnostic panel LEDs. Table 55 lists
the events that may light them.
Table 54 Memory Subsystem Events that Light Diagnostic Panel LEDs
| Diagnostic LEDs | IPMI Events | Cause | Source | Notes |
| Memory Carrier | Type 02h, 02h:07h:03h VOLTAGE_DEGRADES_TO_Non_RECOVERABLE | Voltage on memory board is inadequate. | BMC | A voltage on the memory expander is out of range (likely too low). |
| DIMMs | Type E0h, 208d:04d MEM_NO_DIMMS_INSTALLED | No memory DIMMs installed in slot 0 of cell 0. | SFW | Light all DIMM LEDs in rank 0 of cell 0. |
| DIMMs | Type E0h, 172d:04d MEM_DIMM_SPD_CHECKSUM | A DIMM has a serial presence detect (SPD) EEPROM with a bad checksum. | SFW | Either EEPROM is misprogrammed or this DIMM is incompatible. |
| DIMMs | Type E0h, 4652d:26d WIN_AGT_PREDICT_MEM_FAIL | This memory board is correcting too many single-bit errors. | WIN Agent | Memory rank is about to fail or environmental conditions are causing more errors than usual. |
Table 55 Memory Subsystem Events that May Light Diagnostic Panel LEDs
| Diagnostic LEDs | IPMI Events | Cause | Source | Notes |
| Processor Carrier | Type E0h, 189d:26d MEM_ERR_LOG_FAILED_TO_CLEAR | Unable to clear the platform error logs in the CEC. | SFW | |
| Processor Carrier | Type E0h, 181d:26d MEM_ECC_MBE_SIGNAL_TST_FAILED | Selftest of CEC multibit error signaling has failed. | SFW | |
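When reading event logs, the contents of Table 54 and Table 55 can be kept in a small lookup structure. The sketch below is illustrative only: the event names, LED groups, and sources come from the two tables, but the `MemoryEvent` type, the `MEMORY_EVENTS` dictionary, and the `describe` function are hypothetical helpers and not part of any HP tool.

```python
from typing import NamedTuple


class MemoryEvent(NamedTuple):
    leds: str            # diagnostic LED group named in the table
    source: str          # component that reports the event
    always_lights: bool  # True for Table 54, False for Table 55


# Hypothetical lookup built from Table 54 and Table 55.
MEMORY_EVENTS = {
    "VOLTAGE_DEGRADES_TO_Non_RECOVERABLE": MemoryEvent("Memory Carrier", "BMC", True),
    "MEM_NO_DIMMS_INSTALLED": MemoryEvent("DIMMs", "SFW", True),
    "MEM_DIMM_SPD_CHECKSUM": MemoryEvent("DIMMs", "SFW", True),
    "WIN_AGT_PREDICT_MEM_FAIL": MemoryEvent("DIMMs", "WIN Agent", True),
    "MEM_ERR_LOG_FAILED_TO_CLEAR": MemoryEvent("Processor Carrier", "SFW", False),
    "MEM_ECC_MBE_SIGNAL_TST_FAILED": MemoryEvent("Processor Carrier", "SFW", False),
}


def describe(event_name: str) -> str:
    """Return a one-line summary of which LEDs an event lights or may light."""
    event = MEMORY_EVENTS.get(event_name)
    if event is None:
        return f"{event_name}: not listed in Table 54 or Table 55"
    verb = "lights" if event.always_lights else "may light"
    return f"{event_name} (reported by {event.source}) {verb} the {event.leds} LEDs"
```

Calling `describe("MEM_ERR_LOG_FAILED_TO_CLEAR")` returns a summary stating that the event may light the Processor Carrier LEDs.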