Technologies for the ProLiant ML570 G3 and ProLiant DL580 G3 Servers Technology Brief

are affected by the increased density of memory devices, system voltage and timing margins, memory
system operating frequencies, radioisotopic impurities in packaging and circuit board materials, and
even magnetic variations due to altitude/location around the earth.
Soft errors can be corrected using standard or advanced ECC. Because a soft error is not caused by
a problem with the DIMM, once the data is corrected, the same error will not recur in the same
component.
Correctable and uncorrectable errors
Errors can be categorized as either correctable or uncorrectable. In the ProLiant ML570 G3 and the
DL580 G3 servers, the memory controller calculates check bits every time it writes to memory. When
memory is read, it re-calculates those check bits from the data stored in the DRAM devices and
compares the re-calculated check bits to the stored check bits. If the two sets of check bits are
different, the error can be corrected if it is a:
• Single-bit error in a DRAM device (correctable by standard ECC)
• Multi-bit error in a DRAM device (correctable by advanced ECC)
If multi-bit failures occur in different DRAM devices, they are not correctable. The uncorrectable error
will return bad data unless the customer has enabled Advanced Memory Protection techniques such
as Hot Plug RAID or Hot Plug Mirrored Memory.
Standard ECC
To significantly reduce the probability of fatal memory failures, HP was the first company to introduce
ECC memory in industry-standard servers in 1993. ECC memory is now standard in all HP ProLiant
servers and most other industry-standard servers. Standard ECC detects both single-bit and double-bit
errors, and it corrects single-bit errors.
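The write-time and read-time check-bit comparison described above can be sketched with a toy code. The following Python example assumes an extended Hamming code over only 4 data bits, chosen for readability; the function names `encode` and `decode` are hypothetical, and real memory controllers protect much wider words with their own check-bit layout. It shows single-bit correction and double-bit detection, not the actual chipset implementation.

```python
def encode(nibble):
    """Return an 8-bit code word (list of bits) for a 4-bit data value.

    Positions 1..7 form a Hamming(7,4) word with check bits at 1, 2 and 4.
    Position 0 holds an overall parity bit used to detect double-bit errors.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    w = [0] * 8
    w[3], w[5], w[6], w[7] = d
    w[1] = w[3] ^ w[5] ^ w[7]
    w[2] = w[3] ^ w[6] ^ w[7]
    w[4] = w[5] ^ w[6] ^ w[7]
    for pos in range(1, 8):
        w[0] ^= w[pos]          # make the parity of the whole word even
    return w


def decode(word):
    """Re-calculate the check bits on read and compare them to the stored ones.

    Returns (status, nibble), where status is "ok", "corrected" or
    "uncorrectable". nibble is None when the data cannot be trusted.
    """
    w = list(word)
    syndrome = 0                # XOR of the positions of all set bits
    overall = 0                 # parity of the whole word
    for pos in range(8):
        if w[pos]:
            syndrome ^= pos
            overall ^= 1
    if overall == 1:
        # Odd parity: exactly one bit flipped; the syndrome is its position
        # (0 means the overall parity bit itself was hit).
        w[syndrome] ^= 1
        status = "corrected"
    elif syndrome != 0:
        # Even parity but a non-zero syndrome: two bits flipped.
        return "uncorrectable", None
    else:
        status = "ok"
    nibble = w[3] | (w[5] << 1) | (w[6] << 2) | (w[7] << 3)
    return status, nibble


stored = encode(0b1011)
stored[6] ^= 1                  # inject a single-bit error
print(decode(stored))           # status and recovered data
```

Flipping any two bits of a stored word makes `decode` report "uncorrectable" instead of returning data, which mirrors the detect-only behavior for double-bit errors.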
Advanced ECC (single device data correction)
To improve memory protection, HP introduced Advanced ECC technology⁸ in 1996. HP and most
other server manufacturers continue to use this solution in industry-standard products. The advanced
ECC algorithm that Intel uses in the XMB controllers is referred to as single device data correction
(SDDC). The eight-bit (x8) implementation of SDDC can detect and correct multi-bit failures in a
four-bit (x4) or x8 DRAM device,⁹ which makes it able to recover from a x4 or x8 DRAM component
failure. It can also detect errors in two x4 DRAM components.
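One way to build intuition for device-level correction is bit interleaving: if each bit of a x4 device is assigned to a different ECC word, a complete device failure costs every word at most one bit. The Python sketch below illustrates only that idea. It is not the SDDC algorithm used in the XMB controllers, and the device count, the sample data, and the helper names (`interleave`, `fail_device`, `bit_errors_per_word`) are assumptions made for the example.

```python
NUM_DEVICES = 8        # hypothetical number of x4 DRAM devices per access
BITS_PER_DEVICE = 4    # a x4 device supplies 4 bits per access


def interleave(device_nibbles):
    """Build BITS_PER_DEVICE words; word k takes bit k from every device."""
    return [[(nib >> k) & 1 for nib in device_nibbles]
            for k in range(BITS_PER_DEVICE)]


def fail_device(device_nibbles, index):
    """Model a total device failure by inverting all 4 bits of one device."""
    out = list(device_nibbles)
    out[index] ^= 0xF
    return out


def bit_errors_per_word(good, bad):
    """Count how many bits differ in each interleaved word."""
    return [sum(g != b for g, b in zip(gw, bw))
            for gw, bw in zip(interleave(good), interleave(bad))]


good = [0x3, 0xA, 0x5, 0xF, 0x0, 0x9, 0x6, 0xC]    # arbitrary sample data
assert len(good) == NUM_DEVICES

one_dead = fail_device(good, 2)
two_dead = fail_device(one_dead, 5)

print(bit_errors_per_word(good, one_dead))   # [1, 1, 1, 1]
print(bit_errors_per_word(good, two_dead))   # [2, 2, 2, 2]
```

With one failed device, every word sees a single-bit error, which a single-error-correcting code per word could repair. With two failed devices, every word sees a double-bit error, which such a code can detect but not correct.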
Advanced ECC is the only memory protection technique for the ProLiant ML570 G3 and the ProLiant
DL580 G3 servers that supports hot-add. Hot-add refers to adding memory boards to the system while
it is running, which makes additional memory resources available to the OS. Hot-add must be enabled in the
ROM-Based Setup Utility (RBSU) and must be supported by the OS. Advanced ECC with hot-add
enabled allows the amount of memory available to the OS to be increased without rebooting the
system.
Demand scrubbing
After the chipset detects a correctable memory error, it recalculates the correct data using ECC (or
advanced ECC) check bits and sends this correct data back to the processor. For soft errors, the
invalid data is still present in the DRAM unless that memory error is scrubbed, that is, unless the good
data is written back to the DRAM. The ProLiant ML570 G3 and DL580 G3 servers support a memory
scrubbing technique called demand scrubbing.
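The difference between correcting data on the way to the processor and scrubbing it in the DRAM can be sketched with a small memory model. This Python example assumes a toy Hamming(7,4) code and a hypothetical `Memory` class with a `demand_scrub` switch. It illustrates the write-back step only and says nothing about how the chipset implements it.

```python
def encode(nibble):
    """Hamming(7,4) code word as a list; index 0 is unused padding."""
    d = [(nibble >> i) & 1 for i in range(4)]
    w = [0] * 8
    w[3], w[5], w[6], w[7] = d
    w[1] = w[3] ^ w[5] ^ w[7]
    w[2] = w[3] ^ w[6] ^ w[7]
    w[4] = w[5] ^ w[6] ^ w[7]
    return w


def correct(word):
    """Return (corrected_word, had_error) for at most one flipped bit."""
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos     # a non-zero result is the bad bit position
    fixed = list(word)
    if syndrome:
        fixed[syndrome] ^= 1
    return fixed, syndrome != 0


def data_of(word):
    """Extract the 4 data bits from a code word."""
    return word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)


class Memory:
    """Toy DRAM model; demand_scrub controls the write-back on read."""

    def __init__(self, demand_scrub):
        self.cells = {}
        self.demand_scrub = demand_scrub

    def write(self, addr, nibble):
        self.cells[addr] = encode(nibble)

    def inject_soft_error(self, addr, pos):
        self.cells[addr][pos] ^= 1

    def read(self, addr):
        fixed, had_error = correct(self.cells[addr])
        if had_error and self.demand_scrub:
            self.cells[addr] = fixed    # write the good data back to DRAM
        return data_of(fixed)           # the reader always gets good data


for scrub in (False, True):
    mem = Memory(demand_scrub=scrub)
    mem.write(0, 0b1011)
    mem.inject_soft_error(0, 5)
    value = mem.read(0)
    clean = mem.cells[0] == encode(0b1011)
    print(f"scrub={scrub}: read {value:#06b}, DRAM clean afterwards: {clean}")
```

In both runs the read returns the correct value, but only with `demand_scrub=True` is the stored word repaired. Without the write-back, the flipped bit stays in the model's DRAM, so a second soft error in the same word could later combine with it into a multi-bit error.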
⁸ U.S. Patent assigned to HP. D.G. Abdoo and J.D. Cabello, "Error Correction System for N Bits Using Error Correcting Code Designed for Fewer
than N Bits." U.S. Patent 5,490,155 (Feb. 6, 1996).
⁹ Server memory DIMMs use DRAM chips with a data width of either 4 or 8 bits, known as x4 or x8 devices.