Technologies for the ProLiant ML570 G3 and ProLiant DL580 G3 Servers Technology Brief

Figure 7. With interleaving across XMB memory controllers, the north bridge interleaves the cache lines among the memory
board subsystems (moving down vertically in the schematic). Rank interleaving would then interleave cache lines horizontally,
as depicted with cache line 5.
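The two-level interleaving in Figure 7 can be sketched as a mapping from a cache line index to a memory board and a rank. This is only an illustration: the board count, the ranks per board, zero-based numbering, and the function name `locate_cache_line` are assumptions, not details taken from the brief.

```python
def locate_cache_line(line_index: int, num_boards: int = 4, ranks_per_board: int = 2):
    """Map a zero-based cache line index to a (board, rank) pair.

    Assumed scheme: consecutive cache lines rotate across the memory boards
    first; once every board has received a line, the next group of lines
    moves to the next rank on each board.
    """
    board = line_index % num_boards                       # interleave across boards
    rank = (line_index // num_boards) % ranks_per_board   # then across ranks
    return board, rank


if __name__ == "__main__":
    for i in range(8):
        board, rank = locate_cache_line(i)
        print(f"cache line {i}: board {board}, rank {rank}")
```

Under these assumptions, neighbouring cache lines land on different boards, so sequential accesses are spread over several memory controllers rather than queuing on one.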
Errors in memory
The move to a 64-bit architecture naturally leads to servers that support more memory capacity so that
they can fully use the capabilities of that architecture. The continued reduction in the cost of
high-capacity memory modules also leads customers to install more memory as a way of achieving high
performance. However, as memory capacity grows, it becomes statistically more likely that memory
errors, both hard and soft, will occur.
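The statistical point can be illustrated with a short sketch. It assumes, purely for illustration, that every bit fails independently with the same small probability over some fixed interval. The function name and the per-bit probability are hypothetical and are not taken from the brief or from JEDEC data.

```python
import math


def prob_at_least_one_error(capacity_gib: float, per_bit_prob: float) -> float:
    """Probability of at least one bit error, assuming independent bits."""
    bits = capacity_gib * 2**30 * 8
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-per_bit_prob))


if __name__ == "__main__":
    p = 1e-13  # hypothetical per-bit error probability, for illustration only
    for gib in (4, 16, 64):
        print(f"{gib:>3} GiB: {prob_at_least_one_error(gib, p):.4f}")
```

With the per-bit probability held constant, the result rises with capacity, which is the effect the paragraph describes.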
Hard and soft errors
A hard memory error is repeatable and indicates a physical problem such as a memory defect or a
broken connection on the DIMM. The data in a DIMM that contains a hard error may be corrected
using standard or advanced ECC (depending on the number of bits in error). However, the error itself
cannot be fixed, and every time that memory location is read, another error occurs.
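Because a hard error recurs at the same location while a soft error is a random, one-off event, a log of corrected-error addresses gives a rough way to tell them apart. The sketch below is a simplified heuristic, not a description of how the servers classify errors; the function name, the labels, and the repeat threshold are assumptions.

```python
from collections import Counter


def classify_errors(error_addresses, repeat_threshold=2):
    """Label each address as a suspected hard or soft error.

    An address that shows up at least `repeat_threshold` times in the log is
    treated as a suspected hard error; anything else is a suspected soft error.
    """
    counts = Counter(error_addresses)
    return {
        addr: "hard (suspected)" if n >= repeat_threshold else "soft (suspected)"
        for addr, n in counts.items()
    }


if __name__ == "__main__":
    log = [0x1000, 0x2000, 0x1000, 0x1000]  # hypothetical corrected-error log
    for addr, label in classify_errors(log).items():
        print(f"{addr:#06x}: {label}")
```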
Most errors that occur in the memory subsystem are soft errors. A soft error is a randomly occurring
event caused by external influences such as high-energy alpha particles and cosmic radiation that have
enough energy to penetrate the earth’s surface. When such particles collide with a memory storage
device, they may disturb the state of the data bit(s). According to the JEDEC standard [7], soft error rates
[7] The JEDEC Solid State Technology Association is the prominent developer of standards in the solid state electronics industry. The JEDEC standard
JESD89, “Measurement and Reporting of Alpha Particle and Terrestrial Cosmic Ray-Induced Soft Errors in Semiconductor Devices,” is available
at www.jedec.org.