Technical Product Specification

Functional Architecture
Intel® Server Board S5400SF TPS
Revision 2.02
Intel order number: D92944-007
During POST, the BIOS captures and reports memory BIST errors.
At runtime, the BIOS captures and reports correctable, uncorrectable, and fatal errors occurring
in the memory subsystem.
3.2.3.9.1 Faulty FBDIMMs
The BIOS provides detection of a faulty or failing FBDIMM. An FBDIMM is considered faulty if it
fails the memory BIST. The BIOS enables the built-in memory BIST engine in the Intel® 5400
MCH Chipset during memory initialization in POST. The memory BIST cycle isolates failed,
failing, or faulty FBDIMMs; the BIOS then marks those FBDIMMs as failed and takes them
offline.
FBDIMMs can also fail during normal operation. The BIOS marks these FBDIMMs as
temporarily disabled, and performs other housekeeping tasks as relevant. The Memory BIST
function is run on every FBDIMM during each boot of the system.
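
The per-boot BIST pass described above can be pictured with a minimal Python sketch. It is an
illustration only, not the BIOS implementation: the Fbdimm record, the run_post_bist function,
and the bist_passes callback are hypothetical names introduced here, and the real BIST engine
runs in chipset hardware.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Fbdimm:
    """Hypothetical bookkeeping record for one FBDIMM slot."""
    slot: str
    online: bool = True
    failed: bool = False


def run_post_bist(dimms: List[Fbdimm],
                  bist_passes: Callable[[str], bool]) -> List[str]:
    """Test every FBDIMM; mark each failure as failed and take it offline."""
    failed_slots = []
    for dimm in dimms:
        if not bist_passes(dimm.slot):  # stand-in for the hardware BIST result
            dimm.failed = True
            dimm.online = False
            failed_slots.append(dimm.slot)
    return failed_slots


# Assumed scenario: four populated slots, with B2 failing the memory BIST.
dimms = [Fbdimm(slot) for slot in ("A1", "A2", "B1", "B2")]
failed = run_post_bist(dimms, lambda slot: slot != "B2")
```

In this assumed scenario only B2 is marked failed and taken offline; the other three FBDIMMs
stay online.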
3.2.3.9.2 Faulty Links
FBDIMM technology is a serial technology. Therefore, errors or failures can occur on the serial
path between FBDIMMs. These errors are different from ECC errors, and do not necessarily
occur as a result of faulty DRAM cells. The BIOS keeps track of such link-level failures.
In general, when a fatal link failure occurs, the BIOS disables all FBDIMMs on that link. For
example, if A1 through A4 and B1 through B4 are populated with 1 GB FBDIMMs and A3 fails,
the BIOS disables both A3 and A4. If all FBDIMMs are present on the same faulty link, the
BIOS generates POST code 0xE1 to indicate that the system has no usable memory, and then
halts the system.
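
The link-failure policy can be sketched as follows. This is a hedged illustration, not the BIOS
code. It follows the A3/A4 example and assumes that a failure at one position disables that
FBDIMM and every later position on the same channel. The function name and the dictionary
layout are hypothetical; only the POST code value 0xE1 comes from the text.

```python
from typing import Dict, List, Optional, Tuple

POST_CODE_NO_USABLE_MEMORY = 0xE1  # value taken from the text above


def disable_after_link_failure(
    enabled: Dict[str, List[int]], failed_slot: str
) -> Tuple[List[str], Optional[int]]:
    """Disable the failed slot and all later positions on its channel.

    `enabled` maps a channel letter to its enabled slot positions and is
    updated in place. Returns the disabled slot names plus POST code 0xE1
    when no usable memory remains (otherwise None).
    """
    channel, position = failed_slot[0], int(failed_slot[1:])
    positions = enabled.get(channel, [])
    disabled = [f"{channel}{p}" for p in positions if p >= position]
    enabled[channel] = [p for p in positions if p < position]
    no_memory = not any(enabled.values())  # every channel list is empty
    return disabled, (POST_CODE_NO_USABLE_MEMORY if no_memory else None)


# Example from the text: A1-A4 and B1-B4 populated, A3 fails.
banks = {"A": [1, 2, 3, 4], "B": [1, 2, 3, 4]}
disabled, post_code = disable_after_link_failure(banks, "A3")

# Assumed second scenario: all memory sits behind the faulty link.
single = {"A": [1, 2]}
disabled_all, halt_code = disable_after_link_failure(single, "A1")
```

In the first case A3 and A4 are disabled and the system keeps running on the remaining
FBDIMMs. In the second case nothing usable remains, so the sketch returns 0xE1, at which
point the BIOS would halt the system.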
If a fatal link failure occurs during normal operation at runtime (after POST), the BIOS signals a
fatal error and applies the policies for fatal error handling.
3.2.3.9.3 Error Counters and Thresholds
The BIOS handles memory errors through a variety of platform-specific policies. Each policy
aims to give the system administrator comprehensive diagnostic support for recovering the
system after a failure.
The BIOS uses error counters on the Intel® 5400 Chipset and internal software counters to track
the number of single-bit correctable and multi-bit correctable errors that occur at runtime. The
chipset increments these counters when an error occurs. The count also decays at a given rate,
programmable by the BIOS. Because the count drains over time, these counters are termed
leaky bucket counters.
The leaky bucket counters provide a measurement of the frequency of errors. The BIOS
configures the leaky bucket counters and their decay rate so that it is notified of a failing
FBDIMM. A degrading DRAM typically generates errors at an increasing rate over time, which
the leaky bucket algorithm detects. The chipset maintains separate internal leaky bucket
counters for single-bit correctable and multi-bit correctable errors.
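
A minimal Python sketch of a leaky bucket counter shows why it measures error frequency
rather than the total error count. The class name, the threshold, the decay period, and the
integer time units are all assumptions made for illustration; the actual chipset registers and
the values the BIOS programs are not described here.

```python
class LeakyBucketCounter:
    """Count errors, leaking one count every `decay_period` time units."""

    def __init__(self, threshold: int, decay_period: int) -> None:
        self.threshold = threshold        # hypothetical notification level
        self.decay_period = decay_period  # hypothetical programmable decay
        self.count = 0
        self.last_leak = 0                # time of the last applied leak

    def _leak(self, now: int) -> None:
        # Drain one count per elapsed decay period, never going below zero.
        leaked = (now - self.last_leak) // self.decay_period
        if leaked > 0:
            self.count = max(0, self.count - leaked)
            self.last_leak += leaked * self.decay_period

    def record_error(self, now: int) -> bool:
        """Record one error at time `now`; True means the threshold is hit."""
        self._leak(now)
        self.count += 1
        return self.count >= self.threshold


# Sparse errors drain away before they can accumulate.
sparse = LeakyBucketCounter(threshold=3, decay_period=10)
sparse_hits = [sparse.record_error(t) for t in (0, 15, 30, 45)]

# A burst of errors outpaces the decay and reaches the threshold.
burst = LeakyBucketCounter(threshold=3, decay_period=10)
burst_hits = [burst.record_error(t) for t in (0, 1, 2)]
```

Here the four widely spaced errors never reach the threshold, while three errors in quick
succession do. Separate counters for single-bit and multi-bit correctable errors could be
modeled as two independent instances of this class.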