Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3A, System Programming Guide, Part 1
MACHINE-CHECK ARCHITECTURE
When the MCIP flag is set in the IA32_MCG_STATUS register, a machine-check
exception is in progress and the machine-check exception handler has called the
exception logging routine.
Once the logging process has completed, the exception-handling routine must
determine whether execution can be restarted. Restart is usually possible when
damage has not occurred (the PCC flag is clear in the IA32_MCi_STATUS register)
and when the processor can guarantee that execution is restartable (the RIPV flag is
set in the IA32_MCG_STATUS register). If execution cannot be restarted, the system
is not recoverable, and the exception-handling routine should signal the console
appropriately before returning the error status to the operating system kernel for
subsequent shutdown.
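The restart decision described above can be sketched in C. This is a minimal illustration, not the manual's own algorithm: the helper name mc_can_restart and the idea of passing already-read register values as plain integers are assumptions made so the logic can run outside ring 0. The bit positions are the architectural ones (RIPV is bit 0 of IA32_MCG_STATUS; VAL and PCC are bits 63 and 57 of IA32_MCi_STATUS).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Architectural flag positions. */
#define MCG_STATUS_RIPV (1ULL << 0)   /* IA32_MCG_STATUS: restart IP valid */
#define MCI_STATUS_VAL  (1ULL << 63)  /* IA32_MCi_STATUS: bank holds a valid error */
#define MCI_STATUS_PCC  (1ULL << 57)  /* IA32_MCi_STATUS: processor context corrupt */

/*
 * Hypothetical helper: returns true only when RIPV is set and no bank
 * with a valid error reports PCC. The caller is assumed to have read
 * the MSRs already and passes their values in.
 */
static bool mc_can_restart(uint64_t mcg_status,
                           const uint64_t *bank_status, size_t nbanks)
{
    assert(bank_status != NULL || nbanks == 0);

    if (!(mcg_status & MCG_STATUS_RIPV))
        return false;                 /* processor cannot guarantee restart */

    for (size_t i = 0; i < nbanks; i++) {
        uint64_t s = bank_status[i];
        if ((s & MCI_STATUS_VAL) && (s & MCI_STATUS_PCC))
            return false;             /* damage has occurred */
    }
    return true;
}
```

A handler built on this sketch would treat a false result as the unrecoverable case: signal the console and return the error status for shutdown. A true result satisfies only the two conditions named in the text, which the manual describes as usually, not always, sufficient.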
The machine-check architecture allows buffering of exceptions from a given
error-reporting bank, although the Pentium 4, Intel Xeon, and P6 family processors do not
implement this feature. The error-logging routine should provide compatibility with
future processors by reading each hardware error-reporting bank's
IA32_MCi_STATUS register and then writing 0s to clear the OVER and VAL flags in
this register. The error-logging routine should then re-read the IA32_MCi_STATUS register
for the bank to ensure that the valid bit is clear. A processor that buffers exceptions
will write the next error into the register bank and set the VAL flag.
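The read, clear, and re-read sequence can be sketched as a loop. Real code would use RDMSR and WRMSR at privilege level 0, so this example substitutes a simulated bank. The struct sim_bank, its accessors, and drain_bank are hypothetical names introduced for illustration, and the queue models a future processor that buffers errors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MCI_STATUS_VAL  (1ULL << 63)  /* valid error logged */
#define MCI_STATUS_OVER (1ULL << 62)  /* an error was overwritten */

/* Simulated bank standing in for a real IA32_MCi_STATUS MSR (assumption). */
struct sim_bank {
    uint64_t status;      /* current register contents */
    uint64_t queued[4];   /* errors a buffering processor would post next */
    size_t   nqueued;     /* number of entries in queued[] */
    size_t   next;        /* index of the next queued error to post */
};

static uint64_t sim_read_status(const struct sim_bank *b)
{
    return b->status;
}

/* Writing 0 clears the register; the simulated hardware then posts
 * the next buffered error, if any, with VAL set. */
static void sim_write_status(struct sim_bank *b, uint64_t value)
{
    b->status = value;
    if (value == 0 && b->next < b->nqueued)
        b->status = b->queued[b->next++] | MCI_STATUS_VAL;
}

/* Log every valid error in the bank: read, record, write 0s, re-read.
 * Returns the number of entries written to log[]. */
static size_t drain_bank(struct sim_bank *b, uint64_t *log, size_t log_cap)
{
    assert(b != NULL && log != NULL);

    size_t n = 0;
    uint64_t s = sim_read_status(b);
    while ((s & MCI_STATUS_VAL) && n < log_cap) {
        log[n++] = s;               /* record the raw status value */
        sim_write_status(b, 0);     /* clears OVER and VAL */
        s = sim_read_status(b);     /* re-read: is VAL clear now? */
    }
    return n;
}
```

On processors that do not buffer, the loop body runs at most once per bank. The log_cap bound is a design choice of this sketch so that a bank which keeps reporting errors cannot hold the handler in the loop indefinitely.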
Additional information that should be stored by the exception-logging routine
includes the processor’s time-stamp counter value, which provides a mechanism to
indicate the frequency of exceptions. A multiprocessing operating system stores the
identity of the processor node incurring the exception using a unique identifier, such
as the processor’s APIC ID (see Section 8.8, “Handling Interrupts”).
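One possible shape for such a log record is shown below, together with a helper that uses the stored time-stamp counter values to estimate how often a given processor has been reporting exceptions. The struct layout, field names, and mc_recent_count are illustrative assumptions rather than a format the manual defines. The helper compares time stamps only within one APIC ID because this sketch does not assume that counters on different processors are synchronized.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical log record; the layout is an assumption of this sketch. */
struct mc_log_entry {
    uint64_t tsc;      /* time-stamp counter value when the error was logged */
    uint32_t apic_id;  /* processor that incurred the exception */
    uint32_t bank;     /* error-reporting bank index */
    uint64_t status;   /* raw IA32_MCi_STATUS value */
};

/*
 * Count the entries from one processor whose time stamp lies within
 * `window` cycles before `now`. A high count suggests that exceptions
 * on that processor are occurring frequently.
 */
static size_t mc_recent_count(const struct mc_log_entry *log, size_t n,
                              uint32_t apic_id, uint64_t now, uint64_t window)
{
    assert(log != NULL || n == 0);

    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (log[i].apic_id == apic_id &&
            log[i].tsc <= now &&
            now - log[i].tsc <= window)
            count++;
    }
    return count;
}
```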
The basic algorithm given in Example 14-3 can be modified to provide more robust
recovery techniques. For example, software has the flexibility to attempt recovery
using information unavailable to the hardware. Specifically, the machine-check
exception handler can, after logging, carefully analyze the error-reporting registers
when the error-logging routine reports an error that does not allow execution to be
restarted. These recovery techniques can use external-bus-related, model-specific
information provided with the error report to localize the source of the error within
the system and determine the appropriate recovery strategy.