Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1

MACHINE-CHECK ARCHITECTURE
14.8 GUIDELINES FOR WRITING MACHINE-CHECK SOFTWARE
The machine-check architecture and error logging can be used in two different ways:
• To detect machine errors during normal instruction execution, using the
  machine-check exception (#MC).
• To periodically check and log machine errors.
To use the machine-check exception, the operating system or executive software
must provide a machine-check exception handler. This handler can be designed
specifically for Pentium 4 and Intel Xeon processors or for P6 family processors. It
can also be a portable handler that handles processor machine-check errors from
several generations of IA-32 processors.
A special program or utility is required to log machine errors.
Guidelines for writing a machine-check exception handler or a machine-error logging
utility are given in the following sections.
14.8.1 Machine-Check Exception Handler
The machine-check exception (#MC) corresponds to vector 18. To service
machine-check exceptions, a trap gate must be added to the IDT. The pointer in
the trap gate must point to a machine-check exception handler. Two approaches
can be taken to designing the exception handler:
1. The handler can merely log all the machine status and error information, then
call a debugger or shut down the system.
2. The handler can analyze the reported error information and, in some cases,
attempt to correct the error and restart the processor.
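The IDT entry for vector 18 mentioned above can be illustrated with a short sketch. It assumes 32-bit protected mode, where an interrupt or trap gate is an 8-byte descriptor; the names idt_gate32, make_trap_gate32, and install_mc_gate are hypothetical and chosen only for this example. Loading the IDT itself (LIDT) and the handler body are outside its scope.

```c
#include <assert.h>
#include <stdint.h>

#define MC_VECTOR        18u    /* #MC vector number */
#define GATE_PRESENT     0x80u  /* P flag, bit 7 of the type/attribute byte */
#define GATE_TYPE_TRAP32 0x0Fu  /* 32-bit trap gate type */

/* Layout of a 32-bit protected-mode gate descriptor (hypothetical name). */
struct idt_gate32 {
    uint16_t offset_low;   /* handler offset, bits 15:0 */
    uint16_t selector;     /* code-segment selector of the handler */
    uint8_t  reserved;     /* must be zero */
    uint8_t  type_attr;    /* P, DPL, and gate type */
    uint16_t offset_high;  /* handler offset, bits 31:16 */
};

static_assert(sizeof(struct idt_gate32) == 8, "gate descriptor must be 8 bytes");

/* Build a present, DPL-0, 32-bit trap gate that points at `handler`. */
struct idt_gate32 make_trap_gate32(uint32_t handler, uint16_t selector)
{
    struct idt_gate32 gate;
    gate.offset_low  = (uint16_t)(handler & 0xFFFFu);
    gate.selector    = selector;
    gate.reserved    = 0;
    gate.type_attr   = (uint8_t)(GATE_PRESENT | GATE_TYPE_TRAP32);
    gate.offset_high = (uint16_t)(handler >> 16);
    return gate;
}

/* Place the machine-check gate in slot 18 of an in-memory IDT image. */
void install_mc_gate(struct idt_gate32 idt[], uint32_t handler, uint16_t selector)
{
    idt[MC_VECTOR] = make_trap_gate32(handler, selector);
}
```

In IA-32e mode the gate descriptors are 16 bytes wide, so this layout would need to be adapted there.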
For Pentium 4, Intel Xeon, P6 family, and Pentium processors, virtually all
machine-check conditions are uncorrectable (they result in abort-type
exceptions). The logging of status and error information is therefore a
baseline implementation requirement.
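A baseline logging pass might look like the following sketch. It reads the bank count from IA32_MCG_CAP (MSR 179H, bits 7:0), then visits each IA32_MCi_STATUS register and records only the banks whose VAL flag (bit 63) is set. It assumes the architectural layout in which each bank occupies four consecutive MSRs starting at 400H; some processors deviate from this for particular banks. MSR access is passed in as a callback because RDMSR is only available at privilege level 0. The names mc_scan_banks, mc_record, rdmsr_fn, and the fake_rdmsr stand-in are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSR_IA32_MCG_CAP   0x179u        /* bits 7:0 = number of banks */
#define MSR_IA32_MC0_CTL   0x400u        /* first register of bank 0 */
#define MC_REGS_PER_BANK   4u            /* CTL, STATUS, ADDR, MISC */
#define MC_STATUS_OFFSET   1u            /* STATUS follows CTL in each bank */
#define MCI_STATUS_VAL     (1ULL << 63)  /* VAL (valid) flag */
#define MCI_STATUS_MCACOD  0xFFFFULL     /* MCA error code, bits 15:0 */

/* Caller-supplied MSR reader; in a kernel this would wrap RDMSR. */
typedef uint64_t (*rdmsr_fn)(uint32_t msr);

struct mc_record {
    unsigned bank;      /* index of the reporting bank */
    uint64_t status;    /* raw IA32_MCi_STATUS value */
    uint16_t mca_code;  /* MCA error code field */
};

/* Scan all banks and store up to max_out valid records; returns the count. */
size_t mc_scan_banks(rdmsr_fn read_msr, struct mc_record *out, size_t max_out)
{
    assert(read_msr != NULL && out != NULL);
    unsigned count = (unsigned)(read_msr(MSR_IA32_MCG_CAP) & 0xFFu);
    size_t n = 0;

    for (unsigned bank = 0; bank < count && n < max_out; bank++) {
        uint32_t status_msr =
            MSR_IA32_MC0_CTL + MC_REGS_PER_BANK * bank + MC_STATUS_OFFSET;
        uint64_t status = read_msr(status_msr);

        if (!(status & MCI_STATUS_VAL))
            continue;  /* VAL clear: nothing valid in this bank */

        out[n].bank = bank;
        out[n].status = status;
        out[n].mca_code = (uint16_t)(status & MCI_STATUS_MCACOD);
        n++;
    }
    return n;
}

/* Hypothetical stand-in for RDMSR so the sketch can run in user space. */
#define FAKE_BANKS 4u
uint64_t fake_bank_status[FAKE_BANKS];

uint64_t fake_rdmsr(uint32_t msr)
{
    if (msr == MSR_IA32_MCG_CAP)
        return FAKE_BANKS;
    if (msr >= MSR_IA32_MC0_CTL &&
        msr < MSR_IA32_MC0_CTL + MC_REGS_PER_BANK * FAKE_BANKS) {
        uint32_t rel = msr - MSR_IA32_MC0_CTL;
        if (rel % MC_REGS_PER_BANK == MC_STATUS_OFFSET)
            return fake_bank_status[rel / MC_REGS_PER_BANK];
    }
    return 0;
}
```

Recording only the MCA error code field alongside the raw status keeps the sketch in line with the portable-handler approach; a real handler would typically also capture the address and miscellaneous registers when the status register marks them as valid.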
When recovery from a machine-check error may be possible, consider the following
when writing a machine-check exception handler:
• To determine the nature of the error, the handler must read each of the
  error-reporting register banks. The count field in the IA32_MCG_CAP register
  gives the number of register banks. The first register of register bank 0 is
  at address 400H.
• The VAL (valid) flag in each IA32_MCi_STATUS register indicates whether the
  error information in the register is valid. If this flag is clear, the
  registers in that bank do not contain valid error information and do not need
  to be checked.
• To write a portable exception handler, only the MCA error code field in the
  IA32_MCi_STATUS register should be checked. See Section 14.7, “Interpreting