Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 1, Basic Architecture
D-24 Vol. 1
GUIDELINES FOR WRITING X87 FPU EXCEPTION HANDLERS
D.3.6.1 Speculatively Deferring x87 FPU Saves, General Overview
In order to support multitasking, each thread in the system needs a save area for the
general-purpose registers, and each task that is allowed to use floating-point needs
an x87 FPU save area large enough to hold the entire x87 FPU stack and associated
x87 FPU state such as the control word and status word. (See Section 8.1.10,
“Saving the x87 FPU’s State with FSTENV/FNSTENV and FSAVE/FNSAVE,” for a
complete description of the x87 FPU save image.) If the processor and the operating
system support Streaming SIMD Extensions, the save area should be large enough
and aligned correctly to hold x87 FPU and Streaming SIMD Extensions state.
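As a minimal sketch of the sizing and alignment requirement, the C11 declaration below reserves one save area per floating-point task. The type and constant names are hypothetical. The sketch assumes the 108-byte image written by FSAVE/FNSAVE in 32-bit protected mode, and it assumes the combined x87 FPU and Streaming SIMD Extensions state is saved with FXSAVE, whose image is 512 bytes and must be 16-byte aligned.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdalign.h> /* alignas, alignof (C11) */
#include <stdint.h>

/* Hypothetical names. Sizes assume the 32-bit protected-mode FSAVE image
   and the FXSAVE image. */
enum {
    X87_FSAVE_BYTES  = 108, /* 28-byte environment + 8 registers x 10 bytes */
    FXSAVE_BYTES     = 512, /* x87 FPU + SSE state image */
    FXSAVE_ALIGNMENT = 16   /* FXSAVE requires a 16-byte aligned operand */
};

/* One save area per task that is allowed to use floating-point.
   Sizing for the larger image lets the same area hold either format. */
struct fpu_save_area {
    alignas(FXSAVE_ALIGNMENT) uint8_t image[FXSAVE_BYTES];
};

/* Compile-time checks that the layout matches the assumptions above. */
static_assert(sizeof(struct fpu_save_area) == FXSAVE_BYTES,
              "save area must be exactly one FXSAVE image");
static_assert(alignof(struct fpu_save_area) == FXSAVE_ALIGNMENT,
              "save area must be 16-byte aligned");
static_assert(X87_FSAVE_BYTES <= FXSAVE_BYTES,
              "an FSAVE image must also fit in the save area");
```

Embedding such a structure in a per-thread control block carries the alignment along with it, provided the enclosing allocation is itself 16-byte aligned.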
On a task switch, the general-purpose registers are swapped out to their save area
for the suspending thread, and the registers of the resuming thread are loaded. The
x87 FPU state does not need to be saved at this point. If the resuming thread does
not use the x87 FPU before it is itself suspended, then both a save and a load of the
x87 FPU state have been avoided. Several threads often run in succession without
using the x87 FPU at all.
The processor supports speculative deferral of x87 FPU saves via interrupt 7 “Device
Not Available” (DNA), used in conjunction with CR0 bit 3, the “Task Switched” bit
(TS). (See “Control Registers” in Chapter 2 of the Intel® 64 and IA-32 Architectures
Software Developer’s Manual, Volume 3A.) Every task switch via the hardware
supported task switching mechanism (see “Task Switching” in Chapter 6 of the Intel®
64 and IA-32 Architectures Software Developer’s Manual, Volume 3A) sets TS.
Multi-threaded kernels that use software task switching¹ can set the TS bit by reading
CR0, ORing a “1” into bit 3,² and writing back CR0. Any subsequent floating-point
instructions (now being executed in a new thread context) will fault via interrupt 7
before execution.
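The read-modify-write sequence for setting TS can be sketched in C as follows. This is an illustration only: CR0 can be accessed only by privileged code, so the sketch models it with an ordinary variable. The names sim_cr0, read_cr0, write_cr0, and set_task_switched are hypothetical stand-ins for whatever privileged accessors a kernel provides.

```c
#include <assert.h>
#include <stdint.h>

#define CR0_EM (UINT32_C(1) << 2) /* bit 2: emulation flag (not used for this) */
#define CR0_TS (UINT32_C(1) << 3) /* bit 3: task switched flag */

/* Simulated CR0. In a kernel these accessors would be privileged
   register moves, not loads and stores of a C variable. */
static uint32_t sim_cr0;

static uint32_t read_cr0(void)      { return sim_cr0; }
static void     write_cr0(uint32_t v) { sim_cr0 = v; }

/* Called at the end of a software task switch: read CR0, OR a 1 into
   bit 3, write CR0 back. All other bits are preserved. */
static void set_task_switched(void)
{
    uint32_t cr0 = read_cr0();
    cr0 |= CR0_TS;
    write_cr0(cr0);
}
```

The sketch sets only TS and leaves EM untouched, in line with the footnote's advice not to use EM as a surrogate for TS.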
This allows a DNA handler to save the old floating-point context and reload the x87
FPU state for the current thread. The handler should clear the TS bit before exit using
the CLTS instruction. On return from the handler the faulting thread will proceed with
its floating-point computation.
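The deferred-save scheme described above can be modeled with a small user-space simulation in C. Everything here is a hypothetical stand-in: a boolean models CR0.TS, an integer models the live x87 FPU registers, and fp_store/fp_load model floating-point instructions that trap to the DNA handler when TS is set. The sketch also assumes the kernel tracks which thread's state is currently in the FPU (fpu_owner), a bookkeeping detail the text does not prescribe.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct thread {
    int fpu_image;               /* stand-in for the thread's x87 FPU save area */
};

static bool ts_flag;             /* models CR0.TS */
static int fpu_regs;             /* models the live x87 FPU register state */
static struct thread *fpu_owner; /* thread whose state is live in fpu_regs */
static struct thread *current;   /* thread that is currently running */
static int saves, loads;         /* counters for observing avoided work */

/* Software task switch: no x87 FPU save or load happens here. */
static void switch_to(struct thread *next)
{
    current = next;
    ts_flag = true;              /* set TS so the next FP instruction traps */
}

/* Interrupt 7 (DNA) handler. TS is cleared first (CLTS) because on real
   hardware the save and restore instructions would also fault with TS set. */
static void dna_handler(void)
{
    ts_flag = false;
    if (fpu_owner != current) {
        if (fpu_owner != NULL) {
            fpu_owner->fpu_image = fpu_regs; /* save old context */
            saves++;
        }
        fpu_regs = current->fpu_image;       /* reload current context */
        loads++;
        fpu_owner = current;
    }
}

/* Model floating-point instructions: trap before execution if TS is set. */
static void fp_store(int value)
{
    if (ts_flag)
        dna_handler();
    fpu_regs = value;
}

static int fp_load(void)
{
    if (ts_flag)
        dna_handler();
    return fpu_regs;
}
```

In this model, a thread that is scheduled and suspended again without touching the FPU costs neither a save nor a load, and a thread that resumes while its state is still live only needs TS cleared.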
Some operating systems save the x87 FPU context on every task switch, typically
because they also change the linear address space between tasks. The problem and
solution discussed in the following sections apply to these operating systems also.
1. In a software task switch, the operating system uses a sequence of instructions to save the
suspending thread’s state and restore the resuming thread’s state, instead of the single long
non-interruptible task switch operation provided by the IA-32 architecture.
2. Although CR0, bit 2, the emulation flag (EM), also causes a DNA exception, do not use the EM bit as
a surrogate for TS. EM means that no x87 FPU is available and that floating-point instructions
must be emulated. Using EM to trap on task switches is not compatible with the MMX technology.
If the EM flag is set, MMX instructions raise the invalid opcode exception.