Intel 64 and IA-32 Architectures Software Developers Manual Volume 1, Basic Architecture

GUIDELINES FOR WRITING SIMD FLOATING-POINT EXCEPTION HANDLERS
occur immediately and are not delayed until a subsequent floating-point instruction
is executed. However, floating-point emulation may be necessary when unmasked
floating-point exceptions are generated.
E.4 SIMD FLOATING-POINT EXCEPTIONS AND THE IEEE STANDARD 754
SSE/SSE2/SSE3 extensions are 100% compatible with the IEEE Standard 754 for
Binary Floating-Point Arithmetic, satisfying all of its mandatory requirements (when
the flush-to-zero or denormals-are-zeros modes are not enabled). However, a
programming environment that includes SSE/SSE2/SSE3 instructions can comply with
both the obligatory and the strongly recommended requirements of IEEE Standard 754
regarding floating-point exception handling only through a combination of hardware
and software (which is acceptable). The standard states that a user should be able to
request a trap on any of the five floating-point exceptions (note that the denormal
exception is an IA-32 addition), and it also specifies the values (operands or result)
to be delivered to the exception handler.
The main issue is that, unlike x87 FPU instructions, SSE/SSE2/SSE3 instructions that
raise post-computation exceptions (overflow, underflow, or inexact traps) do not
provide the user handler with the result recommended by IEEE Standard 754. If a user
program needs the result of an instruction that generated a post-computation
exception, software must produce this result by emulating the faulting
SSE/SSE2/SSE3 instruction. Another issue is that the standard does not explicitly
specify how to handle multiple floating-point exceptions that occur simultaneously.
For packed operations, the exception flags set in the MXCSR are the logical OR of the
flags that each sub-operation would set. The following subsections present one
possible way to solve these problems.
E.4.1 Floating-Point Emulation
Every operating system must provide a kernel-level floating-point exception handler
(a template was presented in Section E.2, “Software Exception Handling” above). In
the following discussion, assume that a user-mode floating-point exception filter is
supplied for SIMD floating-point exceptions (for example, as part of a library of C
functions) and that a user program can invoke it to handle unmasked exceptions.
The user-mode floating-point exception filter (not shown here) must be able to
emulate the subset of SSE/SSE2/SSE3 instructions that can generate numeric
exceptions, and must be able to invoke a user-provided floating-point exception
handler for floating-point exceptions. When an unmasked floating-point exception is
raised by an SSE/SSE2/SSE3 instruction, the low-level floating-point exception
handler is called. This low-level handler may in turn call the user-mode
floating-point exception filter. Because the hardware provides no results, whether a
pre-computation or a post-computation exception has occurred, the filter function
receives the original operands of the excepting instruction. The filter will unpack the