Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture




GUIDELINES FOR WRITING SIMD FLOATING-POINT EXCEPTION HANDLERS

occur immediately and are not delayed until a subsequent floating-point instruction is executed. However, floating-point emulation may be necessary when unmasked floating-point exceptions are generated.

E.4 SIMD FLOATING-POINT EXCEPTIONS AND THE IEEE STANDARD 754

SSE/SSE2/SSE3 extensions are 100% compatible with the IEEE Standard 754 for Binary Floating-Point Arithmetic, satisfying all of its mandatory requirements (when the flush-to-zero or denormals-are-zeros modes are not enabled). However, a programming environment that includes SSE/SSE2/SSE3 instructions complies with both the obligatory and the strongly recommended requirements of IEEE Standard 754 regarding floating-point exception handling only as a combination of hardware and software (which is acceptable). The standard states that a user should be able to request a trap on any of the five floating-point exceptions (invalid operation, division by zero, overflow, underflow, and inexact; the denormal exception is an IA-32 addition), and it also specifies the values (operands or result) to be delivered to the exception handler.
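As a rough illustration of what requesting a trap means at the register level, the C sketch below computes an MXCSR value with selected exceptions unmasked. The bit positions follow the MXCSR layout (exception flags in bits 0-5, the corresponding mask bits in bits 7-12, default value 1F80H). The helper name `mxcsr_unmask` and the constant names are hypothetical, chosen only for this example, and loading the value into the register (for example with LDMXCSR) is deliberately left out so that the sketch stays portable.

```c
#include <stdint.h>

/* MXCSR exception flag bits (bits 0-5). Names are illustrative only. */
enum {
    FP_INVALID   = 1u << 0,  /* IE: invalid operation */
    FP_DENORMAL  = 1u << 1,  /* DE: denormal operand (IA-32 addition) */
    FP_DIVZERO   = 1u << 2,  /* ZE: divide-by-zero */
    FP_OVERFLOW  = 1u << 3,  /* OE: overflow */
    FP_UNDERFLOW = 1u << 4,  /* UE: underflow */
    FP_INEXACT   = 1u << 5   /* PE: inexact (precision) */
};

#define MXCSR_MASK_SHIFT 7u       /* each mask bit sits 7 positions above its flag */
#define MXCSR_DEFAULT    0x1F80u  /* default value: all six exceptions masked */

/* Hypothetical helper: return `mxcsr` with the mask bits for the
   requested exceptions cleared, so that those exceptions would trap. */
static uint32_t mxcsr_unmask(uint32_t mxcsr, uint32_t exceptions)
{
    exceptions &= 0x3Fu;  /* keep only the six defined flag bits */
    return mxcsr & ~(exceptions << MXCSR_MASK_SHIFT);
}
```

For example, `mxcsr_unmask(MXCSR_DEFAULT, FP_OVERFLOW | FP_DIVZERO)` clears the OM and ZM mask bits and leaves the rounding control and the other masks untouched.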

The main issue is that for SSE/SSE2/SSE3 instructions that raise post-computation exceptions (traps: overflow, underflow, or inexact), the processor, unlike for x87 FPU instructions, does not provide the result recommended by IEEE Standard 754 to the user handler. If a user program needs the result of an instruction that generated a post-computation exception, it is the responsibility of the software to produce this result by emulating the faulting SSE/SSE2/SSE3 instruction. Another issue is that the standard does not specify explicitly how to handle multiple floating-point exceptions that occur simultaneously. For packed operations, a logical OR of the flags that would be set by each sub-operation is used to set the exception flags in the MXCSR. The following subsections present one possible way to solve these problems.
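The OR-combination of per-lane flags for packed operations can be sketched in C as follows. This is a simplified illustration, not the emulation code of any real library: the helper names `lane_mul_flags` and `packed_mul_flags` are hypothetical, the per-lane check models a single-precision multiply only, it assumes finite operands and round-to-nearest, and it detects only overflow and inexact (invalid, denormal, and underflow are ignored).

```c
#include <float.h>

/* MXCSR exception flag bits used by this sketch. */
#define FLAG_OE 0x08u  /* overflow */
#define FLAG_PE 0x20u  /* inexact (precision) */

/* Hypothetical helper: flags that one single-precision multiply would set.
   Simplified: finite operands assumed; only overflow and inexact detected. */
static unsigned lane_mul_flags(float a, float b, float *result)
{
    unsigned flags = 0;
    float r = a * b;                       /* rounded to single precision */
    double exact = (double)a * (double)b;  /* exact: a 24x24-bit product fits in 53 bits */

    if (r > FLT_MAX || r < -FLT_MAX)       /* rounded result is infinite */
        flags |= FLAG_OE;
    if ((double)r != exact)                /* rounding changed the value */
        flags |= FLAG_PE;

    *result = r;
    return flags;
}

/* Emulate a 4-lane packed multiply: the reported flags are the logical OR
   of the flags that each sub-operation would set on its own. */
static unsigned packed_mul_flags(const float a[4], const float b[4], float out[4])
{
    unsigned flags = 0;
    for (int i = 0; i < 4; i++)
        flags |= lane_mul_flags(a[i], b[i], &out[i]);
    return flags;
}
```

Note that the combined value records which exceptions occurred somewhere in the packed operation, not which lane raised them; a handler that needs per-lane detail has to examine each sub-operation separately, as the loop above does.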

E.4.1 Floating-Point Emulation

Every operating system must provide a kernel-level floating-point exception handler (a template was presented in Section E.2, “Software Exception Handling,” above). In the following discussion, assume that a user-mode floating-point exception filter is supplied for SIMD floating-point exceptions (for example, as part of a library of C functions), which a user program can invoke in order to handle unmasked exceptions. The user-mode floating-point exception filter (not shown here) has to be able to emulate the subset of SSE/SSE2/SSE3 instructions that can generate numeric exceptions, and has to be able to invoke a user-provided floating-point exception handler for floating-point exceptions. When an unmasked floating-point exception is raised by an SSE/SSE2/SSE3 instruction, the low-level floating-point exception handler is called. This low-level handler may in turn call the user-mode floating-point exception filter. The filter function receives the original operands of the excepting instruction, because the hardware provides no results, whether a pre-computation or a post-computation exception has occurred. The filter will unpack the