Intel 64 and IA-32 Architectures Software Developers Manual Volume 1, Basic Architecture

PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)
majority of its floating-point computations in the XMM registers, using the packed
and scalar floating-point instructions, and at the same time use the x87 FPU to
perform trigonometric and other transcendental computations. Likewise, an
application can perform packed 64-bit and 128-bit SIMD integer operations
together without restrictions.
Those SSE and SSE2 instructions that operate on MMX registers (such as the
CVTPS2PI, CVTTPS2PI, CVTPI2PS, CVTPD2PI, CVTTPD2PI, CVTPI2PD,
MOVDQ2Q, MOVQ2DQ, PADDQ, and PSUBQ instructions) can also be executed in
the same instruction stream as 64-bit SIMD integer or x87 FPU instructions;
however, in this case they are subject to the restrictions on the simultaneous
use of MMX technology and x87 FPU instructions, which include:
• Transition from x87 FPU to MMX technology instructions or to SSE or SSE2
instructions that operate on MMX registers should be preceded by saving the
state of the x87 FPU.
• Transition from MMX technology instructions or from SSE or SSE2 instructions
that operate on MMX registers to x87 FPU instructions should be
preceded by execution of the EMMS instruction.
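The two transition rules can be expressed as a small checker over an instruction stream. This is a hypothetical sketch: `check_transitions` and the category labels (`x87`, `mmx`, `xmm`, `save`, `emms`) are illustrative names, not part of any Intel tool, and each instruction is assumed to be pre-classified by the caller. Instructions that use only XMM registers are treated as unrestricted, matching the statement that packed and scalar floating-point work in the XMM registers can be mixed freely with x87 FPU code.

```python
# Hypothetical checker: models the two transition rules for mixing x87 FPU
# code with code that uses the MMX registers. Categories are caller-supplied.
X87 = "x87"    # x87 FPU instruction
MMX = "mmx"    # MMX instruction, or SSE/SSE2 instruction operating on MMX registers
XMM = "xmm"    # SSE/SSE2 instruction operating only on XMM registers
SAVE = "save"  # saves the x87 FPU state (for example FSAVE or FXSAVE)
EMMS = "emms"  # EMMS instruction


def check_transitions(stream):
    """Return (index, message) pairs for transitions that break the rules."""
    problems = []
    last = None      # last of X87 or MMX seen so far
    saved = False    # x87 state saved since the last x87 instruction
    emptied = False  # EMMS executed since the last MMX-register instruction
    for i, kind in enumerate(stream):
        if kind == SAVE:
            saved = True
        elif kind == EMMS:
            emptied = True
        elif kind == MMX:
            if last == X87 and not saved:
                problems.append((i, "x87 -> MMX registers without saving x87 state"))
            last, emptied = MMX, False
        elif kind == X87:
            if last == MMX and not emptied:
                problems.append((i, "MMX registers -> x87 without EMMS"))
            last, saved = X87, False
        elif kind != XMM:  # XMM-only instructions impose no restriction
            raise ValueError(f"unknown instruction category: {kind!r}")
    return problems


# x87 work, save state, MMX-register work, EMMS, back to x87: no problems.
print(check_transitions([X87, SAVE, MMX, MMX, EMMS, X87]))  # []
# Both rules violated; interleaved XMM-only code does not change that.
print(check_transitions([X87, XMM, MMX, XMM, X87]))
```

The checker only tracks the most recent x87 or MMX-register instruction, so it flags a transition at the first instruction of the new kind; it does not model whether the saved x87 state is later restored.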
11.6.8 Compatibility of SIMD and x87 FPU Floating-Point Data Types
SSE and SSE2 extensions operate on the same single-precision and double-precision
floating-point data types that the x87 FPU operates on. However, the SSE and SSE2
extensions operate on these data types in their native format (single-precision or
double-precision), in contrast to the x87 FPU, which extends them to double
extended-precision floating-point format to perform computations and then rounds
the result back to single-precision or double-precision format before writing it to
memory. Because the x87 FPU computes in a higher-precision format and then rounds
the result to a lower-precision format, it may return a slightly different result
than the SSE and SSE2 extensions return for the same operation on the same
single-precision or double-precision floating-point values. The difference occurs
only in the least-significant bits of the significand.
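The effect of rounding twice can be reproduced exactly with rational arithmetic. In the sketch below, the helper `round_to_precision` is a hypothetical name, not a library function; it emulates round-to-nearest-even to a given significand width and does not model exponent range. It compares one double-precision addition rounded once to 53 bits (as the SSE2 scalar instructions do) with the same addition rounded first to the 64-bit significand of double extended-precision format and then to 53 bits, assuming the x87 precision-control field is set to double extended precision. The operands are chosen so that the two paths disagree; for most operand pairs the two paths produce the same result.

```python
from fractions import Fraction


def round_to_precision(x: Fraction, p: int) -> Fraction:
    """Round positive x to the nearest value with a p-bit significand,
    breaking ties to even. Exponent range limits are not modeled."""
    if x <= 0:
        raise ValueError("this sketch handles positive values only")
    # Find the exponent e with 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)  # spacing of p-bit values in [2**e, 2**(e+1))
    return round(x / ulp) * ulp       # Fraction.__round__ rounds ties to even


DOUBLE_BITS = 53    # significand precision of double precision
EXTENDED_BITS = 64  # significand precision of double extended precision

a = Fraction(1)
b = Fraction(1, 2**53) + Fraction(1, 2**64)  # exactly representable as a double
exact = a + b

# SSE2-style: round the exact sum once, directly to double precision.
native = round_to_precision(exact, DOUBLE_BITS)
# x87-style: round to double extended first, then to double when storing.
via_extended = round_to_precision(round_to_precision(exact, EXTENDED_BITS), DOUBLE_BITS)

print(native == 1 + Fraction(1, 2**52))             # True
print(via_extended == 1)                            # True
print(native - via_extended == Fraction(1, 2**52))  # True: one unit in the last place
```

The first rounding lands exactly on a midpoint between two doubles, and the second rounding then breaks that tie toward the even neighbor, which is why the results differ only in the last significand bit. On a typical x86-64 build, Python's own float arithmetic uses SSE2 scalar instructions, so `1.0 + (2**-53 + 2**-64)` there should equal `float(native)`.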
11.6.9 Mixing Packed and Scalar Floating-Point and 128-Bit SIMD Integer Instructions and Data
SSE and SSE2 extensions define typed operations on packed and scalar floating-point
data types and on 128-bit SIMD integer data types, but IA-32 processors do not
enforce this typing at the architectural level; they enforce it only at the
microarchitectural level. Therefore, when a Pentium 4 or Intel Xeon processor loads a
packed or scalar floating-point operand or a 128-bit packed integer operand from
memory into an XMM register, it does not check that the actual data being loaded
matches the data type specified in the instruction. Likewise, when the processor performs an