Intel 64 and IA-32 Architectures Software Developers Manual Volume 1, Basic Architecture

11-32 Vol. 1

PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)

majority of its floating-point computations in the XMM registers, using the packed

and scalar floating-point instructions, and at the same time use the x87 FPU to

perform trigonometric and other transcendental computations. Likewise, an

application can perform packed 64-bit and 128-bit SIMD integer operations

together without restrictions.

• Those SSE and SSE2 instructions that operate on MMX registers (such as the

CVTPS2PI, CVTTPS2PI, CVTPI2PS, CVTPD2PI, CVTTPD2PI, CVTPI2PD,

MOVDQ2Q, MOVQ2DQ, PADDQ, and PSUBQ instructions) can also be executed in

the same instruction stream as 64-bit SIMD integer or x87 FPU instructions;

however, they are subject to the restrictions on the simultaneous use of

MMX technology and x87 FPU instructions, which include:

— Transition from x87 FPU to MMX technology instructions or to SSE or SSE2

instructions that operate on MMX registers should be preceded by saving the

state of the x87 FPU.

— Transition from MMX technology instructions or from SSE or SSE2 instructions

that operate on MMX registers to x87 FPU instructions should be

preceded by execution of the EMMS instruction.
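
The reason for the EMMS rule can be sketched with a small software model. This is not hardware access and not Intel's implementation. It is a toy illustration that assumes the documented aliasing of the MMX registers onto the x87 FPU data registers: an instruction that touches an MMX register tags all eight x87 registers as valid, and EMMS tags them as empty again. The class name `X87State`, its methods, and the helper `run` are hypothetical names chosen for this sketch.

```python
class X87State:
    """Toy model of the eight x87 tag-word entries, which the MMX registers alias."""

    def __init__(self):
        self.tags = ["empty"] * 8  # state after FINIT: every register tagged empty
        self.top = 0               # top-of-stack pointer (TOP field of the status word)

    def mmx_instruction(self):
        # Any instruction that reads or writes an MMX register (other than EMMS)
        # tags all eight registers valid and resets TOP to 0.
        self.tags = ["valid"] * 8
        self.top = 0

    def emms(self):
        # EMMS tags all eight registers empty so x87 code can push values again.
        self.tags = ["empty"] * 8

    def fld(self):
        """Model an x87 load (push). Returns False on stack overflow."""
        dest = (self.top - 1) % 8
        if self.tags[dest] != "empty":
            return False  # stack overflow: real hardware signals an invalid operation
        self.top = dest
        self.tags[dest] = "valid"
        return True


def run(with_emms):
    """Execute MMX-register code, optionally EMMS, then one x87 load."""
    fpu = X87State()
    fpu.mmx_instruction()  # stands in for e.g. PADDQ or CVTPS2PI on an MMX register
    if with_emms:
        fpu.emms()
    return fpu.fld()       # first x87 instruction after the MMX-register code
```

In this model `run(True)` succeeds and `run(False)` reports a stack overflow, which mirrors why the transition to x87 FPU instructions should be preceded by EMMS. The companion rule (saving the x87 FPU state before entering MMX-register code) exists because such code overwrites the aliased x87 data registers; the sketch does not model that part.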

11.6.8 Compatibility of SIMD and x87 FPU Floating-Point Data Types

SSE and SSE2 extensions operate on the same single-precision and double-precision

floating-point data types that the x87 FPU operates on. However, when operating on

these data types, the SSE and SSE2 extensions operate on them in their native

format (single-precision or double-precision), in contrast to the x87 FPU which

extends them to double extended-precision floating-point format to perform

computations and then rounds the result back to a single-precision or double-precision

format before writing results to memory. Because the x87 FPU operates on a higher

precision format and then rounds the result to a lower precision format, it may

return a slightly different result when performing the same operation on the same

single-precision or double-precision floating-point values than is returned by the SSE

and SSE2 extensions. The difference occurs only in the least-significant bits of the

significand.
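
The effect can be illustrated in software. The sketch below is an emulation under stated assumptions, not a measurement of either execution unit: Python's 64-bit float stands in for the wider x87 internal format (the real format is 80-bit double extended-precision), and rounding to single precision is done through the standard `struct` module. The function names are hypothetical. One path rounds to single precision after every addition, as native single-precision arithmetic does. The other keeps the intermediate sum in the wider format and rounds once at the end.

```python
import struct


def to_f32(x):
    """Round a Python float (binary64) to the nearest single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def f32_bits(x):
    """Return the 32-bit encoding of x after rounding to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def sum3_native(a, b, c):
    # Round to single precision after each addition (native-format arithmetic).
    partial = to_f32(a + b)
    return to_f32(partial + c)


def sum3_wide(a, b, c):
    # Keep the intermediate sum in the wider format; round only the final result.
    return to_f32((a + b) + c)


a, b, c = 16777216.0, 1.0, 1.0  # 2**24, 1, 1: all exact in single precision
native = sum3_native(a, b, c)
wide = sum3_wide(a, b, c)
print(native, wide)
print(hex(f32_bits(native)), hex(f32_bits(wide)))
```

With these inputs the native path yields 16777216.0 and the wide path yields 16777218.0. The two encodings differ only in the lowest bit of the significand, because each single-precision rounding of 2**24 + 1 falls on a tie and rounds to the even neighbor.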

11.6.9 Mixing Packed and Scalar Floating-Point and 128-Bit SIMD Integer Instructions and Data

SSE and SSE2 extensions define typed operations on packed and scalar floating-point

data types and on 128-bit SIMD integer data types, but IA-32 processors do not

enforce this typing at the architectural level. They only enforce it at the microarchitectural

level. Therefore, when a Pentium 4 or Intel Xeon processor loads a packed or

scalar floating-point operand or a 128-bit packed integer operand from memory into

an XMM register, it does not check that the actual data being loaded matches the

data type specified in the instruction. Likewise, when the processor performs an