Intel 64 and IA-32 Architectures Software Developers Manual Volume 1, Basic Architecture

Vol. 1 11-33

PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)

arithmetic operation on the data in an XMM register, it does not check that the data
being operated on matches the data type specified in the instruction.

As a general rule, because data typing of SIMD floating-point and integer data types
is not enforced at the architectural level, it is the responsibility of the programmer,
assembler, or compiler to ensure that code enforces data typing. Failure to enforce
correct data typing can lead to computations that return unexpected results.

For example, in the following code sample, two packed single-precision floating-point
operands are moved from memory into XMM registers (using MOVAPS instructions);
then a double-precision packed add operation (using the ADDPD instruction) is
performed on the operands:

    movaps xmm0, [eax]    ; EAX register contains pointer to packed
                          ; single-precision floating-point operand
    movaps xmm1, [ebx]
    addpd  xmm0, xmm1

Pentium 4 and Intel Xeon processors execute these instructions without generating
an invalid-opcode exception (#UD) and will produce the expected results in register
XMM0 (that is, the high and low 64 bits of each register are treated as
double-precision floating-point values and the processor operates on them accordingly).
Because the data types operated on and the data type expected by the ADDPD
instruction are inconsistent, the instruction may generate a SIMD floating-point
exception (such as numeric overflow [#O] or invalid operation [#I]), but the
actual source of the problem (inconsistent data types) is not detected.
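
The effect of the mismatch described above can be reproduced from C. The following is a minimal sketch, assuming a compiler that provides the SSE2 intrinsics header `<emmintrin.h>`; the helper names `add_floats_with_addps` and `add_floats_with_addpd` are hypothetical and chosen only for illustration. It uses unaligned loads (MOVUPS) rather than MOVAPS so that the caller's arrays need no 16-byte alignment, and the cast intrinsics only reinterpret the register contents without converting any values.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE header) */

/* Correctly typed: four single-precision adds (ADDPS). */
static void add_floats_with_addps(const float a[4], const float b[4],
                                  float out[4])
{
    __m128 x0 = _mm_loadu_ps(a);
    __m128 x1 = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(x0, x1));
}

/* Mismatched: the same 128 bits are handed to a double-precision add
   (ADDPD). Each pair of adjacent floats is treated as one 64-bit double;
   no conversion takes place and no type error is reported. */
static void add_floats_with_addpd(const float a[4], const float b[4],
                                  float out[4])
{
    __m128 x0 = _mm_loadu_ps(a);
    __m128 x1 = _mm_loadu_ps(b);
    __m128d sum = _mm_add_pd(_mm_castps_pd(x0), _mm_castps_pd(x1));
    _mm_storeu_ps(out, _mm_castpd_ps(sum));
}
```

Adding the vector {1.0, 2.0, 3.0, 4.0} to itself gives {2.0, 4.0, 6.0, 8.0} with the typed version, but {1.0, 2.25, 3.0, 4.5} with the mismatched one: doubling a double only increments its exponent field, which sits in the upper float of each pair, so the lower float is unchanged and the upper float has its bit pattern altered.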

The ability to operate on an operand that contains a data type inconsistent with
the typing of the instruction being executed permits some valid operations to be
performed. For example, the following instructions load a packed double-precision
floating-point operand from memory to register XMM0, and a mask to register
XMM1; then they use XORPD to toggle the sign bits of the two packed values in
register XMM0:

    movapd xmm0, [eax]    ; EAX register contains pointer to packed
                          ; double-precision floating-point operand
    movaps xmm1, [ebx]    ; EBX register contains pointer to packed
                          ; double-precision floating-point mask
    xorpd  xmm0, xmm1     ; XOR operation toggles sign bits using
                          ; the mask in xmm1

In this example, XORPS or PXOR can be used in place of XORPD and yield the same
correct result. However, because of the type mismatch between the operand data
type and the instruction data type, a latency penalty will be incurred due to the
implementations of the instructions at the microarchitecture level.
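
The equivalence of the three XOR forms can be checked with a short C sketch, again assuming `<emmintrin.h>` is available. The function name `toggle_signs` and the way the mask is built (`_mm_set1_pd(-0.0)`, which sets only the sign bit of each 64-bit lane) are illustrative assumptions, not taken from the manual. Note that a compiler is free to pick any of XORPD, XORPS, or PXOR when translating these intrinsics, so the sketch demonstrates that the results are identical rather than which instruction is emitted.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Toggle the sign bits of two packed doubles three ways: with the
   double-precision XOR, the single-precision XOR, and the integer XOR.
   The casts reinterpret the 128-bit value without changing any bits. */
static void toggle_signs(const double in[2], double out_pd[2],
                         double out_ps[2], double out_int[2])
{
    __m128d v    = _mm_loadu_pd(in);
    __m128d mask = _mm_set1_pd(-0.0);  /* sign bit set in each 64-bit lane */

    /* XORPD form: instruction type matches the operand type. */
    _mm_storeu_pd(out_pd, _mm_xor_pd(v, mask));

    /* XORPS form: same bits, viewed as four single-precision values. */
    __m128 r_ps = _mm_xor_ps(_mm_castpd_ps(v), _mm_castpd_ps(mask));
    _mm_storeu_pd(out_ps, _mm_castps_pd(r_ps));

    /* PXOR form: same bits, viewed as packed integers. */
    __m128i r_int = _mm_xor_si128(_mm_castpd_si128(v), _mm_castpd_si128(mask));
    _mm_storeu_pd(out_int, _mm_castsi128_pd(r_int));
}
```

For the input {1.5, -2.0}, all three outputs are {-1.5, 2.0}, because a bitwise XOR does not depend on how the bits are typed. Any latency difference between the forms is a property of the microarchitecture and is not observable in the result.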