Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1, Basic Architecture
Vol. 1 11-33
PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)
arithmetic operation on the data in an XMM register, it does not check that the data
being operated on matches the data type specified in the instruction.
As a general rule, because data typing of SIMD floating-point and integer data types
is not enforced at the architectural level, it is the responsibility of the programmer,
assembler, or compiler to ensure that code enforces data typing. Failure to enforce
correct data typing can lead to computations that return unexpected results.
For example, in the following code sample, two packed single-precision floating-point
operands are moved from memory into XMM registers (using MOVAPS instructions);
then a double-precision packed add operation (using the ADDPD instruction) is
performed on the operands:
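    movaps xmm0, [eax]   ; EAX register contains pointer to packed
                         ; single-precision floating-point operand
    movaps xmm1, [ebx]   ; EBX register contains pointer to packed
                         ; single-precision floating-point operand
    addpd  xmm0, xmm1    ; packed double-precision add applied to
                         ; single-precision data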
Pentium 4 and Intel Xeon processors execute these instructions without generating
an invalid-opcode exception (#UD) and produce the result that the ADDPD instruction
defines for the bits held in register XMM0 (that is, the high and low 64 bits of each
register are treated as double-precision floating-point values and the processor
operates on them accordingly). That result is not the sum of the single-precision
values that were loaded. Because the data types operated on and the data type
expected by the ADDPD instruction are inconsistent, the instruction may generate a
SIMD floating-point exception (such as numeric overflow [#O] or invalid operation
[#I]), but the actual source of the problem (inconsistent data types) is not detected.
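The effect of the MOVAPS/ADDPD sequence above can be illustrated with a small bit-level model. The following Python sketch is not processor code; it only assumes that a register image is 16 little-endian bytes, and the helper names (pack_singles, addps, addpd) are hypothetical names introduced for this illustration. It shows that adding the same bits as two doubles gives a different answer from adding them as four singles, and that nothing in the model (as in the architecture) flags the mismatch.

```python
import struct

def pack_singles(values):
    """Model a 128-bit XMM register image holding four single-precision
    values, element 0 in the lowest-order bytes (hypothetical helper)."""
    return struct.pack("<4f", *values)

def addps(a, b):
    """Model of ADDPS: interpret both images as four singles, add lane-wise.
    The inputs used here are small exact values, so rounding is not an issue."""
    lanes_a = struct.unpack("<4f", a)
    lanes_b = struct.unpack("<4f", b)
    return struct.pack("<4f", *(x + y for x, y in zip(lanes_a, lanes_b)))

def addpd(a, b):
    """Model of ADDPD: interpret the same 16 bytes as two doubles, add lane-wise.
    No check is made that the bytes were produced as doubles."""
    lanes_a = struct.unpack("<2d", a)
    lanes_b = struct.unpack("<2d", b)
    return struct.pack("<2d", *(x + y for x, y in zip(lanes_a, lanes_b)))

if __name__ == "__main__":
    xmm0 = pack_singles([1.0, 2.0, 3.0, 4.0])
    xmm1 = pack_singles([5.0, 6.0, 7.0, 8.0])
    # What the programmer presumably intended (single-precision add).
    print("addps as singles:", struct.unpack("<4f", addps(xmm0, xmm1)))
    # What the mismatched instruction computes on the same bits.
    print("xmm0 as doubles: ", struct.unpack("<2d", xmm0))
    print("addpd as singles:", struct.unpack("<4f", addpd(xmm0, xmm1)))
```

The model omits exception behavior; it is meant only to show that the mismatch changes the numeric result silently.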
The ability to operate on an operand that contains a data type that is inconsistent
with the typing of the instruction being executed permits some valid operations to be
performed. For example, the following instructions load a packed double-precision
floating-point operand from memory to register XMM0, and a mask to register
XMM1; then they use XORPD to toggle the sign bits of the two packed values in
register XMM0.
    movapd xmm0, [eax]   ; EAX register contains pointer to packed
                         ; double-precision floating-point operand
    movaps xmm1, [ebx]   ; EBX register contains pointer to packed
                         ; double-precision floating-point mask
    xorpd  xmm0, xmm1    ; XOR operation toggles sign bits using
                         ; the mask in xmm1
In this example, XORPS or PXOR can be used in place of XORPD and yield the same
correct result. However, because of the type mismatch between the operand data
type and the instruction data type, a latency penalty will be incurred because of
how the instructions are implemented at the microarchitecture level.
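The reason XORPD, XORPS, and PXOR give the same bits is that a bitwise XOR does not depend on how the 128 bits are divided into lanes. The following Python sketch models this under stated assumptions: a register image is 16 little-endian bytes, the mask has only bit 63 set in each 64-bit half, and the names xor_lanes, toggle_signs_pd, and the *_LANES constants are hypothetical names chosen for this illustration. The latency penalty mentioned above is a microarchitectural effect and is not modeled.

```python
import struct

# Sign-bit mask for two packed doubles: bit 63 set in each 64-bit half.
SIGN_MASK_PD = struct.pack("<2Q", 0x8000000000000000, 0x8000000000000000)

# Lane layouts used to model each instruction's nominal data type.
XORPD_LANES = "<2Q"   # two 64-bit lanes (packed double-precision)
XORPS_LANES = "<4I"   # four 32-bit lanes (packed single-precision)
PXOR_LANES = "<16B"   # untyped integer view, modeled here byte by byte

def xor_lanes(a, b, fmt):
    """XOR two 16-byte register images lane by lane using layout fmt."""
    lanes_a = struct.unpack(fmt, a)
    lanes_b = struct.unpack(fmt, b)
    return struct.pack(fmt, *(x ^ y for x, y in zip(lanes_a, lanes_b)))

def toggle_signs_pd(values, fmt=XORPD_LANES):
    """Flip the sign of two doubles by XORing with SIGN_MASK_PD."""
    reg = struct.pack("<2d", *values)
    return struct.unpack("<2d", xor_lanes(reg, SIGN_MASK_PD, fmt))

if __name__ == "__main__":
    for name, fmt in (("xorpd", XORPD_LANES),
                      ("xorps", XORPS_LANES),
                      ("pxor", PXOR_LANES)):
        print(name, toggle_signs_pd((1.5, -2.25), fmt))
```

All three lane layouts return the same pair of values with the signs toggled, which is the sense in which the three instructions are interchangeable for this operation.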