Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1, Basic Architecture

PROGRAMMING WITH STREAMING SIMD EXTENSIONS (SSE)
The scalar single-precision floating-point instructions operate on the low (least
significant) doublewords of the two source operands (X0 and Y0); see Figure 10-6.
The three most significant doublewords (X1, X2, and X3) of the first source operand
are passed through to the destination unchanged. The scalar operations are similar to
the floating-point operations performed in the x87 FPU data registers when the precision
control (PC) field of the x87 FPU control word is set to single precision (24-bit
significand). The difference lies in the exponent range of the result: x87 stack
operations use a 15-bit exponent range, while SSE operations use an 8-bit exponent range.
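The pass-through behavior and the 8-bit exponent range can be illustrated with a small model. This is a sketch under stated assumptions, not Intel's definition: an XMM register is modeled as a Python list [X0, X1, X2, X3] with index 0 as the least significant doubleword, and the helper names to_f32 and scalar_op are hypothetical. Rounding to single precision is done with a struct round-trip; operator.mul stands in for a MULSS-style operation.

```python
import math
import operator
import struct
from typing import Callable, List

def to_f32(v: float) -> float:
    """Round a Python float (binary64) to the nearest single-precision value.

    Magnitudes beyond the single-precision range (8-bit exponent)
    become +/-infinity, as with round-to-nearest overflow.
    """
    try:
        return struct.unpack("<f", struct.pack("<f", v))[0]
    except OverflowError:
        # struct.pack refuses values that overflow single precision
        return math.copysign(math.inf, v)

def scalar_op(x: List[float], y: List[float],
              op: Callable[[float, float], float]) -> List[float]:
    """Hypothetical model of a scalar single-precision operation.

    x and y are [X0, X1, X2, X3]. Only X0 OP Y0 is computed and rounded
    to single precision; X1, X2 and X3 of the first source pass through.
    """
    return [to_f32(op(x[0], y[0]))] + x[1:4]

x = [2.0, 10.0, 20.0, 30.0]
y = [3.0, 99.0, 99.0, 99.0]
print(scalar_op(x, y, operator.mul))      # [6.0, 10.0, 20.0, 30.0]

# 2**100 * 2**100 = 2**200 fits a 15-bit exponent range but not an
# 8-bit one, so the single-precision result overflows to infinity.
big = [2.0**100, 1.0, 2.0, 3.0]
print(scalar_op(big, big, operator.mul))  # [inf, 1.0, 2.0, 3.0]
```

The second call shows the difference described above: an x87 data register with PC set to single precision could hold 2^200, because its result keeps a 15-bit exponent, while the SSE result cannot.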
10.4.1.1 SSE Data Movement Instructions
SSE data movement instructions move single-precision floating-point data between
XMM registers and between an XMM register and memory.
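To make the register-to-register and register-to-memory cases concrete, the sketch below models a few of these moves. It rests on stated assumptions: a register is a Python list [X0, X1, X2, X3], memory is a little-endian bytearray with element 0 at the lowest address, and all function names are hypothetical. MOVUPS and MOVSS are real SSE mnemonics. The modeled behavior covers only the legacy SSE encodings: a MOVSS load from memory clears elements 1 to 3, and the register form leaves them unchanged. VEX-encoded forms are not modeled.

```python
import struct
from typing import List

def movups_load(mem: bytearray, addr: int) -> List[float]:
    """Model of MOVUPS xmm, m128: load four packed singles (no alignment needed)."""
    return list(struct.unpack_from("<4f", mem, addr))

def movups_store(mem: bytearray, addr: int, reg: List[float]) -> None:
    """Model of MOVUPS m128, xmm: store all four packed singles."""
    struct.pack_into("<4f", mem, addr, *reg)

def movss_load(mem: bytearray, addr: int) -> List[float]:
    """Model of MOVSS xmm, m32: load element 0 and zero elements 1..3."""
    (v,) = struct.unpack_from("<f", mem, addr)
    return [v, 0.0, 0.0, 0.0]

def movss_store(mem: bytearray, addr: int, reg: List[float]) -> None:
    """Model of MOVSS m32, xmm: store only element 0."""
    struct.pack_into("<f", mem, addr, reg[0])

def movss_reg(dest: List[float], src: List[float]) -> List[float]:
    """Model of legacy-SSE MOVSS xmm1, xmm2: copy element 0, keep dest 1..3."""
    return [src[0]] + dest[1:4]

mem = bytearray(16)
movups_store(mem, 0, [1.5, 2.5, 3.5, 4.5])
print(movups_load(mem, 0))   # [1.5, 2.5, 3.5, 4.5]
print(movss_load(mem, 4))    # [2.5, 0.0, 0.0, 0.0]
print(movss_reg([9.0, 8.0, 7.0, 6.0], [1.5, 2.5, 3.5, 4.5]))  # [1.5, 8.0, 7.0, 6.0]
```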
Figure 10-5. Packed Single-Precision Floating-Point Operation
(Sources X3 X2 X1 X0 and Y3 Y2 Y1 Y0; destination X3 OP Y3, X2 OP Y2, X1 OP Y1, X0 OP Y0.)
Figure 10-6. Scalar Single-Precision Floating-Point Operation
(Sources X3 X2 X1 X0 and Y3 Y2 Y1 Y0; destination X3, X2, X1, X0 OP Y0.)
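Figures 10-5 and 10-6 differ only in how many elements are computed. The sketch below places the two side by side. It assumes registers are Python lists [X0, X1, X2, X3], uses hypothetical helper names, and lets operator.add stand in for an ADDPS- or ADDSS-style operation.

```python
import operator
import struct
from typing import Callable, List

Op = Callable[[float, float], float]

def to_f32(v: float) -> float:
    """Round to single precision (struct raises OverflowError outside its range)."""
    return struct.unpack("<f", struct.pack("<f", v))[0]

def packed_op(x: List[float], y: List[float], op: Op) -> List[float]:
    """Figure 10-5: destination element i is Xi OP Yi for i = 0..3."""
    return [to_f32(op(a, b)) for a, b in zip(x, y)]

def scalar_op(x: List[float], y: List[float], op: Op) -> List[float]:
    """Figure 10-6: only X0 OP Y0 is computed; X1..X3 pass through."""
    return [to_f32(op(x[0], y[0]))] + x[1:4]

x = [1.0, 2.0, 3.0, 4.0]      # [X0, X1, X2, X3]
y = [10.0, 20.0, 30.0, 40.0]  # [Y0, Y1, Y2, Y3]
print(packed_op(x, y, operator.add))  # [11.0, 22.0, 33.0, 44.0]
print(scalar_op(x, y, operator.add))  # [11.0, 2.0, 3.0, 4.0]
```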