Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1, Basic Architecture

PROGRAMMING WITH STREAMING SIMD EXTENSIONS (SSE)
a scalar single-precision floating-point value into a doubleword integer (see
Figure 11-8).
SSE extensions provide conversion instructions between XMM registers and MMX
registers, and between XMM registers and general-purpose registers. See
Figure 11-8.
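The conversions between an XMM register and a general-purpose register can be sketched with compiler intrinsics. This is a minimal illustration, not text from the manual: it assumes an x86 compiler that provides <xmmintrin.h>, the function names are hypothetical, and the rounded result assumes the MXCSR rounding mode is at its default (round to nearest, ties to even).

```c
#include <xmmintrin.h>

/* Scalar float -> doubleword integer, rounded per the current MXCSR
   rounding mode. The intrinsic corresponds to CVTSS2SI. */
int float_to_int_rounded(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}

/* Scalar float -> doubleword integer, truncated toward zero.
   The intrinsic corresponds to CVTTSS2SI. */
int float_to_int_truncated(float x)
{
    return _mm_cvttss_si32(_mm_set_ss(x));
}

/* Doubleword integer -> scalar float in the low lane of an XMM register.
   The intrinsic corresponds to CVTSI2SS. */
float int_to_float(int n)
{
    return _mm_cvtss_f32(_mm_cvtsi32_ss(_mm_setzero_ps(), n));
}
```

Under the stated assumptions, float_to_int_truncated(2.7f) yields 2, while float_to_int_rounded(2.5f) yields 2 and float_to_int_rounded(3.5f) yields 4, because ties round to the even integer.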
The address of a 128-bit packed memory operand must be aligned on a 16-byte
boundary, except in the following cases:
The MOVUPS instruction supports unaligned accesses.
Scalar instructions use a 4-byte memory operand, which is not subject to
alignment requirements.
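The alignment rule can be illustrated with a short sketch in C. It assumes an x86 compiler that provides <xmmintrin.h>. The helper names (is_aligned16, load4, sum4) are hypothetical. The _mm_load_ps intrinsic is the aligned form, typically compiled to MOVAPS, and _mm_loadu_ps is the unaligned form, typically compiled to MOVUPS. The compiler is free to choose a different encoding.

```c
#include <stdint.h>
#include <xmmintrin.h>

/* Returns 1 if p lies on a 16-byte boundary, 0 otherwise. */
int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Loads four packed single-precision values. The aligned load is used
   only when the address qualifies; otherwise the unaligned load is used. */
__m128 load4(const float *p)
{
    return is_aligned16(p) ? _mm_load_ps(p) : _mm_loadu_ps(p);
}

/* Adds up the four values starting at p, wherever p points. */
float sum4(const float *p)
{
    float out[4];
    _mm_storeu_ps(out, load4(p));
    return out[0] + out[1] + out[2] + out[3];
}
```

Calling the aligned load on an address that is not 16-byte aligned would normally raise a general-protection exception, which is why the sketch checks the address first. A scalar load such as _mm_load_ss reads only 4 bytes and needs no such check.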
Figure 4-2 shows the byte order of 128-bit (double quadword) data types in memory.
10.4 SSE INSTRUCTION SET
SSE instructions are divided into four functional groups:
Packed and scalar single-precision floating-point instructions
64-bit SIMD integer instructions
State management instructions
Cacheability control, prefetch, and memory ordering instructions
The following sections give an overview of each of the instructions in these groups.
10.4.1 SSE Packed and Scalar Floating-Point Instructions
The packed and scalar single-precision floating-point instructions are divided into the
following subgroups:
Data movement instructions
Arithmetic instructions
Logical instructions
Comparison instructions
Shuffle instructions
Conversion instructions
The packed single-precision floating-point instructions perform SIMD operations on
packed single-precision floating-point operands (see Figure 10-5). Each source
operand contains four single-precision floating-point values, and the destination
operand contains the results of the operation (OP) performed in parallel on the
corresponding values (X0 and Y0, X1 and Y1, X2 and Y2, and X3 and Y3) in each operand.
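As a concrete instance of the packed operation described above, the following sketch uses addition as OP. It assumes an x86 compiler that provides <xmmintrin.h>. The function name add4 is hypothetical. Unaligned loads and stores are used so that the arrays need no particular alignment.

```c
#include <xmmintrin.h>

/* Computes dst[i] = x[i] + y[i] for i = 0..3 with a single packed add.
   The _mm_add_ps intrinsic corresponds to ADDPS. */
void add4(const float *x, const float *y, float *dst)
{
    __m128 vx = _mm_loadu_ps(x);   /* X3, X2, X1, X0 */
    __m128 vy = _mm_loadu_ps(y);   /* Y3, Y2, Y1, Y0 */
    _mm_storeu_ps(dst, _mm_add_ps(vx, vy));
}
```

With x = {1, 2, 3, 4} and y = {10, 20, 30, 40}, dst receives {11, 22, 33, 44}. Each result depends only on the pair of values in the same position.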