Intel 64 and IA-32 Architectures Software Developers Manual Volume 1, Basic Architecture

PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)
The address of a 128-bit packed memory operand must be aligned on a 16-byte
boundary, except in the following cases:
• the MOVUPD instruction, which supports unaligned accesses
• scalar instructions that use an 8-byte memory operand, which is not subject to
  alignment requirements
Figure 4-2 shows the byte order of 128-bit (double quadword) and 64-bit (quadword)
data types in memory.
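The alignment rule above can be illustrated with compiler intrinsics. The following is a minimal sketch, assuming a C compiler that provides the SSE2 intrinsics header emmintrin.h. The helper names is_aligned16, sum_pair, and load_scalar are hypothetical and chosen only for this illustration. Compilers typically translate _mm_load_pd to MOVAPD, _mm_loadu_pd to MOVUPD, and _mm_load_sd to MOVSD, but the exact instruction selection is up to the compiler.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>     /* uintptr_t */

/* Hypothetical helper: nonzero if p lies on a 16-byte boundary. */
static inline int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: returns p[0] + p[1].
   The aligned load (typically MOVAPD) is used only when p is 16-byte
   aligned; otherwise the unaligned load (typically MOVUPD) is used. */
static inline double sum_pair(const double *p) {
    __m128d v  = is_aligned16(p) ? _mm_load_pd(p) : _mm_loadu_pd(p);
    __m128d hi = _mm_unpackhi_pd(v, v);        /* both elements = p[1] */
    return _mm_cvtsd_f64(_mm_add_sd(v, hi));   /* low element = p[0] + p[1] */
}

/* Hypothetical helper: scalar load of one 8-byte operand (typically MOVSD).
   No 16-byte alignment is required for this access. */
static inline double load_scalar(const double *p) {
    return _mm_cvtsd_f64(_mm_load_sd(p));
}
```

Calling sum_pair on a buffer declared with 16-byte alignment takes the aligned path. Calling it on the address one double past that buffer's start takes the unaligned path, because that address is offset by 8 bytes from the boundary.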
11.4 SSE2 INSTRUCTIONS
The SSE2 instructions are divided into four functional groups:
• Packed and scalar double-precision floating-point instructions
• 64-bit and 128-bit SIMD integer instructions
• 128-bit extensions of SIMD integer instructions introduced with the MMX
  technology and the SSE extensions
• Cacheability-control and instruction-ordering instructions
The following sections provide more information about each group.
11.4.1 Packed and Scalar Double-Precision Floating-Point Instructions
The packed and scalar double-precision floating-point instructions are divided into
the following sub-groups:
• Data movement instructions
• Arithmetic instructions
• Comparison instructions
• Conversion instructions
• Logical instructions
• Shuffle instructions
The packed double-precision floating-point instructions perform SIMD operations
similarly to the packed single-precision floating-point instructions (see Figure 11-3).
Each source operand contains two double-precision floating-point values, and the
destination operand contains the results of the operation (OP) performed in parallel
on the corresponding values (X0 and Y0, and X1 and Y1) in each operand.
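The packed operation described above can be sketched with SSE2 intrinsics. This is an illustrative example, not part of the architecture definition. It assumes a C compiler that provides emmintrin.h, and the function names add_packed and mul_packed are hypothetical. Element 0 of each array corresponds to X0 or Y0 (the low quadword) and element 1 to X1 or Y1 (the high quadword).

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: out = { x[0] + y[0], x[1] + y[1] }.
   Both additions are performed by one packed add (ADDPD). */
static inline void add_packed(const double x[2], const double y[2],
                              double out[2]) {
    __m128d vx = _mm_loadu_pd(x);   /* low element = X0, high element = X1 */
    __m128d vy = _mm_loadu_pd(y);   /* low element = Y0, high element = Y1 */
    _mm_storeu_pd(out, _mm_add_pd(vx, vy));
}

/* Hypothetical helper: out = { x[0] * y[0], x[1] * y[1] }.
   Same structure with a different OP: one packed multiply (MULPD). */
static inline void mul_packed(const double x[2], const double y[2],
                              double out[2]) {
    __m128d vx = _mm_loadu_pd(x);
    __m128d vy = _mm_loadu_pd(y);
    _mm_storeu_pd(out, _mm_mul_pd(vx, vy));
}
```

Unaligned loads and stores are used here so that the sketch works for any array address. Code that guarantees 16-byte alignment could use _mm_load_pd and _mm_store_pd instead.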