Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture

PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)
11.4.1.1 Data Movement Instructions
Data movement instructions move double-precision floating-point data between
XMM registers and between XMM registers and memory.
The MOVAPD (move aligned packed double-precision floating-point) instruction
transfers a 128-bit packed double-precision floating-point operand from memory to
an XMM register or vice versa, or between XMM registers. The memory address must
be aligned to a 16-byte boundary; if not, a general-protection exception (#GP) is
generated.
The MOVUPD (move unaligned packed double-precision floating-point) instruction
transfers a 128-bit packed double-precision floating-point operand from memory to
an XMM register or vice versa, or between XMM registers. Alignment of the memory
address is not required.
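
The difference between the aligned and unaligned moves can be sketched in C with the SSE2 intrinsics from emmintrin.h. This is a minimal illustration, not part of the manual: the helper name copy_pair is hypothetical, an x86 target with SSE2 enabled is assumed, and a compiler typically (but is not required to) emit MOVAPD for _mm_load_pd and MOVUPD for _mm_storeu_pd.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: copy two packed doubles from a source that the
   caller guarantees is 16-byte aligned to a destination of arbitrary
   alignment. */
void copy_pair(const double *aligned_src, double *dst)
{
    /* Aligned 128-bit load; a misaligned address would fault. */
    __m128d v = _mm_load_pd(aligned_src);

    /* Unaligned 128-bit store; no alignment requirement on dst. */
    _mm_storeu_pd(dst, v);
}
```

The caller is responsible for the alignment of the source, for example by declaring the buffer with a 16-byte alignment specifier.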
The MOVSD (move scalar double-precision floating-point) instruction transfers a
64-bit double-precision floating-point operand from memory to the low quadword of
an XMM register or vice versa, or between XMM registers. Alignment of the memory
address is not required, unless alignment checking is enabled.
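
As a rough sketch of the scalar move, the following C fragment uses the _mm_load_sd intrinsic, which compilers usually map to the memory-to-register form of MOVSD. The helper name load_scalar is hypothetical. The intrinsic is documented to clear the high quadword when loading from memory, which the example makes visible by storing both halves.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: load one double into the low quadword of an
   XMM value and write both quadwords to out[0] (low) and out[1] (high). */
void load_scalar(const double *src, double out[2])
{
    /* Low quadword = *src; high quadword is zeroed by this load. */
    __m128d v = _mm_load_sd(src);

    /* Unaligned store of the full 128-bit value for inspection. */
    _mm_storeu_pd(out, v);
}
```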
The MOVHPD (move high packed double-precision floating-point) instruction transfers
a 64-bit double-precision floating-point operand from memory to the high quadword
of an XMM register or vice versa. The low quadword of the register is left
unchanged. Alignment of the memory address is not required, unless alignment
checking is enabled.
The MOVLPD (move low packed double-precision floating-point) instruction transfers
a 64-bit double-precision floating-point operand from memory to the low quadword
of an XMM register or vice versa. The high quadword of the register is left unchanged.
Alignment of the memory address is not required, unless alignment checking is
enabled.
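
The half-register moves are often used to assemble a packed pair from two doubles that are not adjacent in memory. The sketch below assumes the _mm_loadl_pd and _mm_loadh_pd intrinsics, which normally correspond to MOVLPD and MOVHPD; the helper name gather_pair is hypothetical.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: build a packed pair {*lo, *hi} from two
   independent memory locations and store it to out[0..1]. */
void gather_pair(const double *lo, const double *hi, double out[2])
{
    __m128d v = _mm_setzero_pd();

    /* Replace only the low quadword; the high quadword is unchanged. */
    v = _mm_loadl_pd(v, lo);

    /* Replace only the high quadword; the low quadword is unchanged. */
    v = _mm_loadh_pd(v, hi);

    _mm_storeu_pd(out, v);
}
```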
The MOVMSKPD (move packed double-precision floating-point mask) instruction
extracts the sign bit of each of the two packed double-precision floating-point
numbers in an XMM register and saves them in a general-purpose register. This 2-bit
value can then be used as a condition to perform branching.
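
A minimal sketch of sign-mask extraction, assuming the _mm_movemask_pd intrinsic (normally compiled to MOVMSKPD). The helper name sign_mask is hypothetical. Bit 0 of the result reflects the sign of the low element and bit 1 the sign of the high element.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: return a 2-bit mask of the sign bits of
   pair[0] (bit 0) and pair[1] (bit 1). */
int sign_mask(const double *pair)
{
    __m128d v = _mm_loadu_pd(pair);  /* unaligned packed load */
    return _mm_movemask_pd(v);       /* collect the two sign bits */
}
```

A caller can branch on the returned value, for example taking a slow path only when the mask is nonzero (at least one element has its sign bit set).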
11.4.1.2 SSE2 Arithmetic Instructions
SSE2 arithmetic instructions perform addition, subtraction, multiplication, division,
square root, and maximum/minimum operations on packed and scalar double-precision
floating-point values.
The ADDPD (add packed double-precision floating-point values) and SUBPD
(subtract packed double-precision floating-point values) instructions add and
subtract, respectively, two packed double-precision floating-point operands.
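
For illustration, packed addition and subtraction can be written with the _mm_add_pd and _mm_sub_pd intrinsics, which normally compile to ADDPD and SUBPD. The helper name add_sub_pairs is hypothetical; each operation acts on both elements at once.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: compute element-wise a + b and a - b for two
   pairs of doubles. */
void add_sub_pairs(const double *a, const double *b,
                   double sum[2], double diff[2])
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);

    /* sum = {a[0] + b[0], a[1] + b[1]} */
    _mm_storeu_pd(sum, _mm_add_pd(va, vb));

    /* diff = {a[0] - b[0], a[1] - b[1]} */
    _mm_storeu_pd(diff, _mm_sub_pd(va, vb));
}
```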
The ADDSD (add scalar double-precision floating-point values) and SUBSD (subtract
scalar double-precision floating-point values) instructions add and subtract,
respectively, the low double-precision floating-point values of two operands and store
the result in the low quadword of the destination operand.
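
The scalar forms can be sketched with the _mm_add_sd and _mm_sub_sd intrinsics, which normally compile to ADDSD and SUBSD. The helper name add_sub_low is hypothetical. Only the low element is computed; the intrinsics pass the high element of the first operand through to the result unchanged.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Hypothetical helper: add and subtract only the low elements of a and b.
   The high element of each result is copied from a. */
void add_sub_low(const double *a, const double *b,
                 double sum[2], double diff[2])
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);

    /* sum = {a[0] + b[0], a[1]} */
    _mm_storeu_pd(sum, _mm_add_sd(va, vb));

    /* diff = {a[0] - b[0], a[1]} */
    _mm_storeu_pd(diff, _mm_sub_sd(va, vb));
}
```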