Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture
PROGRAMMING WITH STREAMING SIMD EXTENSIONS (SSE)
The MOVAPS (move aligned packed single-precision floating-point values) instruction
transfers a double quadword operand containing four packed single-precision
floating-point values from memory to an XMM register and vice versa, or between
XMM registers. The memory address must be aligned to a 16-byte boundary;
otherwise, a general-protection exception (#GP) is generated.
The MOVUPS (move unaligned packed single-precision floating-point values) instruction
performs the same operations as the MOVAPS instruction, except that 16-byte
alignment of a memory address is not required.
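The difference between the aligned and unaligned moves can be sketched in C with compiler intrinsics. This is a minimal illustration, not part of the manual text: the intrinsics `_mm_load_ps`/`_mm_store_ps` and `_mm_loadu_ps`/`_mm_storeu_ps` from `<xmmintrin.h>` are conventionally associated with MOVAPS and MOVUPS, but the compiler is free to choose the actual encoding. The function names `copy4_aligned` and `copy4_unaligned` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Copies four floats with an aligned load and store (MOVAPS-style).
   Both pointers must be 16-byte aligned; the assert checks this up front
   rather than relying on the processor to raise #GP. */
void copy4_aligned(float *dst, const float *src)
{
    assert(((uintptr_t)dst % 16) == 0);
    assert(((uintptr_t)src % 16) == 0);
    __m128 v = _mm_load_ps(src);
    _mm_store_ps(dst, v);
}

/* Copies four floats with no alignment requirement (MOVUPS-style). */
void copy4_unaligned(float *dst, const float *src)
{
    __m128 v = _mm_loadu_ps(src);
    _mm_storeu_ps(dst, v);
}
```

A caller that cannot guarantee 16-byte alignment of its buffers would use `copy4_unaligned`; `copy4_aligned` is appropriate only when the allocation is known to be aligned.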
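The MOVSS (move scalar single-precision floating-point) instruction transfers a
32-bit single-precision floating-point operand from memory to the low doubleword of an
XMM register and vice versa, or between XMM registers. When the source is memory,
the three high doublewords of the destination register are cleared to zero; in a
register-to-register move they are left unchanged.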
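A scalar move can be sketched with the `<xmmintrin.h>` intrinsics `_mm_load_ss`, `_mm_store_ss`, and `_mm_move_ss`, which are conventionally associated with MOVSS. The sketch below is illustrative only; the helper names `replace_low` and `low_element` are hypothetical, and the instructions actually emitted depend on the compiler.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Replaces element 0 of vec with *scalar; elements 1..3 keep their values. */
void replace_low(float vec[4], const float *scalar)
{
    assert(vec != 0 && scalar != 0);
    __m128 v = _mm_loadu_ps(vec);
    __m128 s = _mm_load_ss(scalar);  /* memory to low doubleword */
    v = _mm_move_ss(v, s);           /* register to register, low doubleword only */
    _mm_storeu_ps(vec, v);
}

/* Returns element 0 of vec by storing only the low doubleword to memory. */
float low_element(const float vec[4])
{
    float out;
    _mm_store_ss(&out, _mm_loadu_ps(vec));  /* low doubleword to memory */
    return out;
}
```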
The MOVLPS (move low packed single-precision floating-point) instruction moves
two packed single-precision floating-point values from memory to the low quadword
of an XMM register and vice versa. The high quadword of the register is left
unchanged.
The MOVHPS (move high packed single-precision floating-point) instruction moves
two packed single-precision floating-point values from memory to the high quadword
of an XMM register and vice versa. The low quadword of the register is left
unchanged.
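The half-register loads and stores described for MOVLPS and MOVHPS can be sketched with the intrinsics `_mm_loadl_pi`, `_mm_loadh_pi`, `_mm_storel_pi`, and `_mm_storeh_pi` from `<xmmintrin.h>`. This is a minimal sketch under the assumption that the compiler provides the `__m64` type (GCC and Clang do); the function names `combine_pairs` and `split_pairs` are hypothetical.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Builds out = {lo[0], lo[1], hi[0], hi[1]} from two pairs in memory. */
void combine_pairs(float out[4], const float lo[2], const float hi[2])
{
    assert(out != 0 && lo != 0 && hi != 0);
    __m128 v = _mm_setzero_ps();
    v = _mm_loadl_pi(v, (const __m64 *)lo);  /* fills low quadword, high unchanged */
    v = _mm_loadh_pi(v, (const __m64 *)hi);  /* fills high quadword, low unchanged */
    _mm_storeu_ps(out, v);
}

/* Splits in[0..3] into its low pair and high pair in memory. */
void split_pairs(float lo[2], float hi[2], const float in[4])
{
    assert(lo != 0 && hi != 0 && in != 0);
    __m128 v = _mm_loadu_ps(in);
    _mm_storel_pi((__m64 *)lo, v);  /* low quadword to memory */
    _mm_storeh_pi((__m64 *)hi, v);  /* high quadword to memory */
}
```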
The MOVLHPS (move packed single-precision floating-point low to high) instruction
moves two packed single-precision floating-point values from the low quadword of
the source XMM register into the high quadword of the destination XMM register. The
low quadword of the destination register is left unchanged.
The MOVHLPS (move packed single-precision floating-point high to low) instruction
moves two packed single-precision floating-point values from the high quadword of
the source XMM register into the low quadword of the destination XMM register. The
high quadword of the destination register is left unchanged.
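The two register-to-register quadword moves can be illustrated with `_mm_movelh_ps` and `_mm_movehl_ps` from `<xmmintrin.h>`, the intrinsics conventionally associated with MOVLHPS and MOVHLPS. In each call the first argument plays the role of the destination register and the second the source. The wrapper names `low_to_high` and `high_to_low` are hypothetical, and the arrays stand in for register contents.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* out = {dst[0], dst[1], src[0], src[1]}:
   the low quadword of src lands in the high quadword of the result. */
void low_to_high(float out[4], const float dst[4], const float src[4])
{
    assert(out != 0 && dst != 0 && src != 0);
    __m128 r = _mm_movelh_ps(_mm_loadu_ps(dst), _mm_loadu_ps(src));
    _mm_storeu_ps(out, r);
}

/* out = {src[2], src[3], dst[2], dst[3]}:
   the high quadword of src lands in the low quadword of the result. */
void high_to_low(float out[4], const float dst[4], const float src[4])
{
    assert(out != 0 && dst != 0 && src != 0);
    __m128 r = _mm_movehl_ps(_mm_loadu_ps(dst), _mm_loadu_ps(src));
    _mm_storeu_ps(out, r);
}
```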
The MOVMSKPS (move packed single-precision floating-point mask) instruction
transfers the most significant bit (the sign bit) of each of the four packed
single-precision floating-point numbers in an XMM register to the four low-order
bits of a general-purpose register. This 4-bit value can then be used as a
condition to perform branching.
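Branching on the sign mask can be sketched with `_mm_movemask_ps` from `<xmmintrin.h>`, the intrinsic conventionally associated with MOVMSKPS. Bit i of the result is the sign bit of element i. The helper names `sign_mask` and `any_sign_set` are hypothetical. Because the test is on the sign bit rather than on the numeric value, an element equal to -0.0 also sets its mask bit.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Returns a 4-bit mask; bit i is the sign bit of v[i]. */
int sign_mask(const float v[4])
{
    assert(v != 0);
    return _mm_movemask_ps(_mm_loadu_ps(v));
}

/* Returns 1 if any element of v has its sign bit set, 0 otherwise. */
int any_sign_set(const float v[4])
{
    return sign_mask(v) != 0;
}
```

A typical use is a single conditional such as `if (any_sign_set(v)) { ... }`, which replaces four separate comparisons against zero when only the sign matters.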
10.4.1.2 SSE Arithmetic Instructions
SSE arithmetic instructions perform addition, subtraction, multiplication, division,
reciprocal, square root, reciprocal of square root, and maximum/minimum operations on
packed and scalar single-precision floating-point values.
The ADDPS (add packed single-precision floating-point values) and SUBPS (subtract
packed single-precision floating-point values) instructions add and subtract,
respectively, two packed single-precision floating-point operands.
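Packed addition and subtraction operate on all four elements at once. A minimal sketch using `_mm_add_ps` and `_mm_sub_ps` from `<xmmintrin.h>`, the intrinsics conventionally associated with ADDPS and SUBPS, follows; the function names `add4` and `sub4` are hypothetical.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* out[i] = a[i] + b[i] for i = 0..3, computed as one packed addition. */
void add4(float out[4], const float a[4], const float b[4])
{
    assert(out != 0 && a != 0 && b != 0);
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}

/* out[i] = a[i] - b[i] for i = 0..3, computed as one packed subtraction. */
void sub4(float out[4], const float a[4], const float b[4])
{
    assert(out != 0 && a != 0 && b != 0);
    _mm_storeu_ps(out, _mm_sub_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
```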
The ADDSS (add scalar single-precision floating-point values) and SUBSS (subtract
scalar single-precision floating-point values) instructions add and subtract, respec-