Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture

INSTRUCTION SET SUMMARY
Three SIMD floating-point LOAD/MOVE/DUPLICATE instructions
Two thread synchronization instructions
SSE3 instructions can only be executed on Intel 64 and IA-32 processors that
support SSE3 extensions. Support for these instructions can be detected with the
CPUID instruction. See the description of the CPUID instruction in Chapter 3,
“Instruction Set Reference, A-M,” of the Intel® 64 and IA-32 Architectures Software
Developer’s Manual, Volume 2A.
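The detection step described above can be sketched in C as follows. This is a minimal illustration, not a reference implementation: SSE3 support is reported in CPUID leaf 01H, register ECX, bit 0. The function names ecx_reports_sse3 and cpu_has_sse3 are hypothetical, and the sketch assumes a GCC- or Clang-compatible compiler that provides <cpuid.h> and its __get_cpuid helper. On other compilers or non-x86 targets, cpu_has_sse3 returns -1 to indicate that the query was not performed.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>   /* GCC/Clang helper header; an assumption of this sketch */
#endif

/* SSE3 feature flag: CPUID leaf 01H, ECX bit 0. */
#define CPUID_01H_ECX_SSE3 (1u << 0)

/* Decode the SSE3 flag from a raw ECX value returned by CPUID leaf 01H. */
int ecx_reports_sse3(uint32_t ecx)
{
    return (ecx & CPUID_01H_ECX_SSE3) != 0;
}

/*
 * Returns 1 if the processor reports SSE3, 0 if it does not,
 * and -1 if this build has no way to execute CPUID.
 */
int cpu_has_sse3(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;               /* leaf 01H not available */
    return ecx_reports_sse3(ecx);
#else
    return -1;                  /* CPUID not reachable from this build */
#endif
}
```

A caller would typically test cpu_has_sse3() == 1 once at startup and select an SSE3 code path only in that case.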
The sections that follow describe each subgroup.
5.7.1 SSE3 x87-FP Integer Conversion Instruction
FISTTP Behaves like the FISTP instruction but uses truncation,
irrespective of the rounding mode specified in the floating-point
control word (FCW).
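The truncating behavior of FISTTP can be illustrated with a scalar C model. This is a sketch under simplifying assumptions, not the instruction's full definition: the function name model_fisttp32 is hypothetical, the ST(0) value is approximated as a C double rather than the 80-bit extended format, only the 32-bit destination form is shown, and out-of-range or NaN inputs are rejected rather than modeled. C defines floating-point to integer conversion as discarding the fractional part, which matches the truncation described above and does not depend on the current rounding mode.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scalar model of FISTTP with a 32-bit integer destination.
 * Assumption: st0 is a double standing in for the x87 ST(0) register.
 */
int32_t model_fisttp32(double st0)
{
    /* Values that do not fit in int32_t (and NaN) are outside this model. */
    assert(st0 > -2147483649.0 && st0 < 2147483648.0);

    /* The cast truncates toward zero, regardless of the rounding mode. */
    return (int32_t)st0;
}
```

For example, model_fisttp32(2.7) yields 2 and model_fisttp32(-2.7) yields -2, whereas a round-to-nearest conversion would give 3 and -3.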
5.7.2 SSE3 Specialized 128-bit Unaligned Data Load Instruction
LDDQU Special 128-bit unaligned load designed to avoid cache line
splits
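The architectural result of LDDQU can be sketched as a plain 16-byte copy from an address with no alignment requirement. The type xmm128_model and the function model_lddqu are hypothetical names introduced for illustration. The sketch models only the value that is loaded. How the processor avoids cache line splits is an implementation property that a C model of this kind does not capture.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a 128-bit XMM register. */
typedef struct {
    uint8_t bytes[16];
} xmm128_model;

/* Load 16 bytes from src; src may have any alignment. */
xmm128_model model_lddqu(const void *src)
{
    xmm128_model r;
    assert(src != NULL);
    memcpy(r.bytes, src, sizeof r.bytes);  /* memcpy has no alignment requirement */
    return r;
}
```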
5.7.3 SSE3 SIMD Floating-Point Packed ADD/SUB Instructions
ADDSUBPS Performs single-precision addition on the second and fourth
pairs of 32-bit data elements within the operands;
single-precision subtraction on the first and third pairs
ADDSUBPD Performs double-precision addition on the second pair of
quadwords, and double-precision subtraction on the first pair
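The element pairing used by ADDSUBPS and ADDSUBPD can be sketched with scalar C models. The function names model_addsubps and model_addsubpd are hypothetical. The sketch assumes that array index 0 corresponds to the first (lowest) data element, that a is the first (destination) operand and b is the second (source) operand, and that each subtraction computes the first operand's element minus the second operand's element.

```c
#include <assert.h>

/* ADDSUBPS model: subtract in elements 0 and 2, add in elements 1 and 3. */
void model_addsubps(float dst[4], const float a[4], const float b[4])
{
    float r[4];
    assert(dst && a && b);
    for (int i = 0; i < 4; i++)
        r[i] = (i % 2 == 0) ? a[i] - b[i] : a[i] + b[i];
    for (int i = 0; i < 4; i++)
        dst[i] = r[i];          /* written last so dst may alias a or b */
}

/* ADDSUBPD model: subtract in element 0, add in element 1. */
void model_addsubpd(double dst[2], const double a[2], const double b[2])
{
    assert(dst && a && b);
    double r0 = a[0] - b[0];
    double r1 = a[1] + b[1];
    dst[0] = r0;
    dst[1] = r1;
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, model_addsubps produces {-9, 22, -27, 44}.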
5.7.4 SSE3 SIMD Floating-Point Horizontal ADD/SUB Instructions
HADDPS Performs a single-precision addition on contiguous data
elements. The first data element of the result is obtained by
adding the first and second elements of the first operand; the
second element by adding the third and fourth elements of the
first operand; the third by adding the first and second elements
of the second operand; and the fourth by adding the third and
fourth elements of the second operand.
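The HADDPS element mapping described above can be sketched as a scalar C model. The function name model_haddps is hypothetical, and the sketch assumes that array index 0 corresponds to the first data element, with a as the first operand and b as the second operand.

```c
#include <assert.h>

/*
 * HADDPS model: each result element is the sum of two adjacent
 * elements taken from a single operand.
 */
void model_haddps(float dst[4], const float a[4], const float b[4])
{
    assert(dst && a && b);
    float r0 = a[0] + a[1];     /* first and second elements of first operand  */
    float r1 = a[2] + a[3];     /* third and fourth elements of first operand  */
    float r2 = b[0] + b[1];     /* first and second elements of second operand */
    float r3 = b[2] + b[3];     /* third and fourth elements of second operand */
    dst[0] = r0;                /* written last so dst may alias a or b */
    dst[1] = r1;
    dst[2] = r2;
    dst[3] = r3;
}
```

With a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, the result is {3, 7, 30, 70}.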
HSUBPS Performs a single-precision subtraction on contiguous data
elements. The first data element of the result is obtained by
subtracting the second element of the first operand from the
first element of the first operand; the second element by
subtracting the fourth element of the first operand from the third
element of the first operand; the third by subtracting the second