Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1, Basic Architecture

INSTRUCTION SET SUMMARY
PMULUDQ Multiply packed unsigned doubleword integers
PADDQ Add packed quadword integers
PSUBQ Subtract packed quadword integers
PSHUFLW Shuffle packed low words
PSHUFHW Shuffle packed high words
PSHUFD Shuffle packed doublewords
PSLLDQ Shift double quadword left logical
PSRLDQ Shift double quadword right logical
PUNPCKHQDQ Unpack high quadwords
PUNPCKLQDQ Unpack low quadwords
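To show roughly what several of the 128-bit integer instructions listed above compute, the following portable C sketch models each one as a lane-wise operation on a 128-bit value. Index 0 holds the least-significant lane. The types `xmm_d`, `xmm_q`, `xmm_b` and the lowercase function names are hypothetical. The code only models the results and does not emit the instructions. In real code these instructions are normally reached through SSE2 intrinsics from `<emmintrin.h>`, for example `_mm_mul_epu32` (PMULUDQ), `_mm_shuffle_epi32` (PSHUFD) and `_mm_slli_si128` (PSLLDQ).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical models of a 128-bit XMM register; index 0 is the least-significant lane. */
typedef struct { uint32_t d[4]; } xmm_d;   /* four doubleword lanes */
typedef struct { uint64_t q[2]; } xmm_q;   /* two quadword lanes */
typedef struct { uint8_t  b[16]; } xmm_b;  /* sixteen byte lanes */

static_assert(sizeof(xmm_d) == 16 && sizeof(xmm_q) == 16 && sizeof(xmm_b) == 16,
              "each model spans exactly 128 bits");

/* PMULUDQ: multiply the low unsigned doubleword of each quadword (lanes 0 and 2)
 * and keep the full 64-bit products; lanes 1 and 3 are ignored. */
static xmm_q pmuludq(xmm_d a, xmm_d b)
{
    xmm_q r = {{ (uint64_t)a.d[0] * b.d[0], (uint64_t)a.d[2] * b.d[2] }};
    return r;
}

/* PADDQ / PSUBQ: two independent 64-bit lanes; results wrap modulo 2^64. */
static xmm_q paddq(xmm_q a, xmm_q b)
{
    xmm_q r = {{ a.q[0] + b.q[0], a.q[1] + b.q[1] }};
    return r;
}

static xmm_q psubq(xmm_q a, xmm_q b)
{
    xmm_q r = {{ a.q[0] - b.q[0], a.q[1] - b.q[1] }};
    return r;
}

/* PSHUFD: bits [2i+1:2i] of imm8 select which source lane lands in result lane i. */
static xmm_d pshufd(xmm_d src, uint8_t imm8)
{
    xmm_d r;
    for (int i = 0; i < 4; i++)
        r.d[i] = src.d[(imm8 >> (2 * i)) & 3];
    return r;
}

/* PUNPCKLQDQ / PUNPCKHQDQ: pair the low (or high) quadword of a with that of b. */
static xmm_q punpcklqdq(xmm_q a, xmm_q b) { xmm_q r = {{ a.q[0], b.q[0] }}; return r; }
static xmm_q punpckhqdq(xmm_q a, xmm_q b) { xmm_q r = {{ a.q[1], b.q[1] }}; return r; }

/* PSLLDQ / PSRLDQ: shift the whole register by imm8 bytes (not bits), filling
 * with zeros; a count above 15 clears the register. */
static xmm_b pslldq(xmm_b src, uint8_t imm8)
{
    xmm_b r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (i >= imm8) ? src.b[i - imm8] : 0;
    return r;
}

static xmm_b psrldq(xmm_b src, uint8_t imm8)
{
    xmm_b r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (i + imm8 < 16) ? src.b[i + imm8] : 0;
    return r;
}
```

PSHUFLW and PSHUFHW use the same 2-bit selector scheme, applied to the four words of the low or high quadword, while the other quadword is copied unchanged.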
5.6.4 SSE2 Cacheability Control and Ordering Instructions
SSE2 cacheability control instructions provide additional operations for storing
non-temporal data from XMM registers and general-purpose registers to memory while
minimizing cache pollution. LFENCE and MFENCE provide additional control over the
ordering of load and store operations.
CLFLUSH Flushes and invalidates a memory operand and its associated
cache line from all levels of the processor’s cache hierarchy
LFENCE Serializes load operations
MFENCE Serializes load and store operations
PAUSE Improves the performance of “spin-wait loops”
MASKMOVDQU Non-temporal store of selected bytes from an XMM register into memory
MOVNTPD Non-temporal store of two packed double-precision floating-
point values from an XMM register into memory
MOVNTDQ Non-temporal store of a double quadword from an XMM register
into memory
MOVNTI Non-temporal store of a doubleword from a general-purpose
register into memory
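The following minimal C sketch uses the non-temporal store instructions through the SSE2 intrinsics `_mm_stream_si128` (MOVNTDQ), `_mm_stream_si32` (MOVNTI) and `_mm_mfence` (MFENCE) from `<emmintrin.h>`. The helper names `stream_copy` and `stream_store_int` are hypothetical. Non-temporal stores are weakly ordered, so the sketch issues a fence after them, before the data would be published to another agent. SFENCE (an SSE instruction, `_mm_sfence`) is the lighter-weight choice for that job. MFENCE is used here only because it belongs to the group listed above. On targets without SSE2 the sketch falls back to ordinary stores so that it still compiles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Copy n_blocks 16-byte blocks from src to dst using non-temporal stores.
 * MOVNTDQ faults on a misaligned destination, so dst must be 16-byte aligned. */
static void stream_copy(void *dst, const void *src, size_t n_blocks)
{
    assert(((uintptr_t)dst & 15u) == 0);
#if defined(__SSE2__)
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < n_blocks; i++) {
        __m128i v = _mm_loadu_si128(s + i); /* ordinary load; src may be unaligned */
        _mm_stream_si128(d + i, v);         /* MOVNTDQ: non-temporal hint, minimize cache pollution */
    }
    _mm_mfence(); /* MFENCE: order the weakly ordered stores before later memory operations */
#else
    memcpy(dst, src, n_blocks * 16);        /* portable fallback: ordinary stores */
#endif
}

/* Store one int from a general-purpose register with a non-temporal hint. */
static void stream_store_int(int *dst, int value)
{
#if defined(__SSE2__)
    _mm_stream_si32(dst, value); /* MOVNTI */
    _mm_mfence();
#else
    *dst = value;
#endif
}
```

Non-temporal stores tend to pay off only for large buffers that will not be read again soon. For small or soon-reused data, ordinary stores are usually the better choice.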
5.7 SSE3 INSTRUCTIONS
The SSE3 extensions offer 13 instructions that accelerate the performance of Streaming
SIMD Extensions technology, Streaming SIMD Extensions 2 technology, and x87-FP
math capabilities. These instructions can be grouped into the following categories:
One x87 FPU instruction used in integer conversion
One SIMD integer instruction that addresses unaligned data loads
Two SIMD floating-point packed ADD/SUB instructions
Four SIMD floating-point horizontal ADD/SUB instructions