Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1, Basic Architecture

PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)
The PSHUFD (shuffle packed doubleword integers) instruction shuffles the doubleword
integers packed into the source operand and stores the shuffled result in the
destination operand. An 8-bit immediate operand specifies the shuffle order.
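
The role of the immediate operand can be illustrated with a small scalar model
in C. This is a sketch of the semantics only, not the instruction itself: the
type xmm_dwords and the function pshufd_model are hypothetical names introduced
for illustration. Each 2-bit field of the immediate selects which source
doubleword is copied into the corresponding destination doubleword.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit XMM value as four doublewords;
   d[0] holds bits 31:0, d[3] holds bits 127:96. */
typedef struct { uint32_t d[4]; } xmm_dwords;

/* Scalar model of PSHUFD: bits (2*i+1):(2*i) of imm8 select the source
   doubleword that is copied into destination doubleword i. */
xmm_dwords pshufd_model(xmm_dwords src, uint8_t imm8)
{
    xmm_dwords dst;
    for (int i = 0; i < 4; i++) {
        unsigned sel = (imm8 >> (2 * i)) & 0x3u;
        dst.d[i] = src.d[sel];
    }
    return dst;
}
```

Under this model, an immediate of 0x1B (binary 00 01 10 11) reverses the order
of the four doublewords, and an immediate of 0x00 copies the lowest doubleword
into all four positions.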
The PSLLDQ (shift double quadword left logical) instruction shifts the contents of the
source operand to the left by the number of bytes specified by an immediate
operand. The empty low-order bytes are cleared (set to 0).
The PSRLDQ (shift double quadword right logical) instruction shifts the contents of
the source operand to the right by the number of bytes specified by an immediate
operand. The empty high-order bytes are cleared (set to 0).
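
The two byte shifts can be sketched as a scalar model in C. The type xmm_bytes
and the functions pslldq_model and psrldq_model are hypothetical names used
only for illustration. The model also assumes that a count greater than 15
produces an all-zero result, a detail not stated in the text above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 128-bit XMM value as 16 bytes;
   b[0] is the lowest-order byte. */
typedef struct { uint8_t b[16]; } xmm_bytes;

/* Scalar model of PSLLDQ: move every byte 'count' positions toward the
   high-order end; the vacated low-order bytes stay 0. */
xmm_bytes pslldq_model(xmm_bytes src, unsigned count)
{
    xmm_bytes dst;
    memset(dst.b, 0, sizeof dst.b);
    if (count > 15)
        return dst;               /* assumption: counts above 15 give all zeros */
    for (unsigned i = count; i < 16; i++)
        dst.b[i] = src.b[i - count];
    return dst;
}

/* Scalar model of PSRLDQ: move every byte 'count' positions toward the
   low-order end; the vacated high-order bytes stay 0. */
xmm_bytes psrldq_model(xmm_bytes src, unsigned count)
{
    xmm_bytes dst;
    memset(dst.b, 0, sizeof dst.b);
    if (count > 15)
        return dst;               /* assumption: counts above 15 give all zeros */
    for (unsigned i = 0; i + count < 16; i++)
        dst.b[i] = src.b[i + count];
    return dst;
}
```

Note that the shift count is in bytes, not bits, so a count of 4 moves the
data by one doubleword.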
The PUNPCKHQDQ (unpack high quadwords) instruction interleaves the high quadword
of the source operand and the high quadword of the destination operand and
writes them to the destination register.
The PUNPCKLQDQ (unpack low quadwords) instruction interleaves the low quadword
of the source operand and the low quadword of the destination operand and
writes them to the destination register.
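
With only two quadwords per register, the interleave reduces to picking one
quadword from each operand. The following C sketch models that behavior; the
type xmm_qwords and both function names are hypothetical. The placement shown
(destination quadword in the low half of the result, source quadword in the
high half) is the ordering this sketch assumes.

```c
#include <stdint.h>

/* Hypothetical model of a 128-bit XMM value as two quadwords;
   q[0] holds bits 63:0, q[1] holds bits 127:64. */
typedef struct { uint64_t q[2]; } xmm_qwords;

/* Scalar model of PUNPCKHQDQ: the result holds the high quadword of the
   destination (low half) and the high quadword of the source (high half). */
xmm_qwords punpckhqdq_model(xmm_qwords dst, xmm_qwords src)
{
    xmm_qwords r;
    r.q[0] = dst.q[1];
    r.q[1] = src.q[1];
    return r;
}

/* Scalar model of PUNPCKLQDQ: the result holds the low quadword of the
   destination (low half) and the low quadword of the source (high half). */
xmm_qwords punpcklqdq_model(xmm_qwords dst, xmm_qwords src)
{
    xmm_qwords r;
    r.q[0] = dst.q[0];
    r.q[1] = src.q[0];
    return r;
}
```

In both cases the returned value stands for the new contents of the
destination register.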
Two additional SSE2 instructions enable data movement between the MMX registers
and the XMM registers.
The MOVQ2DQ (move quadword integer from MMX to XMM registers) instruction
moves the quadword integer from an MMX source register to an XMM destination
register.
The MOVDQ2Q (move quadword integer from XMM to MMX registers) instruction
moves the low quadword integer from an XMM source register to an MMX destination
register.
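
The two register-to-register moves can be modeled in a few lines of C. The
types mmx_reg and xmm_reg and the function names are hypothetical. The sketch
also assumes that MOVQ2DQ clears the high quadword of the XMM destination, a
detail not stated in the text above.

```c
#include <stdint.h>

/* Hypothetical models: an MMX register as one 64-bit value and an XMM
   register as a low/high pair of 64-bit values. */
typedef uint64_t mmx_reg;
typedef struct { uint64_t lo, hi; } xmm_reg;

/* Scalar model of MOVQ2DQ: copy the MMX quadword into the low quadword
   of the XMM destination. */
xmm_reg movq2dq_model(mmx_reg src)
{
    xmm_reg dst;
    dst.lo = src;
    dst.hi = 0;   /* assumption: the high quadword is cleared */
    return dst;
}

/* Scalar model of MOVDQ2Q: copy the low quadword of the XMM source into
   the MMX destination; the high quadword is ignored. */
mmx_reg movdq2q_model(xmm_reg src)
{
    return src.lo;
}
```

A round trip through movq2dq_model and then movdq2q_model returns the original
quadword under this model.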
11.4.3 128-Bit SIMD Integer Instruction Extensions
All of the 64-bit SIMD integer instructions introduced with MMX technology and SSE
extensions (with the exception of the PSHUFW instruction) have been extended by
SSE2 extensions to operate on 128-bit packed integer operands located in XMM
registers. The 128-bit versions of these instructions follow the same SIMD
conventions regarding packed operands as the 64-bit versions. For example, where the
64-bit version of the PADDB instruction operates on 8 packed bytes, the 128-bit
version operates on 16 packed bytes.
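
The PADDB example can be sketched as one scalar routine in C that differs only
in the number of byte lanes it processes. The function paddb_model and the two
lane-count constants are hypothetical names chosen for illustration. The
sketch uses wraparound (modulo 256) addition in each lane.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lane counts: 8 bytes in a 64-bit MMX operand,
   16 bytes in a 128-bit XMM operand. */
enum { PADDB_LANES_MMX = 8, PADDB_LANES_XMM = 16 };

/* Scalar model of PADDB: add corresponding bytes independently, with
   each lane wrapping around on overflow. No carry crosses lanes. */
void paddb_model(const uint8_t *a, const uint8_t *b, uint8_t *out,
                 size_t lanes)
{
    for (size_t i = 0; i < lanes; i++)
        out[i] = (uint8_t)(a[i] + b[i]);
}
```

Calling paddb_model with PADDB_LANES_MMX models the 64-bit form, and calling
it with PADDB_LANES_XMM models the 128-bit form; the per-lane operation is the
same in both.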
11.4.4 Cacheability Control and Memory Ordering Instructions
The SSE2 extensions that give programs more control over the caching, loading,
and storing of data are described below.