Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 1: Basic Architecture

INSTRUCTION SET SUMMARY
5.8 SUPPLEMENTAL STREAMING SIMD EXTENSIONS 3 (SSSE3) INSTRUCTIONS
SSSE3 provides 32 instructions (represented by 14 mnemonics) to accelerate
computations on packed integers. These include:
• Twelve instructions that perform horizontal addition or subtraction operations.
• Six instructions that evaluate absolute values.
• Two instructions that perform multiply and add operations and speed up the
  evaluation of dot products.
• Two instructions that accelerate packed-integer multiply operations and produce
  integer values with scaling.
• Two instructions that perform a byte-wise, in-place shuffle according to the
  second shuffle control operand.
• Six instructions that negate packed integers in the destination operand if the
  corresponding element in the source operand is less than zero.
• Two instructions that align data from the composite of two operands.
SSSE3 instructions can only be executed on Intel 64 and IA-32 processors that
support SSSE3 extensions. Support for these instructions can be detected with the
CPUID instruction. See the description of the CPUID instruction in Chapter 3,
“Instruction Set Reference, A-M,” of the Intel® 64 and IA-32 Architectures Software
Developer’s Manual, Volume 2A.
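The detection step can be sketched in C as follows. SSSE3 support is reported in CPUID leaf 01H, ECX bit 9. The sketch assumes a GCC- or Clang-compatible compiler that ships <cpuid.h> and its __get_cpuid helper; the function names ecx_reports_ssse3 and cpu_has_ssse3 are hypothetical and do not come from this manual. On other compilers or targets the sketch conservatively reports no support.

```c
#include <stdint.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>   /* GCC/Clang helper header (assumed available) */
#endif

/* SSSE3 feature flag: CPUID leaf 01H, ECX bit 9. */
#define SSSE3_ECX_BIT (UINT32_C(1) << 9)

/* Hypothetical helper: decode the SSSE3 flag from a leaf-01H ECX value. */
static int ecx_reports_ssse3(uint32_t ecx)
{
    return (ecx & SSSE3_ECX_BIT) != 0;
}

/* Hypothetical helper: returns 1 if the processor reports SSSE3, else 0.
   Also returns 0 where this sketch cannot issue CPUID. */
static int cpu_has_ssse3(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                 /* leaf 01H not available */
    return ecx_reports_ssse3(ecx);
#else
    return 0;                     /* not an x86 target or unknown compiler */
#endif
}
```

A program would typically call cpu_has_ssse3() once at startup and select an SSSE3 code path only when it returns 1.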
The sections that follow describe each subgroup.
5.8.1 Horizontal Addition/Subtraction
PHADDW Adds each pair of adjacent, signed 16-bit integers horizontally in the
source and destination operands and packs the signed 16-bit
results to the destination operand.
PHADDSW Adds each pair of adjacent, signed 16-bit integers horizontally in the
source and destination operands and packs the signed, saturated
16-bit results to the destination operand.
PHADDD Adds each pair of adjacent, signed 32-bit integers horizontally in the
source and destination operands and packs the signed 32-bit
results to the destination operand.
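The horizontal-add data flow may be easier to follow as a scalar model. The following C sketch models the 128-bit forms only and is not how the hardware is implemented. Element 0 is taken to be the least significant element, results from destination pairs fill the low half of the result, and results from source pairs fill the high half. The names wrap16, sat16, phaddw_model, and phaddd_model are hypothetical, and the saturate flag is an illustrative way to select between PHADDW and PHADDSW behavior.

```c
#include <stdint.h>
#include <string.h>

/* Keep the low 16 bits of a sum (PHADDW-style wraparound).
   Assumes the usual two's-complement narrowing conversion. */
static int16_t wrap16(int32_t v)
{
    return (int16_t)(uint16_t)v;
}

/* Clamp a sum to the signed 16-bit range (PHADDSW-style saturation). */
static int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Scalar model of 128-bit PHADDW (saturate == 0) or PHADDSW (saturate != 0). */
static void phaddw_model(int16_t dst[8], const int16_t src[8], int saturate)
{
    int16_t out[8];
    for (int i = 0; i < 4; i++) {
        int32_t d = (int32_t)dst[2 * i] + dst[2 * i + 1];  /* pair from destination */
        int32_t s = (int32_t)src[2 * i] + src[2 * i + 1];  /* pair from source */
        out[i]     = saturate ? sat16(d) : wrap16(d);      /* low half of result */
        out[i + 4] = saturate ? sat16(s) : wrap16(s);      /* high half of result */
    }
    memcpy(dst, out, sizeof out);
}

/* Scalar model of 128-bit PHADDD; sums wrap modulo 2^32. */
static void phaddd_model(int32_t dst[4], const int32_t src[4])
{
    int32_t out[4];
    for (int i = 0; i < 2; i++) {
        out[i]     = (int32_t)((uint32_t)dst[2 * i] + (uint32_t)dst[2 * i + 1]);
        out[i + 2] = (int32_t)((uint32_t)src[2 * i] + (uint32_t)src[2 * i + 1]);
    }
    memcpy(dst, out, sizeof out);
}
```

For example, with a source pair of 32767 and 1, the model yields -32768 without saturation and 32767 with saturation.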
PHSUBW Performs horizontal subtraction on each adjacent pair of 16-bit
signed integers by subtracting the most significant word from
the least significant word of each pair in the source and
destination operands. The signed 16-bit results are packed and
written to the destination operand.
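The operand order of the horizontal subtraction can be illustrated with a scalar model. The following C sketch covers the 128-bit form of PHSUBW only and assumes element 0 is the least significant word, so within each pair the even-indexed element is the minuend and the odd-indexed element is the subtrahend. The name phsubw_model is hypothetical, and the sketch describes results, not the hardware implementation.

```c
#include <stdint.h>
#include <string.h>

/* Scalar model of 128-bit PHSUBW: for each adjacent pair, compute
   (least significant word) - (most significant word), keeping the
   low 16 bits of the difference (no saturation). */
static void phsubw_model(int16_t dst[8], const int16_t src[8])
{
    int16_t out[8];
    for (int i = 0; i < 4; i++) {
        int32_t d = (int32_t)dst[2 * i] - dst[2 * i + 1];  /* pair from destination */
        int32_t s = (int32_t)src[2 * i] - src[2 * i + 1];  /* pair from source */
        out[i]     = (int16_t)(uint16_t)d;                 /* low half of result */
        out[i + 4] = (int16_t)(uint16_t)s;                 /* high half of result */
    }
    memcpy(dst, out, sizeof out);
}
```

For example, a destination pair of 10 and 3 (least significant word first) produces 7, and a pair of -32768 and 1 wraps to 32767 because PHSUBW does not saturate.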
PHSUBSW Performs horizontal subtraction on each adjacent pair of 16-bit
signed integers by subtracting the most significant word from
the least significant word of each pair in the source and destina-