Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1, Basic Architecture

PROGRAMMING WITH SSE3 AND SUPPLEMENTAL SSE3
There are six horizontal add instructions (represented by three mnemonics); three
operate on 128-bit operands and three operate on 64-bit operands. The width of each
data element is either 16 bits or 32 bits. The mnemonics are listed below.
PHADDW adds two adjacent, signed 16-bit integers horizontally from the source
and destination operands and packs the signed 16-bit results to the destination
operand.
PHADDSW adds two adjacent, signed 16-bit integers horizontally from the source
and destination operands and packs the signed, saturated 16-bit results to the
destination operand.
PHADDD adds two adjacent, signed 32-bit integers horizontally from the source
and destination operands and packs the signed 32-bit results to the destination
operand.
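
The horizontal add behavior described above can be sketched as a scalar reference model. This is an illustration only, not the processor's implementation: the function names (wrap_signed, saturate_signed, horizontal_add) are hypothetical, operands are modeled as Python lists of signed elements with element 0 as the least significant, and the results derived from the destination operand are assumed to occupy the low half of the output, with those derived from the source operand in the high half.

```python
def wrap_signed(value, bits):
    """Reduce value modulo 2**bits and reinterpret it as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def saturate_signed(value, bits):
    """Clamp value to the signed range of a bits-wide integer."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return max(lowest, min(highest, value))


def horizontal_add(dest, src, bits=16, saturate=False):
    """Model PHADDW (bits=16), PHADDSW (bits=16, saturate=True) or PHADDD (bits=32).

    dest and src are equal-length lists of signed elements (hypothetical
    representation; element 0 is the least significant element).
    """
    fix = saturate_signed if saturate else wrap_signed

    def pair_sums(operand):
        # Add each adjacent pair: elements (0, 1), (2, 3), ...
        return [fix(operand[i] + operand[i + 1], bits)
                for i in range(0, len(operand), 2)]

    # Low half of the result comes from dest, high half from src.
    return pair_sums(dest) + pair_sums(src)
```

With eight 16-bit elements per operand the model corresponds to the 128-bit forms, and with four elements to the 64-bit forms. Calling horizontal_add([32767, 1, 0, 0], [0, 0, 0, 0]) wraps the first sum to -32768, while passing saturate=True clamps it to 32767, which is the difference between PHADDW and PHADDSW.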
There are six horizontal subtract instructions (represented by three mnemonics);
three operate on 128-bit operands and three operate on 64-bit operands. The width
of each data element is either 16 bits or 32 bits. These are listed below.
PHSUBW performs horizontal subtraction on each adjacent pair of 16-bit signed
integers by subtracting the most significant word from the least significant word
of each pair in the source and destination operands. The signed 16-bit results are
packed and written to the destination operand.
PHSUBSW performs horizontal subtraction on each adjacent pair of 16-bit signed
integers by subtracting the most significant word from the least significant word
of each pair in the source and destination operands. The signed, saturated 16-bit
results are packed and written to the destination operand.
PHSUBD performs horizontal subtraction on each adjacent pair of 32-bit signed
integers by subtracting the most significant doubleword from the least significant
doubleword of each pair in the source and destination operands. The signed
32-bit results are packed and written to the destination operand.
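
The operand order of the horizontal subtract instructions is easy to get backwards, so a scalar reference model may help. This is a sketch under stated assumptions, not the processor's implementation: the function names (wrap_signed, saturate_signed, horizontal_sub) are hypothetical, operands are modeled as Python lists of signed elements with element 0 as the least significant, and the results derived from the destination operand are assumed to occupy the low half of the output.

```python
def wrap_signed(value, bits):
    """Reduce value modulo 2**bits and reinterpret it as a signed integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def saturate_signed(value, bits):
    """Clamp value to the signed range of a bits-wide integer."""
    lowest = -(1 << (bits - 1))
    highest = (1 << (bits - 1)) - 1
    return max(lowest, min(highest, value))


def horizontal_sub(dest, src, bits=16, saturate=False):
    """Model PHSUBW (bits=16), PHSUBSW (bits=16, saturate=True) or PHSUBD (bits=32).

    dest and src are equal-length lists of signed elements (hypothetical
    representation; element 0 is the least significant element).
    """
    fix = saturate_signed if saturate else wrap_signed

    def pair_differences(operand):
        # Within each adjacent pair, subtract the more significant element
        # (index i + 1) from the less significant element (index i).
        return [fix(operand[i] - operand[i + 1], bits)
                for i in range(0, len(operand), 2)]

    # Low half of the result comes from dest, high half from src.
    return pair_differences(dest) + pair_differences(src)
```

For example, horizontal_sub([5, 2, 10, 3], [1, 4, 0, 0]) yields [3, 7, -3, 0]: each result is the lower element of a pair minus the upper element.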
12.6.2 Packed Absolute Values
There are six packed-absolute-value instructions (represented by three mnemonics).
Three operate on 128-bit operands and three operate on 64-bit operands. The widths
of data elements are 8 bits, 16 bits or 32 bits. The absolute value of each data
element of the source operand is stored as an UNSIGNED result in the destination
operand.
PABSB computes the absolute value of each signed byte data element.
PABSW computes the absolute value of each signed 16-bit data element.
PABSD computes the absolute value of each signed 32-bit data element.
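
The note that results are stored as unsigned values matters for the most negative input of each width. The following scalar model is a sketch, not the processor's implementation; the function names (packed_abs, as_signed) and the list-of-integers operand representation are hypothetical.

```python
def packed_abs(src, bits=8):
    """Model PABSB (bits=8), PABSW (bits=16) or PABSD (bits=32).

    src is a list of signed elements; the result holds the unsigned
    bit pattern of each absolute value.
    """
    mask = (1 << bits) - 1
    return [abs(element) & mask for element in src]


def as_signed(value, bits):
    """Reinterpret an unsigned bits-wide pattern as a signed integer."""
    return value - (1 << bits) if value >> (bits - 1) else value
```

For a signed byte of -128, packed_abs([-128], 8) gives [128] (bit pattern 0x80), which is correct when read as unsigned. Reading the same pattern as a signed byte, as_signed(128, 8), gives -128 again, so code that treats the result as signed sees the most negative value unchanged.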