Intel 64 and IA-32 Architectures Software Developers Manual Volume 1, Basic Architecture
INSTRUCTION SET SUMMARY
tion operands. The signed, saturated 16-bit results are packed
and written to the destination operand.
PHSUBD Performs horizontal subtraction on each adjacent pair of 32-bit
signed integers by subtracting the most significant doubleword
from the least significant doubleword of each pair in the source
and destination operands. The signed 32-bit results are packed
and written to the destination operand.
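
The pairwise subtraction described for PHSUBD can be sketched as a scalar C model. This is an illustrative sketch only: the function name phsubd_model is hypothetical, and the code assumes the 128-bit form, where the two low result doublewords come from the destination pairs and the two high result doublewords come from the source pairs. It also assumes wrap-around (non-saturating) arithmetic, which the summary above does not state explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the 128-bit form of PHSUBD (not an Intel API).
   Element 0 is the least significant doubleword of each operand. */
void phsubd_model(int32_t dst[4], const int32_t src[4])
{
    int32_t r[4];
    size_t i;

    assert(dst != NULL && src != NULL);

    /* Subtract in unsigned arithmetic so overflow wraps instead of being
       undefined in C. Converting back to int32_t assumes the usual
       two's-complement behavior of mainstream compilers. */
    r[0] = (int32_t)((uint32_t)dst[0] - (uint32_t)dst[1]); /* low  - high, dst pair 0 */
    r[1] = (int32_t)((uint32_t)dst[2] - (uint32_t)dst[3]); /* low  - high, dst pair 1 */
    r[2] = (int32_t)((uint32_t)src[0] - (uint32_t)src[1]); /* low  - high, src pair 0 */
    r[3] = (int32_t)((uint32_t)src[2] - (uint32_t)src[3]); /* low  - high, src pair 1 */

    /* The packed results overwrite the destination operand. */
    for (i = 0; i < 4; i++) {
        dst[i] = r[i];
    }
}
```

In this model the least significant element of each pair is the minuend, matching the wording "subtracting the most significant doubleword from the least significant doubleword".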
5.8.2 Packed Absolute Values
PABSB Computes the absolute value of each signed byte data element.
PABSW Computes the absolute value of each signed 16-bit data
element.
PABSD Computes the absolute value of each signed 32-bit data
element.
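
A per-element view of the three packed absolute value instructions can be sketched in scalar C as follows. The function names are hypothetical and the element count n is a parameter of the sketch, not of the instructions. The sketch assumes the results are stored as unsigned values of the same width, so the most negative input (for example -128 for a byte) yields the bit pattern 80H rather than overflowing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar models of PABSB, PABSW and PABSD (not Intel APIs).
   Each output element is the absolute value of the matching input element,
   stored as an unsigned value of the same width. */
void pabsb_model(uint8_t *out, const int8_t *in, size_t n)
{
    size_t i;
    assert(out != NULL && in != NULL);
    for (i = 0; i < n; i++) {
        /* Negate in int so that -(-128) = 128 is representable. */
        out[i] = (in[i] < 0) ? (uint8_t)(-(int)in[i]) : (uint8_t)in[i];
    }
}

void pabsw_model(uint16_t *out, const int16_t *in, size_t n)
{
    size_t i;
    assert(out != NULL && in != NULL);
    for (i = 0; i < n; i++) {
        out[i] = (in[i] < 0) ? (uint16_t)(-(int32_t)in[i]) : (uint16_t)in[i];
    }
}

void pabsd_model(uint32_t *out, const int32_t *in, size_t n)
{
    size_t i;
    assert(out != NULL && in != NULL);
    for (i = 0; i < n; i++) {
        /* Negate in unsigned arithmetic to avoid signed overflow on INT32_MIN. */
        out[i] = (in[i] < 0) ? (0u - (uint32_t)in[i]) : (uint32_t)in[i];
    }
}
```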
5.8.3 Multiply and Add Packed Signed and Unsigned Bytes
PMADDUBSW Multiplies each unsigned byte value with the corresponding
signed byte value to produce an intermediate, 16-bit signed
integer. Each adjacent pair of 16-bit signed values is added
horizontally. The signed, saturated 16-bit results are packed and
written to the destination operand.
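
The multiply, horizontal add, and saturate sequence can be sketched in scalar C as shown below. The names pmaddubsw_model and saturate_to_s16 are hypothetical, and the sketch assumes the 128-bit form (16 byte pairs in, 8 words out). It takes the unsigned bytes and the signed bytes as separate arrays and writes to a separate output array for clarity; in the instruction itself the results overwrite the destination operand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clamp a 32-bit value to the signed 16-bit range. */
static int16_t saturate_to_s16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Hypothetical scalar model of the 128-bit form of PMADDUBSW (not an Intel API).
   u holds the unsigned bytes, s holds the signed bytes, out receives 8 words. */
void pmaddubsw_model(int16_t out[8], const uint8_t u[16], const int8_t s[16])
{
    size_t i;

    assert(out != NULL && u != NULL && s != NULL);

    for (i = 0; i < 8; i++) {
        /* Each product of an unsigned byte and a signed byte fits in 16 bits. */
        int32_t even = (int32_t)u[2 * i]     * (int32_t)s[2 * i];
        int32_t odd  = (int32_t)u[2 * i + 1] * (int32_t)s[2 * i + 1];

        /* The sum of two adjacent products may exceed 16 bits, so saturate. */
        out[i] = saturate_to_s16(even + odd);
    }
}
```

Saturation matters only at the extremes: for example, two products of 255 and 127 sum to 64770, which is clamped to 32767.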
5.8.4 Packed Multiply High with Round and Scale
PMULHRSW Multiplies vertically each signed 16-bit integer from the
destination operand with the corresponding signed 16-bit integer
of the source operand, producing intermediate, signed 32-bit
integers. Each intermediate 32-bit integer is truncated to its 18
most significant bits. Rounding is always performed by adding 1 to
the least significant bit of the 18-bit intermediate result. The
final result is obtained by selecting the 16 bits immediately to
the right of the most significant bit of each 18-bit intermediate
result; the 16-bit results are packed and written to the
destination operand.
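
The truncate, round, and select steps can be followed bit by bit in a scalar C sketch. The function names are hypothetical, and the element count n is a parameter of the sketch rather than of the instruction. The sketch works on the raw 32-bit pattern of the product so that the 18-bit intermediate value is explicit; converting the final 16-bit pattern back to int16_t assumes two's-complement behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of one PMULHRSW element (not an Intel API). */
int16_t pmulhrsw_elem(int16_t a, int16_t b)
{
    /* Signed 32-bit product, viewed as a raw bit pattern. */
    uint32_t product = (uint32_t)((int32_t)a * (int32_t)b);

    /* Keep the 18 most significant bits (bits 31..14 of the product). */
    uint32_t top18 = product >> 14;

    /* Round by adding 1 to the least significant bit of the 18-bit value. */
    uint32_t rounded = top18 + 1u;

    /* Select bits 16..1: the 16 bits to the right of the 18-bit value's MSB. */
    return (int16_t)(uint16_t)((rounded >> 1) & 0xFFFFu);
}

/* Apply the element model to n packed words; results overwrite dst. */
void pmulhrsw_model(int16_t *dst, const int16_t *src, size_t n)
{
    size_t i;
    assert(dst != NULL && src != NULL);
    for (i = 0; i < n; i++) {
        dst[i] = pmulhrsw_elem(dst[i], src[i]);
    }
}
```

If the operands are read as Q15 fixed-point fractions, this corresponds to a rounded fractional multiply: 4000H times 4000H (0.5 times 0.5) gives 2000H (0.25).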
5.8.5 Packed Shuffle Bytes
PSHUFB Permutes each byte in place, according to a shuffle control
mask. The least significant three bits (64-bit operands) or four
bits (128-bit operands) of each shuffle control byte of the
control mask form the shuffle index. The shuffle mask is
unaffected. If the most significant bit (bit 7) of a shuffle
control byte is set, the constant zero is written in the result
byte.
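
The byte selection rule can be sketched in scalar C for the 128-bit form. The name pshufb_model is hypothetical. The sketch assumes that the data being permuted is the destination operand and the control mask is the source operand, and it uses the low four bits of each control byte as the index, as applies to 128-bit operands.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar model of the 128-bit form of PSHUFB (not an Intel API).
   data is permuted in place; ctrl is the shuffle control mask and is not
   modified. */
void pshufb_model(uint8_t data[16], const uint8_t ctrl[16])
{
    uint8_t result[16];
    size_t i;

    assert(data != NULL && ctrl != NULL);

    for (i = 0; i < 16; i++) {
        if (ctrl[i] & 0x80u) {
            /* Bit 7 set: write the constant zero. */
            result[i] = 0;
        } else {
            /* Low four bits select the source byte; bits 6..4 are ignored. */
            result[i] = data[ctrl[i] & 0x0Fu];
        }
    }

    /* Copy through a temporary so every lookup reads the original data. */
    for (i = 0; i < 16; i++) {
        data[i] = result[i];
    }
}
```

A control mask whose byte i holds 15 - i reverses the byte order of the operand, which is a common use of this kind of shuffle.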