Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1, Basic Architecture

PROGRAMMING WITH SSE3 AND SUPPLEMENTAL SSE3
12.1.2 Compatibility of SSE3/SSSE3 with MMX Technology, the x87
FPU Environment, and SSE/SSE2 Extensions
SSE3/SSSE3 do not introduce any new state to the Intel 64 and IA-32 execution
environments.
For SIMD and x87 programming, the FXSAVE and FXRSTOR instructions save and
restore the architectural states of XMM, MXCSR, x87 FPU, and MMX registers. The
MONITOR and MWAIT instructions use general-purpose registers as inputs; they do
not modify the contents of those registers.
12.1.3 Horizontal and Asymmetric Processing
Many SSE/SSE2/SSE3/SSSE3 instructions accelerate SIMD data processing using a
model referred to as vertical computation. In this model, data flows vertically
between the data elements of the inputs and the output: each output element is
computed from the input elements in the same position.
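The vertical model can be sketched in scalar C. This is only an illustrative model of the data flow, assuming a two-element packed double-precision add such as ADDPD; the function name vertical_add_pd and the array layout (index 0 is the low element, index 1 the high element) are hypothetical choices made for this sketch, not part of any Intel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of a vertical packed add on two double-precision elements.
 * Hypothetical helper for illustration only: index 0 models the low
 * element (bits 63:0), index 1 the high element (bits 127:64). */
void vertical_add_pd(double dst[2], const double x[2], const double y[2])
{
    assert(dst != NULL && x != NULL && y != NULL);

    /* Each output element depends only on the inputs at the same index. */
    for (size_t i = 0; i < 2; i++) {
        dst[i] = x[i] + y[i];
    }
}
```

With x = {1.0, 2.0} and y = {10.0, 20.0}, the sketch yields {11.0, 22.0}: no element crosses from one position to another.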
Figure 12-1 illustrates the asymmetric processing of the SSE3 instruction
ADDSUBPD. Figure 12-2 illustrates the horizontal data movement of the SSE3
instruction HADDPD.
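Figure 12-1. Asymmetric Processing in ADDSUBPD
[Figure: inputs (X1, X0) and (Y1, Y0); result (X1 + Y1, X0 - Y0). The high
elements are combined with ADD and the low elements with SUB.]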
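The asymmetric and horizontal data flows named above can be contrasted with a scalar C sketch. It assumes the documented behavior of ADDSUBPD (subtract in the low element, add in the high element) and of HADDPD (low result is the sum of both elements of the first operand, high result is the sum of both elements of the second operand). The function names addsubpd_model and haddpd_model are hypothetical, and the code models only the arithmetic, not the instruction encodings or exception behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical scalar models for illustration. Index 0 models the low
 * element (bits 63:0), index 1 the high element (bits 127:64).
 * x plays the role of the destination operand, y the source operand. */

/* Asymmetric processing: different operations per element position. */
void addsubpd_model(double dst[2], const double x[2], const double y[2])
{
    assert(dst != NULL && x != NULL && y != NULL);
    const double lo = x[0] - y[0];  /* low element: subtract */
    const double hi = x[1] + y[1];  /* high element: add */
    dst[0] = lo;
    dst[1] = hi;
}

/* Horizontal processing: each result sums adjacent elements of one input. */
void haddpd_model(double dst[2], const double x[2], const double y[2])
{
    assert(dst != NULL && x != NULL && y != NULL);
    const double lo = x[0] + x[1];  /* both elements of the first operand */
    const double hi = y[0] + y[1];  /* both elements of the second operand */
    dst[0] = lo;
    dst[1] = hi;
}
```

Both results are computed into temporaries before being stored, so dst may alias x, mirroring the way the real instructions overwrite their destination operand.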