Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1, Basic Architecture
PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)
Latency penalties can also be incurred by using move instructions of the wrong type.
For example, MOVAPS and MOVAPD can both be used to move a packed single-precision
operand from memory to an XMM register. However, if MOVAPD is used, a
latency penalty will be incurred when a correctly typed instruction attempts to use
the data in the register.
Note that these latency penalties are not incurred when moving data from XMM
registers to memory.
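The type-matching rule above can be sketched in C with the SSE/SSE2 intrinsics. This is a minimal illustration, not a definitive implementation: the function names and the is_aligned16 helper are hypothetical, and the instruction named in each comment is only what compilers usually emit for that intrinsic, which is not guaranteed.

```c
#include <assert.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE set */

/* Hypothetical helper: nonzero if p is on a 16-byte boundary,
   which the aligned move intrinsics used below require. */
static int is_aligned16(const void *p) { return ((uintptr_t)p & 15u) == 0; }

/* Packed single-precision: load, add, and store all use the "ps" forms. */
void double_packed_single(const float *src, float *dst)
{
    assert(is_aligned16(src) && is_aligned16(dst));
    __m128 v = _mm_load_ps(src);           /* usually MOVAPS */
    _mm_store_ps(dst, _mm_add_ps(v, v));   /* usually ADDPS, then MOVAPS */
}

/* Packed double-precision: the same pattern with the "pd" forms. */
void double_packed_double(const double *src, double *dst)
{
    assert(is_aligned16(src) && is_aligned16(dst));
    __m128d v = _mm_load_pd(src);          /* usually MOVAPD */
    _mm_store_pd(dst, _mm_add_pd(v, v));   /* usually ADDPD, then MOVAPD */
}

/* Packed 32-bit integers: the same pattern with the integer forms. */
void double_packed_int32(const int32_t *src, int32_t *dst)
{
    assert(is_aligned16(src) && is_aligned16(dst));
    __m128i v = _mm_load_si128((const __m128i *)src);       /* usually MOVDQA */
    _mm_store_si128((__m128i *)dst, _mm_add_epi32(v, v));   /* usually PADDD, then MOVDQA */
}
```

Each function keeps its load and its arithmetic in the same data-type family, which is the pairing the paragraph above recommends.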
11.6.10 Interfacing with SSE/SSE2 Procedures and Functions
SSE and SSE2 extensions allow direct access to XMM registers. This means that all
existing interface conventions between procedures and functions that apply to the
use of the general-purpose registers (EAX, EBX, etc.) also apply to XMM register
usage.
11.6.10.1 Passing Parameters in XMM Registers
Procedure calls and returns do not modify the XMM registers, so their state is
preserved across procedure (or function) boundaries. Parameters can therefore be
passed from one procedure to another using XMM registers.
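As a rough illustration of passing parameters in XMM registers, the C sketch below passes two packed single-precision values by value. The function names are hypothetical. Whether the arguments actually travel in XMM registers depends on the calling convention in use: the System V x86-64 convention, for example, passes __m128 arguments in XMM0, XMM1, and so on, while other conventions may pass them through memory.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical callee: receives two packed single-precision operands by
   value and returns their lane-wise sum. */
__m128 add_packed(__m128 a, __m128 b)
{
    return _mm_add_ps(a, b);
}

/* Hypothetical caller: broadcasts x and y to all four lanes, calls
   add_packed, and returns lane 0 of the result. */
float sum_via_xmm(float x, float y)
{
    __m128 a = _mm_set1_ps(x);
    __m128 b = _mm_set1_ps(y);
    return _mm_cvtss_f32(add_packed(a, b));
}
```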
11.6.10.2 Saving XMM Register State on a Procedure or Function Call
The state of XMM registers can be saved in two ways: using an FXSAVE instruction or
a move instruction. FXSAVE saves the state of all XMM registers (along with the state
of MXCSR and the x87 FPU registers). This instruction is typically used for major
changes in the context of the execution environment, such as a task switch. FXRSTOR
restores the XMM, MXCSR, and x87 FPU registers stored with FXSAVE.
When only the XMM registers, or only selected XMM registers, need to be saved,
move instructions (MOVAPS, MOVUPS, MOVSS, MOVAPD,
MOVUPD, MOVSD, MOVDQA, and MOVDQU) can be used. These instructions can also
be used to restore the contents of XMM registers. To avoid performance degradation
when saving XMM registers to memory or when loading XMM registers from memory,
be sure to use the appropriately typed move instructions.
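Saving and restoring selected registers with typed moves might look like the C sketch below. The xmm_save_area layout and both function names are hypothetical, and the instruction noted in each comment is only the usual compiler output for that intrinsic. The aligned forms are used because every slot is declared 16-byte aligned; the unaligned intrinsics (for example _mm_storeu_ps, usually MOVUPS) would apply if alignment were unknown.

```c
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE set */

/* Hypothetical save area: one 16-byte-aligned slot per data type. */
typedef struct {
    _Alignas(16) float   ps[4];  /* packed single-precision */
    _Alignas(16) double  pd[2];  /* packed double-precision */
    _Alignas(16) int32_t dq[4];  /* packed 32-bit integers  */
} xmm_save_area;

/* Save each value with the store that matches its data type. */
void save_selected(xmm_save_area *s, __m128 f, __m128d d, __m128i i)
{
    _mm_store_ps(s->ps, f);                  /* usually MOVAPS */
    _mm_store_pd(s->pd, d);                  /* usually MOVAPD */
    _mm_store_si128((__m128i *)s->dq, i);    /* usually MOVDQA */
}

/* Restore each value with the load that matches its data type. */
void restore_selected(const xmm_save_area *s, __m128 *f, __m128d *d, __m128i *i)
{
    *f = _mm_load_ps(s->ps);                        /* usually MOVAPS */
    *d = _mm_load_pd(s->pd);                        /* usually MOVAPD */
    *i = _mm_load_si128((const __m128i *)s->dq);    /* usually MOVDQA */
}
```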
The move instructions can also be used to save the contents of XMM registers on the
stack. Here, the stack pointer (in the ESP register) supplies the memory address for
the move. Note that the stack pointer is not automatically adjusted when using a
move instruction (as it is with PUSH, which decrements it).
A procedure that uses a move instruction to save the contents of an XMM register to
the stack is responsible for decrementing the value in the ESP register by 16.
Likewise, a procedure that uses a move instruction to load an XMM register from the
stack must also increment the ESP register by 16. To avoid performance degradation
when moving the contents of XMM registers, use the appropriately typed move
instructions.
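The stack discipline described above can be modeled in C. This sketch does not touch the real ESP register: model_stack is a hypothetical software stack whose sp field plays the role of ESP, so the explicit 16-byte decrement before a store and increment after a load are visible as ordinary pointer arithmetic. All names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical model of a downward-growing stack; sp stands in for ESP. */
typedef struct {
    _Alignas(16) uint8_t mem[256];
    uint8_t *sp;
} model_stack;

/* Start with an empty stack: sp points just past the end of mem. */
void model_stack_init(model_stack *st)
{
    st->sp = st->mem + sizeof st->mem;
}

/* Save: the code itself decrements sp by 16, then stores. */
void push_xmm(model_stack *st, __m128 v)
{
    assert(st->sp - st->mem >= 16);       /* room for one more register */
    st->sp -= 16;                         /* models SUB ESP, 16 */
    _mm_store_ps((float *)st->sp, v);     /* models MOVAPS [ESP], XMMn */
}

/* Restore: load first, then the code itself increments sp by 16. */
__m128 pop_xmm(model_stack *st)
{
    assert(st->sp + 16 <= st->mem + sizeof st->mem);  /* stack not empty */
    __m128 v = _mm_load_ps((const float *)st->sp);    /* models MOVAPS XMMn, [ESP] */
    st->sp += 16;                                     /* models ADD ESP, 16 */
    return v;
}
```

Because sp starts on a 16-byte boundary and only ever moves in steps of 16, the aligned load and store forms remain valid throughout.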