Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1, Basic Architecture
Vol. 1 11-35
PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)
Use the STMXCSR and LDMXCSR instructions to save and restore, respectively, the
contents of the MXCSR register on a procedure call and return.
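The save/restore step can be sketched in C with the SSE intrinsics _mm_getcsr and _mm_setcsr, which compilers typically map to STMXCSR and LDMXCSR. This is a minimal sketch, not a prescribed sequence: the function names callee_changes_rounding and call_preserving_mxcsr are hypothetical and exist only for illustration.

```c
#include <xmmintrin.h>  /* _mm_getcsr, _mm_setcsr, _MM_ROUND_* (SSE) */

/* Hypothetical callee that alters the rounding-control field of MXCSR. */
static void callee_changes_rounding(void)
{
    unsigned int csr = _mm_getcsr();
    csr = (csr & ~(unsigned int)_MM_ROUND_MASK) | _MM_ROUND_DOWN;
    _mm_setcsr(csr);
}

/* Hypothetical caller that preserves its own MXCSR across the call.
   Returns 1 if MXCSR reads back unchanged after the restore. */
static int call_preserving_mxcsr(void)
{
    unsigned int saved = _mm_getcsr();  /* save (STMXCSR) */
    callee_changes_rounding();
    _mm_setcsr(saved);                  /* restore (LDMXCSR) */
    return _mm_getcsr() == saved;
}
```

Without the restore, the caller would continue with the callee's rounding mode, which is the situation the save/restore pair is meant to avoid.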
11.6.10.3 Caller-Save Requirement for Procedure and Function Calls
When making procedure (or function) calls from SSE or SSE2 code, a caller-save
convention is recommended for saving the state of the calling procedure. Using this
convention, any register whose content must survive intact across a procedure call
must be stored in memory by the calling procedure prior to executing the call.
The primary reason for using the caller-save convention is to prevent performance
degradation. XMM registers can contain packed or scalar double-precision
floating-point, packed single-precision floating-point, and 128-bit packed integer
data types. The called procedure has no way of knowing which data types the XMM
registers hold when it is called, so it is unlikely to use the correctly typed move
instruction to store the contents of XMM registers in memory or to restore the
contents of XMM registers from memory.
As described in Section 11.6.9, “Mixing Packed and Scalar Floating-Point and 128-Bit
SIMD Integer Instructions and Data,” a move instruction that does not match the
type of the data being moved to or from an XMM register executes correctly, but
can incur greater instruction latency.
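The correctness half of this statement can be illustrated with a small C sketch using SSE2 intrinsics. It spills a packed double-precision value with an integer-typed store and reload (the kind of mismatch a callee might produce) and checks that the bits survive. The function name mismatched_spill_preserves_bits is hypothetical, and the exact instructions emitted depend on the compiler; the latency cost is not something this sketch measures.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <string.h>     /* memcmp */

/* Spill a packed-double value using integer-typed moves, reload it,
   and report whether the data is bit-for-bit unchanged (1 = unchanged). */
static int mismatched_spill_preserves_bits(void)
{
    double in[2] = { 1.5, -2.25 };
    double out[2];
    unsigned char spill[16];  /* spill slot; unaligned moves are used below */

    __m128d v = _mm_loadu_pd(in);  /* correctly typed load */

    /* Integer-typed store and reload of floating-point data. */
    _mm_storeu_si128((__m128i *)spill, _mm_castpd_si128(v));
    __m128d r = _mm_castsi128_pd(_mm_loadu_si128((const __m128i *)spill));

    _mm_storeu_pd(out, r);         /* correctly typed store */
    return memcmp(in, out, sizeof in) == 0;
}
```

A caller that knows its data is double-precision would instead spill with _mm_store_pd or _mm_storeu_pd, which is the typed choice the caller-save convention makes possible.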
11.6.11 Updating Existing MMX Technology Routines Using 128-Bit SIMD Integer Instructions
SSE2 extensions extend all 64-bit MMX SIMD integer instructions to operate on
128-bit SIMD integers using XMM registers. The extended 128-bit SIMD integer
instructions operate like the 64-bit SIMD integer instructions; this simplifies the
porting of MMX technology applications. However, the following points must be
considered:
• To take advantage of wider 128-bit SIMD integer instructions, MMX technology
code must be recompiled to reference the XMM registers instead of MMX
registers.
• Computation instructions that reference memory operands that are not aligned
on 16-byte boundaries should be replaced with an unaligned 128-bit load
(MOVDQU instruction) followed by a version of the same computation operation
that uses register instead of memory operands. Use of 128-bit packed integer
computation instructions with memory operands that are not 16-byte aligned
results in a general protection exception (#GP).
• Extension of the PSHUFW instruction (shuffle word across 64-bit integer
operand) across a full 128-bit operand is emulated by a combination of the
following instructions: PSHUFHW, PSHUFLW, and PSHUFD.
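The last two points can be sketched in C with SSE2 intrinsics. The first function performs an unaligned 128-bit load and then a register-to-register add; the second builds one full-width word shuffle (reversing all eight words) from PSHUFLW, PSHUFHW, and PSHUFD. Both function names are hypothetical, the word reversal is only one example of a 128-bit shuffle, and the instructions actually emitted depend on the compiler.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides _MM_SHUFFLE */
#include <stdint.h>

/* Add eight 16-bit words from a possibly unaligned address to acc.
   The unaligned load (MOVDQU) comes first, so the add (PADDW) works on
   registers and never sees an unaligned memory operand. */
static __m128i add_words_unaligned(__m128i acc, const uint16_t *src)
{
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    return _mm_add_epi16(acc, v);
}

/* Reverse the order of all eight words in a 128-bit operand. */
static __m128i reverse_words(__m128i v)
{
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(0, 1, 2, 3)); /* PSHUFLW: reverse words 0..3 */
    v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(0, 1, 2, 3)); /* PSHUFHW: reverse words 4..7 */
    return _mm_shuffle_epi32(v, _MM_SHUFFLE(1, 0, 3, 2)); /* PSHUFD: swap 64-bit halves */
}
```

PSHUFLW and PSHUFHW each permute words within one 64-bit half only, so a shuffle that moves words between halves needs the additional PSHUFD step.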