Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, Basic Architecture

PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)
Modifications to existing IA-32 instructions to support SSE2 features:
  • Extensions and modifications to the CPUID instruction
  • Modifications to the RDPMC instruction
These new features extend the IA-32 architecture’s SIMD programming model in
three important ways:
• They provide the ability to perform SIMD operations on pairs of packed
  double-precision floating-point values. This permits higher precision
  computations to be carried out in XMM registers, which enhances processor
  performance in scientific and engineering applications and in applications
  that use advanced 3-D geometry techniques (such as ray tracing). Additional
  flexibility is provided with instructions that operate on single (scalar)
  double-precision floating-point values located in the low quadword of an
  XMM register.
• They provide the ability to operate on 128-bit packed integers (bytes, words,
  doublewords, and quadwords) in XMM registers. This provides greater
  flexibility and greater throughput when performing SIMD operations on packed
  integers. The capability is particularly useful for applications such as RSA
  authentication and RC5 encryption. Using the full set of SIMD registers, data
  types, and instructions provided with the MMX technology and SSE/SSE2
  extensions, programmers can develop algorithms that finely mix packed single-
  and double-precision floating-point data and 64- and 128-bit packed integer
  data.
• SSE2 extensions enhance the support introduced with SSE extensions for
  controlling the cacheability of SIMD data. SSE2 cache control instructions
  provide the ability to stream data in and out of the XMM registers without
  polluting the caches and the ability to prefetch data before it is actually
  used.
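The three capabilities listed above can be illustrated with a minimal C sketch
that uses the compiler intrinsics declared in <emmintrin.h>. This is an
illustration only, not text from the manual: the helper names (add_f64,
add_scalar_f64, add_i32, stream_copy_f64) are hypothetical, and the sketch
assumes a compiler that provides the SSE2 intrinsics and a target with SSE2
enabled (the default for x86-64 builds). The comments name the instruction each
intrinsic is expected to map to.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics; also pulls in the SSE header */

/* Packed double-precision: each iteration adds two pairs of doubles
   (ADDPD). Hypothetical helper; n must be a multiple of 2. */
void add_f64(const double *a, const double *b, double *out, size_t n)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);   /* unaligned 128-bit load */
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));
    }
}

/* Scalar double-precision: only the low quadword of each XMM register
   takes part in the addition (ADDSD). */
double add_scalar_f64(double x, double y)
{
    __m128d r = _mm_add_sd(_mm_set_sd(x), _mm_set_sd(y));
    return _mm_cvtsd_f64(r);                /* extract the low double */
}

/* 128-bit packed integers: each iteration adds four 32-bit integers
   (PADDD). Hypothetical helper; n must be a multiple of 4. */
void add_i32(const int32_t *a, const int32_t *b, int32_t *out, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(out + i), _mm_add_epi32(va, vb));
    }
}

/* Cache control: copies doubles with non-temporal stores (MOVNTPD) so the
   destination data is not pulled into the caches. dst must be 16-byte
   aligned and n must be a multiple of 2. */
void stream_copy_f64(const double *src, double *dst, size_t n)
{
    assert(n % 2 == 0);
    assert(((uintptr_t)dst & 15u) == 0);
    for (size_t i = 0; i < n; i += 2) {
        _mm_stream_pd(dst + i, _mm_loadu_pd(src + i));
    }
    _mm_sfence();  /* make the streaming stores visible before later stores */
}
```

A caller would pass ordinary arrays to these helpers, for example
add_f64(a, b, out, 4) for two four-element arrays of doubles. The unaligned
load and store forms are used so that the sketch does not depend on buffer
alignment, except for the streaming store, which requires it.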
SSE2 extensions are fully compatible with all software written for IA-32 processors.
All existing software continues to run correctly, without modification, on processors
that incorporate SSE2 extensions, as well as in the presence of applications that
incorporate these extensions. Enhancements to the CPUID instruction permit
detection of the SSE2 extensions. Also, because the SSE2 extensions use the
same registers as the SSE extensions, no new operating-system support is
required for saving and restoring program state during a context switch beyond
that provided for the SSE extensions.
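The CPUID-based detection mentioned above can be sketched in C as follows. This
is a minimal illustration, assuming a GCC- or Clang-compatible toolchain that
ships the <cpuid.h> helper header and its __get_cpuid function; the helper
names cpuid_leaf1_edx_bit and has_sse2 are hypothetical. The sketch tests only
the processor feature flag (CPUID leaf 01H, EDX bit 26 for SSE2) and does not
check operating-system support.

```c
#include <assert.h>
#include <cpuid.h>  /* GCC/Clang CPUID helpers (toolchain assumption) */

#define CPUID_EDX_SSE_BIT   25u  /* CPUID.01H:EDX bit 25 reports SSE  */
#define CPUID_EDX_SSE2_BIT  26u  /* CPUID.01H:EDX bit 26 reports SSE2 */

/* Returns 1 if the given bit of EDX from CPUID leaf 01H is set, else 0.
   Hypothetical helper; also returns 0 when leaf 01H is unavailable. */
int cpuid_leaf1_edx_bit(unsigned int bit)
{
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

    assert(bit < 32u);  /* EDX is a 32-bit register */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return 0;       /* CPUID leaf 01H not supported */
    }
    return (int)((edx >> bit) & 1u);
}

/* Returns 1 if the processor reports the SSE2 extensions, else 0. */
int has_sse2(void)
{
    return cpuid_leaf1_edx_bit(CPUID_EDX_SSE2_BIT);
}
```

An application would typically call has_sse2() once at startup and select an
SSE2 or a fallback code path based on the result.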
SSE2 extensions are accessible from all IA-32 execution modes: protected mode,
real-address mode, and virtual-8086 mode.
The following sections in this chapter describe the programming environment for
SSE2 extensions, including the 128-bit XMM floating-point register set, data
types, and SSE2 instructions. They also describe exceptions that can be
generated with the SSE and SSE2 instructions and give guidelines for writing
applications with SSE and SSE2 extensions.
For additional information about SSE2 extensions, see:
Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volumes
2A & 2B, which provide a detailed description of individual SSE2 instructions.