Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1, Basic Architecture
PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)
Temporal Data,” and Section 10.4.6.1, “Cacheability Control Instructions”). They
prevent non-temporal data from being written into processor caches on a store
operation. These instructions are implementation specific. Programmers may have to
tune their applications for each IA-32 processor implementation to take advantage of
these instructions.
Besides reducing cache pollution, the use of weakly-ordered memory types can be
important under certain data sharing relationships, such as a producer-consumer
relationship. The use of weakly ordered memory can make assembling data
more efficient, but care must be taken to ensure that the consumer obtains the data
that the producer intended. Some common usage models that may be affected in this
way by weakly-ordered stores are:
• Library functions that use weakly ordered memory to write results
• Compiler-generated code that writes weakly-ordered results
• Hand-crafted code
The degree to which a consumer of data knows that the data is weakly ordered can
vary for these cases. As a result, the SFENCE or MFENCE instruction should be used
to ensure ordering between routines that produce weakly-ordered data and routines
that consume the data. SFENCE and MFENCE provide a performance-efficient way to
ensure ordering by guaranteeing that every store instruction that precedes
SFENCE/MFENCE in program order is globally visible before any store instruction that
follows the fence.
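The producer-consumer ordering described above can be sketched in C with
compiler intrinsics. This is a minimal illustration, not text from the manual.
The names shared_buf, data_ready, produce, and consume_sum are hypothetical, and
the sketch assumes an x86 compiler that provides <emmintrin.h> and C11 atomics.
The producer writes its results with non-temporal (weakly-ordered) stores,
issues SFENCE, and only then publishes a flag that the consumer checks.

```c
#include <xmmintrin.h>   /* _mm_sfence (SFENCE) */
#include <emmintrin.h>   /* _mm_stream_si32 (MOVNTI, a non-temporal store) */
#include <stdatomic.h>

#define N_ITEMS 64

/* Hypothetical shared state between a producer and a consumer routine. */
static int shared_buf[N_ITEMS];
static atomic_int data_ready;    /* static storage: starts at 0 */

/* Producer: fill the buffer with weakly-ordered stores, then fence. */
void produce(int base)
{
    for (int i = 0; i < N_ITEMS; i++)
        _mm_stream_si32(&shared_buf[i], base + i);

    /* Make every preceding store globally visible before the flag store. */
    _mm_sfence();

    atomic_store_explicit(&data_ready, 1, memory_order_release);
}

/* Consumer: returns the sum of the buffer, or -1 if nothing was published. */
int consume_sum(void)
{
    if (!atomic_load_explicit(&data_ready, memory_order_acquire))
        return -1;

    int sum = 0;
    for (int i = 0; i < N_ITEMS; i++)
        sum += shared_buf[i];
    return sum;
}
```

Without the fence, the flag store could become visible before some of the
non-temporal stores, and the consumer could read stale buffer contents. The
sketch places the fence in the producer so that consumers need no knowledge of
how the data was written.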
11.6.14 Effect of Instruction Prefixes on the SSE/SSE2 Instructions
Table 11-3 describes the effects of instruction prefixes on SSE and SSE2
instructions. (Table 11-3 also applies to SIMD integer and SIMD floating-point instructions
in SSE3.) Unpredictable behavior can range from prefixes being treated as a
reserved operation on one generation of IA-32 processors to generating an invalid
opcode exception on another generation of processors.
See also “Instruction Prefixes” in Chapter 2 of the Intel® 64 and IA-32 Architectures
Software Developer’s Manual, Volume 2A, for a complete description of instruction
prefixes.
NOTE
Some SSE/SSE2/SSE3 instructions have two-byte opcodes that are
either 2 bytes or 3 bytes in length. Two-byte opcodes that are 3 bytes
in length consist of: a mandatory prefix (F2H, F3H, or 66H), 0FH, and
an opcode byte. See Table 11-3.
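The byte layout in the note above can be illustrated with a short C sketch.
The helper sse_opcode_length and the example arrays are hypothetical names
introduced here, not part of the manual. The helper only inspects the leading
byte pattern and assumes no other prefixes (such as segment overrides or REX)
are present. The example encodings are ADDPS, ADDPD, and ADDSD with operands
xmm0, xmm1, each followed by its ModR/M byte (C1H).

```c
#include <stddef.h>

/* Example encodings (opcode bytes followed by the ModR/M byte C1H). */
const unsigned char example_addps[] = { 0x0F, 0x58, 0xC1 };        /* no mandatory prefix */
const unsigned char example_addpd[] = { 0x66, 0x0F, 0x58, 0xC1 };  /* mandatory prefix 66H */
const unsigned char example_addsd[] = { 0xF2, 0x0F, 0x58, 0xC1 };  /* mandatory prefix F2H */

/*
 * Hypothetical helper: returns the opcode length in bytes for the two forms
 * described in the note, or 0 if the bytes match neither form.
 *   3: mandatory prefix (F2H, F3H, or 66H), 0FH, opcode byte
 *   2: 0FH, opcode byte
 */
int sse_opcode_length(const unsigned char *bytes, size_t len)
{
    if (len >= 3 &&
        (bytes[0] == 0xF2 || bytes[0] == 0xF3 || bytes[0] == 0x66) &&
        bytes[1] == 0x0F)
        return 3;

    if (len >= 2 && bytes[0] == 0x0F)
        return 2;

    return 0;
}
```

A real decoder must also consult the opcode map, because the same leading
bytes (for example 66H) act as ordinary prefixes on other instructions.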