Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture
PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)
11.4.4.1 FLUSH Cache Line
The CLFLUSH (flush cache line) instruction writes back (if modified) and invalidates
the cache line associated with a specified linear address. The invalidation applies to
all levels of the processor’s cache hierarchy and is broadcast throughout the cache
coherency domain.
NOTE
CLFLUSH was introduced with the SSE2 extensions. However, the
instruction can be implemented in IA-32 processors that do not
implement the SSE2 extensions. Software should therefore detect
CLFLUSH with its own feature bit (CPUID.01H:EDX.CLFSH[bit 19] = 1)
rather than by checking for SSE2.
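The detection and flush steps described above can be sketched in C as follows. This is a minimal illustration, not a definitive implementation: it assumes a GCC- or Clang-compatible compiler (for `<cpuid.h>` and `__get_cpuid`), and the helper names `clflush_line_size` and `flush_range` are hypothetical. The sketch also reads the CLFLUSH line size from CPUID.01H:EBX[15:8], a field not discussed in this section, and ends with an MFENCE as a conservative ordering choice.

```c
#include <assert.h>
#include <cpuid.h>      /* GCC/Clang-specific wrapper around the CPUID instruction */
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_clflush, _mm_mfence */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the CLFLUSH line size in bytes, or 0 if
 * CLFLUSH is not reported. CPUID.01H:EDX bit 19 is the CLFSH feature flag;
 * CPUID.01H:EBX[15:8] holds the line size in units of 8 bytes. */
static unsigned int clflush_line_size(void)
{
    unsigned int eax, ebx, ecx, edx;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                       /* CPUID leaf 1 not available */
    if (((edx >> 19) & 1u) == 0)
        return 0;                       /* CLFSH feature bit is clear */
    return ((ebx >> 8) & 0xFFu) * 8u;
}

/* Hypothetical helper: flush every cache line that overlaps [p, p + len).
 * Returns 0 on success, -1 if CLFLUSH is not available. */
static int flush_range(const void *p, size_t len)
{
    unsigned int line = clflush_line_size();

    assert(p != NULL);
    if (line == 0)
        return -1;

    uintptr_t start = (uintptr_t)p;
    uintptr_t end = start + len;

    /* Round the start address down to the beginning of its cache line. */
    for (uintptr_t addr = start - (start % line); addr < end; addr += line)
        _mm_clflush((const void *)addr);

    /* Assumption: a full fence is used so that the flushes are ordered
     * before any later loads and stores. */
    _mm_mfence();
    return 0;
}
```

A caller would typically check the return value and fall back to another strategy when CLFLUSH is not reported. Flushing does not change the contents of memory, so the data reads back unchanged afterwards.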
11.4.4.2 Cacheability Control Instructions
The following four instructions enable data from XMM and general-purpose registers
to be stored to memory using a non-temporal hint. The non-temporal hint directs the
processor to store data to memory without writing the data into the cache hierarchy
whenever possible. See Section 10.4.6.2, “Caching of Temporal vs.
Non-Temporal Data,” for more information about non-temporal stores and hints.
The MOVNTDQ (store double quadword using non-temporal hint) instruction stores
packed integer data from an XMM register to memory, using a non-temporal hint.
The MOVNTPD (store packed double-precision floating-point values using
non-temporal hint) instruction stores packed double-precision floating-point data
from an XMM register to memory, using a non-temporal hint.
The MOVNTI (store doubleword using non-temporal hint) instruction stores integer
data from a general-purpose register to memory, using a non-temporal hint.
The MASKMOVDQU (store selected bytes of double quadword) instruction stores
selected byte integers from an XMM register to memory, using a byte mask to
selectively write the individual bytes. The memory location does not need to be
aligned on a natural boundary. This instruction also uses a non-temporal hint.
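As an illustration of the four instructions above, the following C sketch uses the SSE2 compiler intrinsics that commonly map to them: `_mm_stream_si128` (MOVNTDQ), `_mm_stream_pd` (MOVNTPD), `_mm_stream_si32` (MOVNTI), and `_mm_maskmoveu_si128` (MASKMOVDQU). The wrapper function names are hypothetical. The trailing SFENCE in each helper is an assumption about how a caller might want the weakly ordered stores made visible before later stores; it is not required by the instructions themselves.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>

/* MOVNTDQ: fill n_blocks 16-byte blocks with a repeated 32-bit pattern.
 * The destination must be 16-byte aligned. */
static void fill_nt_u32(void *dst, uint32_t pattern, size_t n_blocks)
{
    assert(((uintptr_t)dst & 15u) == 0);
    __m128i v = _mm_set1_epi32((int)pattern);
    __m128i *out = (__m128i *)dst;

    for (size_t i = 0; i < n_blocks; i++)
        _mm_stream_si128(out + i, v);
    _mm_sfence();
}

/* MOVNTPD: fill n_pairs pairs of doubles. The destination must be
 * 16-byte aligned. */
static void fill_nt_f64(double *dst, double value, size_t n_pairs)
{
    assert(((uintptr_t)dst & 15u) == 0);
    __m128d v = _mm_set1_pd(value);

    for (size_t i = 0; i < n_pairs; i++)
        _mm_stream_pd(dst + 2 * i, v);
    _mm_sfence();
}

/* MOVNTI: store one doubleword from a general-purpose register. */
static void store_nt_i32(int *dst, int value)
{
    assert(dst != NULL);
    _mm_stream_si32(dst, value);
    _mm_sfence();
}

/* MASKMOVDQU: write only the bytes of data whose corresponding mask byte
 * has its most significant bit set. dst need not be aligned. */
static void store_masked(void *dst, __m128i data, __m128i mask)
{
    assert(dst != NULL);
    _mm_maskmoveu_si128(data, mask, (char *)dst);
    _mm_sfence();
}
```

For example, calling `store_masked` with a mask of `_mm_set_epi32(0, 0, 0, (int)0x80808080u)` writes only the four lowest-addressed bytes and leaves the other twelve bytes of the destination untouched.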
11.4.4.3 Memory Ordering Instructions
SSE2 extensions introduce two new fence instructions (LFENCE and MFENCE) as
companions to the SFENCE instruction introduced with SSE extensions.
The LFENCE instruction establishes a memory fence for loads. It guarantees ordering
between loads that precede the fence and loads that follow it, and it prevents
speculative loads from passing the load fence (that is, no speculative loads are
allowed until all loads specified before the load fence have been carried out).
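A minimal sketch of where a load fence might be placed, using the `_mm_lfence` intrinsic: the consumer reads a ready flag, executes LFENCE, and only then reads the payload. The `mailbox` structure and `try_receive` function are hypothetical names introduced for illustration. Note that for ordinary write-back memory the processor already keeps loads in order with respect to other loads, so the fence matters mainly for weakly ordered accesses; portable C code would normally use C11 atomics instead.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_lfence */
#include <stddef.h>

/* Hypothetical mailbox shared with a producer running elsewhere. */
struct mailbox {
    volatile int ready;    /* set to nonzero by the producer after payload */
    volatile int payload;
};

/* Returns 1 and copies the payload into *out if the mailbox is ready;
 * returns 0 and leaves *out untouched otherwise. */
static int try_receive(const struct mailbox *mb, int *out)
{
    assert(mb != NULL && out != NULL);

    if (!mb->ready)
        return 0;

    /* The load of ready is carried out before the load of payload starts. */
    _mm_lfence();

    *out = mb->payload;
    return 1;
}
```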
The MFENCE instruction combines the functions of LFENCE and SFENCE by
establishing a memory fence for both loads and stores. It guarantees that all loads
and stores specified before the fence are globally observable prior to any loads or
stores being carried out after the fence.
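One situation where a fence for both loads and stores is typically needed is a store followed by a load from a different location, as in a flag handshake between two agents. The sketch below uses the `_mm_mfence` intrinsic; the function name `raise_and_check` is hypothetical, and the example only illustrates fence placement, not a complete synchronization protocol.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_mfence */
#include <stddef.h>

/* Hypothetical handshake step: raise our own flag, then inspect the
 * peer's flag. Returns the peer's flag value. */
static int raise_and_check(volatile int *mine, const volatile int *theirs)
{
    assert(mine != NULL && theirs != NULL);

    *mine = 1;      /* store: announce our intent */

    /* The store above becomes globally observable before the load below
     * is carried out. */
    _mm_mfence();

    return *theirs; /* load: has the peer announced as well? */
}
```

If two agents each run this step with the flag roles swapped, the fence is what rules out the outcome in which both read the other's flag as 0.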