Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 1: Basic Architecture
PROGRAMMING WITH STREAMING SIMD EXTENSIONS (SSE)
10.4.6.1 Cacheability Control Instructions
The following three instructions enable data from the MMX and XMM registers to be
stored to memory using a non-temporal hint. The non-temporal hint directs the
processor to store the data to memory, when possible, without writing it into
the cache hierarchy. See Section 10.4.6.2, “Caching of Temporal vs. Non-Temporal
Data,” for information about non-temporal stores and hints.
The MOVNTQ (store quadword using non-temporal hint) instruction stores packed
integer data from an MMX register to memory, using a non-temporal hint.
The MOVNTPS (store packed single-precision floating-point values using non-
temporal hint) instruction stores packed floating-point data from an XMM register to
memory, using a non-temporal hint.
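As an illustration only, the following C sketch issues MOVNTPS through the _mm_stream_ps intrinsic from <xmmintrin.h>. The function name fill_nontemporal is hypothetical, and the sketch assumes that the destination is 16-byte aligned and that the element count is a multiple of 4. The trailing _mm_sfence (SFENCE) is an added precaution that is not described in this section.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_set1_ps, _mm_stream_ps, _mm_sfence */

/* Hypothetical helper: fill dst[0..count) with value using non-temporal stores.
 * Assumptions: dst is 16-byte aligned and count is a multiple of 4. */
void fill_nontemporal(float *dst, size_t count, float value)
{
    __m128 v = _mm_set1_ps(value);      /* broadcast value to all four lanes */

    for (size_t i = 0; i < count; i += 4) {
        _mm_stream_ps(dst + i, v);      /* store 4 floats with a non-temporal hint (MOVNTPS) */
    }

    _mm_sfence();                       /* order the streaming stores before later stores */
}
```

A buffer that will not be read again soon, such as a large output array, is the kind of destination for which this pattern is intended; data that is reused shortly afterward is usually better served by ordinary stores.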
The MASKMOVQ (store selected bytes of quadword) instruction stores selected byte
integers from an MMX register to memory, using a byte mask to selectively write the
individual bytes. This instruction also uses a non-temporal hint.
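The byte selection can be modeled in portable C as shown below. This is a sketch of the masking rule only, not of the instruction itself: the function name maskmovq_model is hypothetical, the non-temporal hint is not modeled, and the rule that the most significant bit of each mask byte selects the corresponding source byte is taken from the MASKMOVQ instruction reference rather than from this section.

```c
#include <stdint.h>

/* Hypothetical model of the MASKMOVQ byte selection.
 * Byte i of src is written to dst only if the most significant bit
 * of mask byte i is set; all other bytes of dst are left unchanged.
 * The non-temporal hint of the real instruction is not modeled. */
void maskmovq_model(uint8_t dst[8], const uint8_t src[8], const uint8_t mask[8])
{
    for (int i = 0; i < 8; i++) {
        if (mask[i] & 0x80) {
            dst[i] = src[i];
        }
    }
}
```

With this model, a mask byte of 80H or FFH selects the corresponding source byte, while a mask byte of 00H or 7FH leaves the destination byte untouched.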
10.4.6.2 Caching of Temporal vs. Non-Temporal Data
Data referenced by a program can be temporal (data will be used again) or non-
temporal (data will be referenced once and not reused in the immediate future). For
example, program code is generally temporal, whereas multimedia data, such as the
display list in a 3-D graphics application, is often non-temporal. To make efficient use
of the processor’s caches, it is generally desirable to cache temporal data and not
cache non-temporal data. Overloading the processor’s caches with non-temporal
data is sometimes referred to as “polluting the caches.” The SSE and SSE2 cacheability
control instructions enable a program to write non-temporal data to memory
in a manner that minimizes pollution of caches.
These SSE and SSE2 non-temporal store instructions minimize cache pollution by
treating the memory being accessed as the write combining (WC) type. If a program
specifies a non-temporal store with one of these instructions and the destination
region is mapped as cacheable memory (write back [WB], write through [WT] or WC
memory type), the processor will do the following:
• If the memory location being written to is present in the cache hierarchy, the data
in the caches is evicted.
• The non-temporal data is written to memory with WC semantics.
See also: Chapter 10, “Memory Cache Control,” in the Intel® 64 and IA-32
Architectures Software Developer’s Manual, Volume 3A.
With WC semantics, the store transaction is weakly ordered, meaning that the data
may not be written to memory in program order, and the store does not
write-allocate (that is, the processor does not fetch the corresponding cache line
into the cache hierarchy prior to performing the store). Also, different processor
implementations may choose to collapse and combine these stores.
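Because these stores are weakly ordered, software that signals completion to another agent generally needs a store fence between the non-temporal stores and the signal. The following C sketch shows one way this might look. The function name publish_buffer and the ready flag are hypothetical. The sketch assumes that dst is 16-byte aligned and that count is a multiple of 4, and it uses the SFENCE instruction (_mm_sfence), which is not described in this section.

```c
#include <stddef.h>
#include <stdatomic.h>
#include <xmmintrin.h>  /* SSE intrinsics: _mm_loadu_ps, _mm_stream_ps, _mm_sfence */

/* Hypothetical producer: copy src to dst with non-temporal stores, then
 * publish a ready flag.
 * Assumptions: dst is 16-byte aligned and count is a multiple of 4. */
void publish_buffer(float *dst, const float *src, size_t count, atomic_int *ready)
{
    for (size_t i = 0; i < count; i += 4) {
        __m128 v = _mm_loadu_ps(src + i);   /* ordinary (possibly unaligned) load */
        _mm_stream_ps(dst + i, v);          /* weakly ordered non-temporal store */
    }

    /* Without this fence, the flag store below could become visible
     * before some of the streaming stores above. */
    _mm_sfence();

    atomic_store_explicit(ready, 1, memory_order_release);
}
```

A consumer that observes the flag set to 1 (with an acquire load) can then read dst. Omitting the fence would leave the relative visibility of the data and the flag up to the processor implementation.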