Intel 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, System Programming Guide, Part 1

MEMORY CACHE CONTROL
The processor’s caches are for the most part transparent to software. When enabled,
instructions and data flow through these caches without the need for explicit
software control. However, knowledge of the behavior of these caches may be useful in
optimizing software performance. For example, knowledge of cache dimensions and
replacement algorithms gives an indication of how large a data structure can be
operated on at once without causing cache thrashing.
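As a rough illustration of this sizing argument, the following C sketch picks a tile edge for a blocked loop so that a given number of square tiles fit in a cache of a given size. The function name tile_edge and its parameters are hypothetical, and the model is deliberately simplistic: it treats the cache as a flat byte budget and ignores associativity and the replacement algorithm, both of which can lower the usable size in practice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns the largest tile edge n such that
   `tiles` square n-by-n tiles of `elem_size`-byte elements fit
   within `cache_bytes`. The cache size is supplied by the caller;
   nothing here queries the processor. */
static size_t tile_edge(size_t cache_bytes, size_t elem_size, size_t tiles)
{
    assert(elem_size > 0 && tiles > 0);

    /* Number of elements each tile may hold. */
    size_t budget = cache_bytes / (elem_size * tiles);

    /* Integer square root by simple search. */
    size_t n = 0;
    while ((n + 1) * (n + 1) <= budget)
        n++;
    return n;
}
```

For example, assuming a 32-KByte cache, 8-byte elements, and three tiles resident at once, tile_edge(32 * 1024, 8, 3) evaluates to 36.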
In multiprocessor systems, maintenance of cache consistency may, in rare
circumstances, require intervention by system software. For these rare cases, the processor
provides privileged cache control instructions for use in flushing caches and forcing
memory ordering.
The Pentium III, Pentium 4, and Intel Xeon processors introduced several instructions
that software can use to improve the performance of the L1, L2, and L3 caches,
including the PREFETCHh and CLFLUSH instructions and the non-temporal move
instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD). The use of
these instructions is discussed in Section 10.5.5, “Cache Management Instructions.”
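The following C sketch shows one way these instructions might be reached from compiled code, using the compiler intrinsics _mm_prefetch (PREFETCHh), _mm_stream_si32 (MOVNTI), _mm_sfence, and _mm_clflush (CLFLUSH). The helper names, the 64-byte line size, and the one-line prefetch distance are assumptions made for illustration, not recommendations. The code falls back to ordinary loads and stores when the compiler does not report SSE2 support.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics; also pulls in xmmintrin.h */
#define HAVE_CACHE_HINTS 1
#else
#define HAVE_CACHE_HINTS 0
#endif

/* Assumed: 64-byte cache lines, i.e. 16 int32_t elements per line. */
#define ELEMS_PER_LINE 16u

/* Sum an array, issuing one prefetch hint per assumed cache line. */
static int64_t sum_with_prefetch(const int32_t *src, size_t n)
{
    int64_t total = 0;
    for (size_t i = 0; i < n; i++) {
#if HAVE_CACHE_HINTS
        if (i % ELEMS_PER_LINE == 0 && i + ELEMS_PER_LINE < n)
            _mm_prefetch((const char *)&src[i + ELEMS_PER_LINE], _MM_HINT_T0);
#endif
        total += src[i];
    }
    return total;
}

/* Fill an array with non-temporal stores, then fence so the
   weakly ordered stores become visible before returning. */
static void fill_streaming(int32_t *dst, size_t n, int32_t value)
{
    for (size_t i = 0; i < n; i++) {
#if HAVE_CACHE_HINTS
        _mm_stream_si32((int *)&dst[i], value);
#else
        dst[i] = value;
#endif
    }
#if HAVE_CACHE_HINTS
    _mm_sfence();
#endif
}

/* Flush the cache line containing p from the cache hierarchy. */
static void flush_line(const void *p)
{
#if HAVE_CACHE_HINTS
    _mm_clflush(p);
#else
    (void)p;
#endif
}
```

None of these calls change program results; they only influence where data resides in the cache hierarchy, so any benefit has to be confirmed by measurement on the target processor.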
10.2 CACHING TERMINOLOGY
IA-32 processors (beginning with the Pentium processor) and Intel 64 processors use
the MESI (modified, exclusive, shared, invalid) cache protocol to maintain
consistency with internal caches and caches in other processors (see Section 10.4, “Cache
Control Protocol”).
When the processor recognizes that an operand being read from memory is
cacheable, the processor reads an entire cache line into the appropriate cache (L1, L2, L3,
or all). This operation is called a cache line fill. If the memory location containing
that operand is still cached the next time the processor attempts to access the
operand, the processor can read the operand from the cache instead of going back to
memory. This operation is called a cache hit.
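The distinction between a cache line fill and a cache hit can be sketched with a toy model. The C code below simulates a small direct-mapped cache that tracks only which lines are present. The structure, the function names, the 64-byte line size, and the eight-line capacity are all assumptions chosen for illustration; real processor caches are set-associative and considerably more elaborate.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define LINE_BYTES 64u   /* assumed line size, illustration only */
#define NUM_LINES  8u    /* assumed number of lines in the toy cache */

struct toy_cache {
    bool     valid[NUM_LINES];
    uint64_t tag[NUM_LINES];
    unsigned fills;      /* count of cache line fills */
    unsigned hits;       /* count of cache hits */
};

static void toy_init(struct toy_cache *c)
{
    memset(c, 0, sizeof *c);
}

/* Model a read of one operand at `addr`.
   Returns true on a cache hit, false on a miss followed by a line fill. */
static bool toy_read(struct toy_cache *c, uint64_t addr)
{
    uint64_t line = addr / LINE_BYTES;          /* which line in memory */
    unsigned idx  = (unsigned)(line % NUM_LINES);
    uint64_t tag  = line / NUM_LINES;

    if (c->valid[idx] && c->tag[idx] == tag) {
        c->hits++;
        return true;
    }
    /* Miss: bring the entire line in, replacing whatever was there. */
    c->valid[idx] = true;
    c->tag[idx]   = tag;
    c->fills++;
    return false;
}
```

In this model, a read of address 0x1000 misses and fills a line, after which a read of 0x1008 hits because both addresses fall within the same 64-byte line.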
When the processor attempts to write an operand to a cacheable area of memory, it
first checks if a cache line for that memory location exists in the cache. If a valid
cache line does exist, the processor (depending on the write policy currently in force)
can write the operand into the cache instead of writing it out to system memory. This
operation is called a write hit. If a write misses the cache (that is, a valid cache line
is not present for the area of memory being written to), the processor performs a cache
line fill; this operation is called write allocation. It then writes the operand into the cache line and
(depending on the write policy currently in force) can also write it out to memory. If
the operand is to be written out to memory, it is written first into the store buffer, and
then written from the store buffer to memory when the system bus is available.
(Note that for the Pentium processor, write misses do not result in a cache line fill;
they always result in a write to memory. For this processor, only read misses result in
cache line fills.)
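The write-hit and write-miss cases described above, including the Pentium processor exception, can be sketched as a toy model. The C code below uses a hypothetical direct-mapped cache with an assumed 64-byte line and eight lines; the write_allocate flag selects between filling a line on a write miss and sending the write straight to memory. The write policy (write-through or write-back) and the store buffer are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define WLINE_BYTES 64u  /* assumed line size, illustration only */
#define WNUM_LINES  8u   /* assumed number of lines in the toy cache */

enum write_result {
    WRITE_HIT,             /* valid line present; operand written to cache */
    WRITE_MISS_ALLOCATED,  /* line filled first (write allocation) */
    WRITE_MISS_TO_MEMORY   /* no fill; write goes to memory only */
};

struct wcache {
    bool     valid[WNUM_LINES];
    uint64_t tag[WNUM_LINES];
    bool     write_allocate;   /* false models the Pentium processor note */
    unsigned fills;
    unsigned memory_writes;
};

static void wcache_init(struct wcache *c, bool write_allocate)
{
    memset(c, 0, sizeof *c);
    c->write_allocate = write_allocate;
}

/* Model a write of one operand at `addr`. */
static enum write_result wcache_write(struct wcache *c, uint64_t addr)
{
    uint64_t line = addr / WLINE_BYTES;
    unsigned idx  = (unsigned)(line % WNUM_LINES);
    uint64_t tag  = line / WNUM_LINES;

    if (c->valid[idx] && c->tag[idx] == tag)
        return WRITE_HIT;

    if (c->write_allocate) {
        /* Write miss: fill the line, then the operand is written into it. */
        c->valid[idx] = true;
        c->tag[idx]   = tag;
        c->fills++;
        return WRITE_MISS_ALLOCATED;
    }

    /* Write miss without allocation: the cache is left unchanged. */
    c->memory_writes++;
    return WRITE_MISS_TO_MEMORY;
}
```

With write_allocate set, a first write to a line allocates it and a second write to the same line hits. With it cleared, repeated writes to an uncached line keep missing, because only read misses would bring the line in.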
When operating in an MP system, IA-32 processors (beginning with the Intel486
processor) and Intel 64 processors have the ability to snoop other processors’