Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1

MEMORY CACHE CONTROL
10.8 EXPLICIT CACHING
The Pentium III processor introduced four new instructions, the PREFETCHh
instructions, that give software explicit control over the caching of data.
These instructions provide “hints” to the processor that the data requested by
a PREFETCHh instruction should be read into the cache hierarchy now or as soon
as possible, in anticipation of its use. The instructions provide different
variations of the hint that allow selection of the cache level into which data
will be read.
The PREFETCHh instructions can help reduce the long latency typically
associated with reading data from memory and thus help prevent processor
“stalls.” However, these instructions should be used judiciously. Overuse can
lead to resource conflicts and hence reduce the performance of an application.
Also, these instructions should only be used to prefetch data from memory; they
should not be used to prefetch instructions. For more detailed information on
the proper use of the prefetch instructions, refer to Chapter 7, “Optimizing
Cache Usage,” in the Intel® 64 and IA-32 Architectures Optimization Reference
Manual.
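As a rough illustration of how software might issue such a hint, the following C sketch prefetches a fixed distance ahead while summing an array. It relies on assumptions that are not part of this manual: `__builtin_prefetch` is a GCC/Clang builtin that those compilers typically lower to a PREFETCHh instruction on IA-32 and Intel 64 targets, and the function name `sum_with_prefetch` and the distance of 16 elements are hypothetical choices made only for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning value: how many elements ahead to prefetch. */
#define PREFETCH_DISTANCE 16

/* Sum an array, hinting that data a few iterations ahead will be read soon. */
uint64_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Arguments: address, 0 = prefetch for a read,
             * 3 = high temporal locality (keep in all cache levels).
             * This is only a hint; the processor may ignore it. */
            __builtin_prefetch(&data[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += data[i];
    }
    return total;
}
```

The hint never changes the result, only (possibly) the timing. A simple sequential walk like this one is often covered by hardware prefetching already, so an explicit hint is more likely to pay off with irregular access patterns, and any benefit should be measured rather than assumed.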
10.9 INVALIDATING THE TRANSLATION LOOKASIDE BUFFERS (TLBS)
The processor updates its address translation caches (TLBs) transparently to
software. Several mechanisms are available, however, that allow software and
hardware to invalidate the TLBs either explicitly or as a side effect of
another operation.
The INVLPG instruction invalidates the TLB entry for a specific page. When
software needs to invalidate only one page, this instruction is the most
efficient choice, because it avoids the cost of invalidating the whole TLB.
This instruction is not affected by the state of the G flag in a
page-directory or page-table entry.
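A minimal sketch of how system software might wrap INVLPG is shown below. It makes several assumptions: GCC-style inline assembly, 4-KByte pages, and code running at privilege level 0 (INVLPG is a privileged instruction, so the wrapper cannot be executed from an application). The names `page_base`, `pages_spanned`, `invlpg`, and `invalidate_range` are hypothetical helpers introduced for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K 4096u  /* assumption: 4-KByte pages */

/* Round an address down to the start of the page that contains it. */
static inline uintptr_t page_base(uintptr_t addr)
{
    return addr & ~((uintptr_t)PAGE_SIZE_4K - 1);
}

/* Number of pages touched by the byte range [addr, addr + len). */
static inline size_t pages_spanned(uintptr_t addr, size_t len)
{
    if (len == 0)
        return 0;
    uintptr_t first = page_base(addr);
    uintptr_t last = page_base(addr + len - 1);
    return (size_t)((last - first) / PAGE_SIZE_4K) + 1;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Invalidate the TLB entry for the page containing vaddr.
 * Privileged: must only be called from code running at CPL 0. */
static inline void invlpg(const void *vaddr)
{
    __asm__ volatile("invlpg (%0)" : : "r"(vaddr) : "memory");
}

/* Invalidate every page overlapping [addr, addr + len), one page at a time. */
static inline void invalidate_range(uintptr_t addr, size_t len)
{
    uintptr_t p = page_base(addr);
    size_t n = pages_spanned(addr, len);

    for (size_t i = 0; i < n; i++, p += PAGE_SIZE_4K)
        invlpg((const void *)p);
}
#endif
```

For a large range, issuing one INVLPG per page can cost more than invalidating the whole TLB at once; where that crossover lies depends on the processor and is not specified here.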
The following operations invalidate all TLB entries except global entries. (A
global entry is one for which the G (global) flag is set in its corresponding
page-directory or page-table entry. The global flag was introduced into the
IA-32 architecture in the P6 family processors; see Section 10.5, “Cache
Control.”)
• Writing to control register CR3.
• A task switch that changes control register CR3.
The following operations invalidate all TLB entries, irrespective of the
setting of the G flag:
• Asserting or de-asserting the FLUSH# pin.
• (Pentium 4, Intel Xeon, and P6 family processors only.) Writing to an MTRR
  (with a WRMSR instruction).
• Writing to control register CR0 to modify the PG or PE flag.
• (Pentium 4, Intel Xeon, and P6 family processors only.) Writing to control
  register CR4 to modify the PSE, PGE, or PAE flag.
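The three invalidation scopes described in this section (one page, all non-global entries, all entries) can be summarized with a toy software model. The sketch below is only an illustration of the rules as stated above, not a description of how any processor implements its TLBs; the structure layout, the eight-entry size, and all `toy_` names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_ENTRIES 8  /* arbitrary size for the toy model */

struct tlb_entry {
    bool valid;
    bool global;    /* mirrors the G flag of the cached translation */
    uintptr_t vpn;  /* virtual page number */
};

struct toy_tlb {
    struct tlb_entry e[TLB_ENTRIES];
};

/* Models INVLPG: drops the entry for one page, whether or not it is global. */
void toy_invlpg(struct toy_tlb *t, uintptr_t vpn)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++)
        if (t->e[i].valid && t->e[i].vpn == vpn)
            t->e[i].valid = false;
}

/* Models a write to CR3 (or a task switch that changes CR3):
 * drops every entry except global ones. */
void toy_flush_non_global(struct toy_tlb *t)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++)
        if (!t->e[i].global)
            t->e[i].valid = false;
}

/* Models the operations that ignore the G flag (FLUSH#, an MTRR write,
 * changing CR0.PG/PE, changing CR4.PSE/PGE/PAE): drops every entry. */
void toy_flush_all(struct toy_tlb *t)
{
    for (size_t i = 0; i < TLB_ENTRIES; i++)
        t->e[i].valid = false;
}

/* Counts the entries that are still valid. */
size_t toy_count_valid(const struct toy_tlb *t)
{
    size_t n = 0;
    for (size_t i = 0; i < TLB_ENTRIES; i++)
        if (t->e[i].valid)
            n++;
    return n;
}
```

In this model, a global entry survives `toy_flush_non_global` but is removed by either `toy_invlpg` on its page or `toy_flush_all`, which matches the behavior listed above.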