Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3A, System Programming Guide, Part 1

MEMORY CACHE CONTROL
CPUID instruction, before the modified instruction is executed, which will
automatically resynchronize the instruction cache and prefetch queue. (See Section
7.1.3, “Handling Self- and Cross-Modifying Code,” for more information about the
use of self-modifying code.)
For Intel486 processors, a write to an instruction in the cache will modify it in both
the cache and memory, but if the instruction was prefetched before the write, the old
version of the instruction could be the one executed. To prevent the old instruction
from being executed, flush the instruction prefetch unit by coding a jump instruction
immediately after any write that modifies an instruction.
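As a rough sketch in the notation used elsewhere in this chapter, the write and the
jump might be coded as shown below. The labels PATCH_SITE and NEXT are hypothetical:
PATCH_SITE stands for the instruction being modified and NEXT for the instruction
that follows the jump.

```asm
mov PATCH_SITE, AL    ; Write that modifies an instruction (PATCH_SITE is hypothetical)
jmp NEXT              ; Jump coded immediately after the write flushes the prefetch unit
NEXT:                 ; Execution continues with newly fetched instructions
```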
10.7 IMPLICIT CACHING (PENTIUM 4, INTEL XEON, AND P6 FAMILY PROCESSORS)
Implicit caching occurs when a memory element is made potentially cacheable,
although the element may never have been accessed in the normal von Neumann
sequence. Implicit caching occurs on the P6 and more recent processor families due
to aggressive prefetching, branch prediction, and TLB miss handling. Implicit caching
is an extension of the behavior of existing Intel386, Intel486, and Pentium processor
systems, since software running on these processor families also has not been able
to deterministically predict the behavior of instruction prefetch.
To avoid problems related to implicit caching, the operating system must explicitly
invalidate the cache when changes are made to cacheable data that the cache
coherency mechanism does not automatically handle. This includes writes to
dual-ported or physically aliased memory boards that are not detected by the
snooping mechanisms of the processor, and changes to page-table entries in memory.
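A minimal sketch of one such explicit invalidation, for the dual-ported memory case,
is shown below. It is an illustration under stated assumptions, not a prescribed
sequence. DP_BUF is a hypothetical label for a location in dual-ported memory that
another agent has already written without the write being snooped. CLFLUSH and MFENCE
are assumed to be available (CLFLUSH is present only when CPUID reports the CLFSH
feature flag, so it cannot be assumed on P6 family processors). Where they are not
available, the privileged WBINVD instruction, which writes back and invalidates the
entire cache, is the coarser alternative.

```asm
clflush [DP_BUF]      ; Invalidate the cache line holding DP_BUF (written back first if modified)
mfence                ; Conservatively order the flush ahead of the following load
mov EAX, [DP_BUF]     ; Load now fetches the line from memory, not from a stale cached copy
```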
The code in Example 10-1 shows the effect of implicit caching on page-table entries.
The linear address F000H points to physical location B000H (the page-table entry for
F000H contains the value B000H), and the page-table entry for linear address F000H
is PTE_F000.
Example 10-1. Effect of Implicit Caching on Page-Table Entries
mov EAX, CR3          ; Invalidate the TLB
mov CR3, EAX          ; by copying CR3 to itself
mov PTE_F000, A000H   ; Change F000H to point to A000H
mov EBX, [F000H]      ; Load from linear address F000H
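Because of speculative execution in the P6 and more recent processor families, the
last MOV instruction can place the value at physical location B000H into EBX, rather
than the value at the new physical address A000H. After the TLB is invalidated, the
processor may speculatively reload the translation for F000H into the TLB before the
store to PTE_F000 takes effect. This situation is remedied by placing the TLB
invalidation between the store to the page-table entry and the load.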
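The remedy described above can be sketched by reordering the instructions of
Example 10-1 so that the page-table entry is written first and the TLB is invalidated
afterward. The listing is an illustration in the same symbolic notation, not a
listing taken from this manual.

```asm
mov PTE_F000, A000H   ; Store: change F000H to point to A000H
mov EAX, CR3          ; Invalidate the TLB
mov CR3, EAX          ; by copying CR3 to itself
mov EBX, [F000H]      ; Load: translated through the updated page-table entry
```

Where only this one translation has changed, the CR3 reload could instead be replaced
by INVLPG [F000H], which invalidates the TLB entry for the page containing F000H.
This is also the option to consider if the page is marked global, because reloading
CR3 does not invalidate global TLB entries.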