Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3A, System Programming Guide, Part 1

MEMORY CACHE CONTROL
completely full WC buffer will always be propagated as a single 32-byte burst
transaction using any chunk order. In a WC buffer eviction where data will be evicted as
partials, all data contained in the same chunk (0 mod 8 aligned) will be propagated
simultaneously. Likewise, for more recent processors starting with those based on
the Intel NetBurst microarchitecture, a full WC buffer will always be propagated as a
single burst transaction, using any chunk order within the transaction. For partial
buffer propagations, all data contained in the same chunk will be propagated
simultaneously.
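The chunk granularity described above can be illustrated with a small software model. This is only a sketch and does not describe the hardware: it assumes a hypothetical buffer made of four 8-byte chunks, tracked by a mask of valid bytes, and reports which chunks a partial eviction would have to propagate. The names wc_chunks_to_propagate, wc_buffer_is_full, WC_CHUNK_BYTES, and WC_CHUNKS are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define WC_CHUNK_BYTES 8u  /* chunk size implied by "0 mod 8 aligned" */
#define WC_CHUNKS      4u  /* assumption: four chunks per modeled buffer */

/* valid_bytes: bit i set means byte i of the modeled buffer holds written data.
 * Returns a mask with bit c set when chunk c contains at least one valid byte;
 * in this model each such chunk is propagated as a unit. */
static unsigned wc_chunks_to_propagate(uint32_t valid_bytes)
{
    unsigned chunk_mask = 0;
    for (unsigned c = 0; c < WC_CHUNKS; c++) {
        uint32_t bytes_in_chunk = (valid_bytes >> (c * WC_CHUNK_BYTES)) & 0xFFu;
        if (bytes_in_chunk != 0)
            chunk_mask |= 1u << c;
    }
    return chunk_mask;
}

/* The modeled buffer is "completely full" when every byte is valid. */
static int wc_buffer_is_full(uint32_t valid_bytes)
{
    return valid_bytes == 0xFFFFFFFFu;
}
```

In this model, a buffer with only bytes 8 and 16 written yields the chunk mask 0x6, meaning chunks 1 and 2 would each be propagated as a partial write.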
10.3.2 Choosing a Memory Type
The simplest system memory model does not use memory-mapped I/O with read or
write side effects, does not include a frame buffer, and uses the write-back memory
type for all memory. An I/O agent can perform direct memory access (DMA) to write-
back memory and the cache protocol maintains cache coherency.
A system can use strong uncacheable memory for other memory-mapped I/O, and
should always use strong uncacheable memory for memory-mapped I/O with read
side effects.
Writes to dual-ported memory can be considered to have a side effect, making relatively
prompt writes desirable, because those writes cannot be observed at the other port until
they reach the memory agent. A system can use strong uncacheable, uncacheable,
write-through, or write-combining memory for frame buffers or dual-ported memory that
contains pixel values displayed on a screen. Frame buffer memory is typically large (a
few megabytes) and is usually written more than it is read by the processor. Using
strong uncacheable memory for a frame buffer generates very large amounts of bus
traffic, because operations on the entire buffer are implemented using partial writes
rather than line writes. Using write-through memory for a frame buffer can displace
almost all other useful cached lines in the processor's L2 and L3 caches and L1 data
cache. Therefore, systems should use write-combining memory for frame buffers
whenever possible.
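The guidance in the preceding paragraphs can be read as a simple decision procedure. The following C sketch is one possible reading of that guidance, not an Intel-defined algorithm. The region_traits structure, its fields, and choose_memory_type are hypothetical names, and the sketch assumes that write-combining memory is available on the platform.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum mem_type { MT_STRONG_UC, MT_WC, MT_WB };

/* Hypothetical description of a physical address range. */
struct region_traits {
    bool is_mmio;                /* memory-mapped I/O rather than ordinary RAM */
    bool has_read_side_effects;  /* reading changes device state */
    bool is_frame_buffer;        /* holds pixel values shown on a screen */
};

static enum mem_type choose_memory_type(const struct region_traits *r)
{
    assert(r != NULL);

    /* MMIO with read side effects should always be strong uncacheable. */
    if (r->is_mmio && r->has_read_side_effects)
        return MT_STRONG_UC;

    /* Frame buffers: write-combining is preferred whenever possible
     * (assumption: WC is supported here). */
    if (r->is_frame_buffer)
        return MT_WC;

    /* Other memory-mapped I/O can use strong uncacheable memory. */
    if (r->is_mmio)
        return MT_STRONG_UC;

    /* Ordinary memory, including DMA targets, uses write-back. */
    return MT_WB;
}
```

The read-side-effect check comes first because it is the only case the text states as mandatory; the remaining branches encode recommendations.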
Software can use page-level cache control to assign appropriate effective memory
types when software will not access data structures in ways that benefit from write-
back caching. For example, software may read a large data structure once and not
access the structure again until the structure is rewritten by another agent. Such a
large data structure should be marked as uncacheable, or reading it will evict cached
lines that the processor will be referencing again.
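Page-level cache control is exercised through the PWT, PCD, and PAT flags of paging-structure entries. The sketch below shows how such a structure's pages might be marked uncacheable. It assumes a 4-KByte page-table entry held in a 64-bit integer and the power-on default PAT contents, under which PAT = 0, PCD = 1, PWT = 1 selects strong uncacheable. The helper names are hypothetical, and steps real system code would need (TLB invalidation, cache flushing, MTRR interactions) are omitted.

```c
#include <assert.h>
#include <stdint.h>

/* Cache-control flags in a 4-KByte page-table entry. */
#define PTE_PWT (UINT64_C(1) << 3)  /* page-level write-through */
#define PTE_PCD (UINT64_C(1) << 4)  /* page-level cache disable */
#define PTE_PAT (UINT64_C(1) << 7)  /* page attribute table index bit */

/* Returns a copy of pte whose cache-control flags select PAT entry 3,
 * which is strong uncacheable under the default PAT (assumption). */
static uint64_t pte_mark_uncacheable(uint64_t pte)
{
    pte &= ~PTE_PAT;
    pte |= PTE_PCD | PTE_PWT;
    return pte;
}

/* Computes the PAT entry index (0..7) selected by a page-table entry:
 * index = PAT*4 + PCD*2 + PWT. */
static unsigned pte_pat_index(uint64_t pte)
{
    unsigned pat = (pte & PTE_PAT) ? 1u : 0u;
    unsigned pcd = (pte & PTE_PCD) ? 1u : 0u;
    unsigned pwt = (pte & PTE_PWT) ? 1u : 0u;
    return (pat << 2) | (pcd << 1) | pwt;
}
```

An operating system would normally apply this to every page that maps the structure and leave the rest of the address space write-back.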
A similar example would be a write-only data structure that is written to (to export
the data to another agent), but never read by software. Such a structure can be
marked as uncacheable, because software never reads the values that it writes
(though as uncacheable memory, it will be written using partial writes, while as
write-back memory, it will be written using line writes, which may not occur until the
other agent reads the structure and triggers implicit write-backs).
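The trade-off between partial writes and line writes can be made concrete with rough arithmetic. The sketch below counts write transactions under two simplifying assumptions that are not from the text: each store to uncacheable memory becomes exactly one partial write of store_bytes bytes, and write-back memory eventually writes each touched cache line exactly once, with the structure starting on a line boundary. The function names and parameters are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Rounds up total / unit for unit > 0. */
static size_t ceil_div(size_t total, size_t unit)
{
    assert(unit > 0);
    return (total + unit - 1) / unit;
}

/* Assumption: one partial write per store when the memory is uncacheable. */
static size_t partial_writes_uncacheable(size_t total_bytes, size_t store_bytes)
{
    return ceil_div(total_bytes, store_bytes);
}

/* Assumption: one line write per touched cache line when the memory is
 * write-back, with the structure aligned to a line boundary. */
static size_t line_writes_write_back(size_t total_bytes, size_t line_bytes)
{
    return ceil_div(total_bytes, line_bytes);
}
```

For a 4096-byte structure filled with 4-byte stores, this model gives 1024 partial writes as uncacheable memory, compared with 64 line writes as write-back memory with 64-byte lines.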
On the Pentium III, Pentium 4, and more recent processors, new instructions are
provided that give software greater control over the caching, prefetching, and the