Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3A, System Programming Guide, Part 1

10-10 Vol. 3A

MEMORY CACHE CONTROL

completely full WC buffer will always be propagated as a single 32-byte burst transaction using any chunk order. In a WC buffer eviction where data will be evicted as partials, all data contained in the same chunk (0 mod 8 aligned) will be propagated simultaneously. Likewise, for more recent processors starting with those based on the Intel NetBurst microarchitecture, a full WC buffer will always be propagated as a single burst transaction, using any chunk order within the transaction. For partial buffer propagations, all data contained in the same chunk will be propagated simultaneously.
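The "0 mod 8 aligned" chunk rule can be illustrated with a little address arithmetic. The following is a minimal sketch, not part of the manual: the helper names `wc_chunk_base` and `same_wc_chunk` are hypothetical, and the only fact taken from the text is that a chunk is an 8-byte-aligned, 8-byte-wide region.

```c
#include <stdbool.h>
#include <stdint.h>

/* A chunk is an 8-byte region whose start address is 0 mod 8 (from the text). */
#define WC_CHUNK_BYTES 8u

/* Hypothetical helper: round an address down to the start of its chunk. */
static inline uintptr_t wc_chunk_base(uintptr_t addr)
{
    return addr & ~(uintptr_t)(WC_CHUNK_BYTES - 1u);
}

/* Hypothetical helper: true when two byte addresses fall in the same chunk,
 * which is the case in which the text says their data is propagated together
 * during a partial eviction. */
static inline bool same_wc_chunk(uintptr_t a, uintptr_t b)
{
    return wc_chunk_base(a) == wc_chunk_base(b);
}
```

Under this sketch, bytes at offsets 0x1000 and 0x1007 share a chunk, while 0x1007 and 0x1008 do not, so only the first pair is covered by the simultaneous-propagation guarantee.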

10.3.2 Choosing a Memory Type

The simplest system memory model does not use memory-mapped I/O with read or write side effects, does not include a frame buffer, and uses the write-back memory type for all memory. An I/O agent can perform direct memory access (DMA) to write-back memory, and the cache protocol maintains cache coherency.

A system can use strong uncacheable memory for other memory-mapped I/O, and should always use strong uncacheable memory for memory-mapped I/O with read side effects.

Dual-ported memory can be considered a write side effect, making relatively prompt writes desirable, because those writes cannot be observed at the other port until they reach the memory agent. A system can use strong uncacheable, uncacheable, write-through, or write-combining memory for frame buffers or dual-ported memory that contains pixel values displayed on a screen. Frame buffer memory is typically large (a few megabytes) and is usually written more than it is read by the processor. Using strong uncacheable memory for a frame buffer generates very large amounts of bus traffic, because operations on the entire buffer are implemented using partial writes rather than line writes. Using write-through memory for a frame buffer can displace almost all other useful cached lines in the processor's L2 and L3 caches and L1 data cache. Therefore, systems should use write-combining memory for frame buffers whenever possible.
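The bus-traffic argument can be made concrete with rough arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes the best case for write combining, namely that every WC buffer fills completely before eviction so that each eviction is a single burst. The store width and WC buffer size are parameters because the text does not fix them.

```c
#include <stddef.h>

/* Hypothetical helper: number of partial-write bus transactions needed to
 * fill a buffer through uncacheable memory, one transaction per store.
 * store_bytes must be nonzero. */
static inline size_t uc_partial_writes(size_t buffer_bytes, size_t store_bytes)
{
    return (buffer_bytes + store_bytes - 1) / store_bytes; /* round up */
}

/* Hypothetical helper: number of burst transactions needed for the same
 * fill through write-combining memory, ASSUMING every WC buffer is
 * completely full when it is evicted. wc_buffer_bytes must be nonzero. */
static inline size_t wc_burst_writes(size_t buffer_bytes, size_t wc_buffer_bytes)
{
    return (buffer_bytes + wc_buffer_bytes - 1) / wc_buffer_bytes; /* round up */
}
```

For example, filling a 4-MByte frame buffer with 8-byte stores would take 524288 partial writes as uncacheable memory, versus 65536 bursts with an assumed 64-byte WC buffer. Real traffic depends on the access pattern and on when buffers are evicted, so these figures are an upper-level estimate rather than a measurement.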

Software can use page-level cache control to assign appropriate effective memory types when software will not access data structures in ways that benefit from write-back caching. For example, software may read a large data structure once and not access the structure again until the structure is rewritten by another agent. Such a large data structure should be marked as uncacheable, or reading it will evict cached lines that the processor will be referencing again.
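As a rough illustration of page-level cache control, the sketch below sets or clears the PCD (bit 4) and PWT (bit 3) flags in a 64-bit page-table entry value. The helper names are hypothetical, and the sketch manipulates only the entry value. It assumes a default PAT configuration, ignores the PAT bit and the MTRRs (which also contribute to the effective memory type), and omits the TLB invalidation and cache maintenance that an operating system must perform when changing a live mapping.

```c
#include <stdint.h>

/* Page-level cache-control flags in a page-table entry. */
#define PTE_PWT (UINT64_C(1) << 3) /* page-level write-through */
#define PTE_PCD (UINT64_C(1) << 4) /* page-level cache disable  */

/* Hypothetical helper: return a copy of the entry with PCD and PWT set.
 * With a default PAT this requests strong uncacheable memory for the page;
 * that PAT setup is an assumption of this sketch. */
static inline uint64_t pte_make_uncacheable(uint64_t pte)
{
    return pte | PTE_PCD | PTE_PWT;
}

/* Hypothetical helper: return a copy of the entry with PCD and PWT clear,
 * which leaves the caching decision for the page to the MTRRs. */
static inline uint64_t pte_make_write_back(uint64_t pte)
{
    return pte & ~(PTE_PCD | PTE_PWT);
}
```

A read-once structure like the one described above would be mapped with entries produced by `pte_make_uncacheable`, while ordinary data would keep both flags clear.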

A similar example would be a write-only data structure that is written (to export the data to another agent), but never read by software. Such a structure can be marked as uncacheable, because software never reads the values that it writes. Note, however, that as uncacheable memory it will be written using partial writes, while as write-back memory it will be written using line writes, which may not occur until the other agent reads the structure and triggers implicit write-backs.

On the Pentium III, Pentium 4, and more recent processors, new instructions are provided that give software greater control over the caching, prefetching, and the