Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1

MULTIPLE-PROCESSOR MANAGEMENT
7.2.2 Memory Ordering in P6 and More Recent Processor Families
The Intel Core 2 Duo, Intel Core Duo, Pentium 4, and P6 family processors also use a
processor-ordered memory ordering model that can be further defined as “write
ordered with store-buffer forwarding.” This model can be characterized as follows.
In a single-processor system for memory regions defined as write-back cacheable,
the following ordering rules apply:
1. Reads can be carried out speculatively and in any order.
2. Reads can pass buffered writes, but the processor is self-consistent.
3. Writes to memory are always carried out in program order, with the exception of
writes executed with the CLFLUSH instruction and streaming stores (writes)
executed with the non-temporal move instructions (MOVNTI, MOVNTQ,
MOVNTDQ, MOVNTPS, and MOVNTPD).
4. Writes can be buffered.
5. Writes are not performed speculatively; they are only performed for instructions
that have actually been retired.
6. Data from buffered writes can be forwarded to waiting reads within the processor.
7. Reads or writes cannot pass (be carried out ahead of) I/O instructions, locked
instructions, or serializing instructions.
8. Reads cannot pass LFENCE and MFENCE instructions.
9. Writes cannot pass SFENCE and MFENCE instructions.
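
As an illustration of rules 3 and 9, the following C sketch fills a buffer with non-temporal stores and then issues SFENCE before setting a ready flag, so that the flag write cannot pass the streaming stores. The names publish_buffer and ready_flag are hypothetical. The intrinsics _mm_stream_si32 (MOVNTI) and _mm_sfence (SFENCE) are provided by the compiler's <emmintrin.h>; the plain-store fallback for builds without SSE2 is an assumption added only so the sketch compiles elsewhere.

```c
#include <stdatomic.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>   /* _mm_stream_si32 (MOVNTI), _mm_sfence (SFENCE) */
#endif

/* Hypothetical ready flag that a consumer on another processor would poll. */
static atomic_int ready_flag;

/* Fill dst[0..n) with value using streaming stores, then publish the flag. */
static void publish_buffer(int *dst, size_t n, int value)
{
    for (size_t i = 0; i < n; i++) {
#if defined(__SSE2__)
        _mm_stream_si32(&dst[i], value);  /* non-temporal, weakly ordered store */
#else
        dst[i] = value;                   /* assumption: plain-store fallback */
#endif
    }
#if defined(__SSE2__)
    _mm_sfence();  /* stores before SFENCE become globally visible before later stores */
#else
    atomic_thread_fence(memory_order_release);
#endif
    /* The flag store follows the fence, so it cannot pass the buffer writes. */
    atomic_store_explicit(&ready_flag, 1, memory_order_release);
}
```

A consumer that observes ready_flag equal to 1 can then expect the buffer contents to be visible. Without the SFENCE, the streaming stores would not be guaranteed to be ordered ahead of the flag write.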
The second rule allows a read to pass a write. However, if the write is to the same
memory location as the read, the processor’s internal “snooping” mechanism will
detect the conflict and update the cached read before the processor executes the
instruction that uses the value.
The sixth rule constitutes an exception to an otherwise write ordered model. Note
that the term “write ordered with store-buffer forwarding” (introduced at the
beginning of this section) refers to the combined effects of rules 2 and 6.
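
The visible effect of rule 2 is often shown with a two-thread "store buffering" test: each thread writes one location and then reads the other. Because a read can pass a buffered write to a different location, both reads may return 0. A fence between the write and the read rules that outcome out. The C sketch below is a minimal version under stated assumptions: the names x, y, r0, r1, and run_store_buffering are hypothetical, POSIX threads stand in for processors, and the C11 sequentially consistent fence stands in for MFENCE (compilers commonly emit MFENCE or a locked instruction for it on x86).

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared locations and per-thread results. */
static atomic_int x, y;
static int r0, r1;
static int use_fence;

static void *thread0(void *arg)
{
    (void)arg;
    atomic_store_explicit(&x, 1, memory_order_relaxed);   /* write x */
    if (use_fence)
        atomic_thread_fence(memory_order_seq_cst);        /* stands in for MFENCE */
    r0 = atomic_load_explicit(&y, memory_order_relaxed);  /* read y */
    return NULL;
}

static void *thread1(void *arg)
{
    (void)arg;
    atomic_store_explicit(&y, 1, memory_order_relaxed);   /* write y */
    if (use_fence)
        atomic_thread_fence(memory_order_seq_cst);        /* stands in for MFENCE */
    r1 = atomic_load_explicit(&x, memory_order_relaxed);  /* read x */
    return NULL;
}

/* Returns how many of iters runs ended with r0 == 0 and r1 == 0. */
static int run_store_buffering(int fence, int iters)
{
    int both_zero = 0;
    use_fence = fence;
    for (int i = 0; i < iters; i++) {
        pthread_t t0, t1;
        atomic_store(&x, 0);
        atomic_store(&y, 0);
        pthread_create(&t0, NULL, thread0, NULL);
        pthread_create(&t1, NULL, thread1, NULL);
        pthread_join(t0, NULL);
        pthread_join(t1, NULL);
        if (r0 == 0 && r1 == 0)
            both_zero++;
    }
    return both_zero;
}
```

With fence set to 1, the function should always return 0. With fence set to 0, a nonzero count is permitted, although any particular run is not guaranteed to show it, since the two threads rarely overlap closely enough.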
In a multiple-processor system, the following ordering rules apply:
• Individual processors use the same ordering rules as in a single-processor
system.
• Writes by a single processor are observed in the same order by all processors.
• Writes from the individual processors on the system bus are NOT ordered with
respect to each other.
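
The second multiple-processor rule is what makes a simple "message passing" pattern work: if one processor writes data and then a flag, a processor that sees the flag also sees the data. The C sketch below illustrates this under stated assumptions. The names data, flag, observed, and run_message_passing are hypothetical, and POSIX threads stand in for processors. The release and acquire orderings are there to keep the compiler from reordering the accesses; on x86 they typically compile to ordinary MOV instructions. The sketch also assumes that the reader's two loads are observed in program order, which the acquire load requests.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared locations and the value seen by the reader. */
static atomic_int data, flag;
static int observed;

static void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&data, 42, memory_order_relaxed);  /* first write */
    atomic_store_explicit(&flag, 1, memory_order_release);   /* second write */
    return NULL;
}

static void *reader(void *arg)
{
    (void)arg;
    while (atomic_load_explicit(&flag, memory_order_acquire) == 0)
        ;  /* spin until the second write is visible */
    observed = atomic_load_explicit(&data, memory_order_relaxed);
    return NULL;
}

/* Returns how many of iters runs saw the flag without seeing the data. */
static int run_message_passing(int iters)
{
    int stale = 0;
    for (int i = 0; i < iters; i++) {
        pthread_t w, r;
        atomic_store(&data, 0);
        atomic_store(&flag, 0);
        pthread_create(&r, NULL, reader, NULL);
        pthread_create(&w, NULL, writer, NULL);
        pthread_join(w, NULL);
        pthread_join(r, NULL);
        if (observed != 42)
            stale++;
    }
    return stale;
}
```

The function should always return 0, because the reader cannot observe the second write without also observing the first. The pattern does not extend to writes issued by different processors, which the third rule leaves unordered with respect to each other.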
See the example in Figure 7-1. Consider a system with three processors in which
each processor performs three writes, one to each of three defined locations (A, B,
and C). Individually, the processors perform the writes in the same program order,
but because of bus arbitration and other memory access mechanisms, the order in
which the three processors write the individual memory locations can differ each time the