Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3A: System Programming Guide, Part 1

7-12 Vol. 3A
MULTIPLE-PROCESSOR MANAGEMENT
Memory-mapped devices and other I/O devices on the bus are often sensitive to the
order of writes to their I/O buffers. I/O instructions (the IN and OUT instructions)
can be used to impose strong write ordering on such accesses as follows. Prior to
executing an I/O instruction, the processor waits for all previous instructions in the
program to complete and for all buffered writes to drain to memory. Only instruction
fetch and page table walks can pass I/O instructions. Execution of subsequent
instructions does not begin until the processor determines that the I/O instruction has
been completed.
Synchronization mechanisms in multiple-processor systems may depend upon a
strong memory-ordering model. Here, a program can use a locking instruction such
as the XCHG instruction or the LOCK prefix to ensure that a read-modify-write
operation on memory is carried out atomically. Locking operations typically operate like
I/O operations in that they wait for all previous instructions to complete and for all
buffered writes to drain to memory (see Section 7.1.2, “Bus Locking”).
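The locking behavior described above can be sketched in C with the C11 atomics
library. This is a minimal illustration, not a production lock: the type xchg_lock
and the functions xchg_lock_try, xchg_lock_acquire, xchg_lock_release, and
counter_increment are hypothetical names introduced here. It assumes a compiler
that lowers atomic_exchange to XCHG and atomic_fetch_add to a LOCK-prefixed
instruction, which is typical on x86 but is a code-generation detail rather than a
guarantee of the C standard.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct {
    atomic_int held;
} xchg_lock;

/* One attempt to take the lock. atomic_exchange is a single atomic
 * read-modify-write; x86 compilers typically emit XCHG with a memory
 * operand for it, which is locked without an explicit LOCK prefix. */
bool xchg_lock_try(xchg_lock *l)
{
    return atomic_exchange(&l->held, 1) == 0;
}

/* Spin until the lock is taken. The inner loop only reads, so the
 * locked exchange is retried only when the lock looks free. */
void xchg_lock_acquire(xchg_lock *l)
{
    while (!xchg_lock_try(l)) {
        while (atomic_load_explicit(&l->held, memory_order_relaxed) != 0) {
            /* busy-wait */
        }
    }
}

/* Release the lock with an ordinary store that has release ordering. */
void xchg_lock_release(xchg_lock *l)
{
    atomic_store_explicit(&l->held, 0, memory_order_release);
}

/* Atomic increment; typically compiled to a LOCK-prefixed XADD or ADD.
 * Returns the value after the increment. */
int counter_increment(atomic_int *counter)
{
    return atomic_fetch_add(counter, 1) + 1;
}
```

A caller would wrap each critical section in xchg_lock_acquire and
xchg_lock_release. A simple shared counter does not need the lock at all and can
use counter_increment directly.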
Program synchronization can also be carried out with serializing instructions (see
Section 7.4). These instructions are typically used at critical procedure or task
boundaries to force completion of all previous instructions before a jump to a new
section of code or a context switch occurs. As with the I/O and locking instructions, the
processor waits until all previous instructions have been completed and all buffered
writes have been drained to memory before executing the serializing instruction.
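As an illustration, CPUID is one serializing instruction that can be executed at
any privilege level, so it can be issued from C purely for its serializing effect.
The following is a minimal sketch under stated assumptions: the function name
serialize_with_cpuid is hypothetical, the inline assembly uses GCC/Clang extended
asm syntax, and the non-x86 fallback is only a full memory fence, which is not a
serializing instruction.

```c
#include <stdatomic.h>

/* Hypothetical helper: executes CPUID leaf 0 for its serializing side
 * effect and returns EAX (the highest basic CPUID leaf supported).
 * On non-x86 builds it falls back to a full memory fence and returns 0;
 * that fallback orders memory accesses but does NOT serialize
 * instruction execution. */
unsigned int serialize_with_cpuid(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    unsigned int eax = 0, ebx, ecx = 0, edx;

    /* "volatile" keeps the compiler from dropping the instruction and the
     * "memory" clobber keeps it from moving memory accesses across it. */
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)ecx;
    (void)edx;
    return eax;
#else
    atomic_thread_fence(memory_order_seq_cst);
    return 0;
#endif
}
```

A call to serialize_with_cpuid would be placed at the boundary itself, for example
immediately before control transfers to a newly written section of code.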
The SFENCE, LFENCE, and MFENCE instructions provide a performance-efficient way
of ensuring load and store memory ordering between routines that produce
weakly-ordered results and routines that consume that data. The functions of these
instructions are as follows:
SFENCE — Serializes all store (write) operations that occurred prior to the
SFENCE instruction in the program instruction stream, but does not affect load
operations.
LFENCE — Serializes all load (read) operations that occurred prior to the LFENCE
instruction in the program instruction stream, but does not affect store
operations.
MFENCE — Serializes all store and load operations that occurred prior to the
MFENCE instruction in the program instruction stream.
Note that the SFENCE, LFENCE, and MFENCE instructions provide a more efficient
method of controlling memory ordering than the CPUID instruction.
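The producer and consumer pattern described above can be sketched in C with the
compiler intrinsics _mm_sfence, _mm_lfence, and _mm_mfence, which map to the three
fence instructions. This is a minimal sketch under stated assumptions: the mailbox
type, the functions mailbox_publish and mailbox_consume, and the macro names are
hypothetical; the weakly-ordered results are modeled with non-temporal stores
(_mm_stream_si32); and builds without SSE2 fall back to C11 fences and plain
stores so that the example still compiles.

```c
#include <stdatomic.h>

#if defined(__SSE2__)
#include <emmintrin.h>            /* also pulls in _mm_sfence */
#define STORE_FENCE()    _mm_sfence()                  /* SFENCE */
#define LOAD_FENCE()     _mm_lfence()                  /* LFENCE */
#define FULL_FENCE()     _mm_mfence()                  /* MFENCE */
#define WEAK_STORE(p, v) _mm_stream_si32((p), (v))     /* non-temporal store */
#else
/* Portable stand-ins (assumption: used only when SSE2 is unavailable). */
#define STORE_FENCE()    atomic_thread_fence(memory_order_release)
#define LOAD_FENCE()     atomic_thread_fence(memory_order_acquire)
#define FULL_FENCE()     atomic_thread_fence(memory_order_seq_cst)
#define WEAK_STORE(p, v) (*(p) = (v))
#endif

/* Hypothetical shared buffer: payload is valid once ready is nonzero. */
typedef struct {
    int payload[4];
    atomic_int ready;
} mailbox;

/* Producer: write the payload with weakly-ordered stores, fence so those
 * stores are globally visible, then raise the flag. */
void mailbox_publish(mailbox *m, const int src[4])
{
    for (int i = 0; i < 4; i++) {
        WEAK_STORE(&m->payload[i], src[i]);
    }
    STORE_FENCE();
    atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/* Consumer: returns 1 and copies the payload if it has been published,
 * otherwise returns 0. The load fence is shown for illustration; it keeps
 * the payload loads from being performed ahead of the flag load. */
int mailbox_consume(mailbox *m, int dst[4])
{
    if (!atomic_load_explicit(&m->ready, memory_order_acquire)) {
        return 0;
    }
    LOAD_FENCE();
    for (int i = 0; i < 4; i++) {
        dst[i] = m->payload[i];
    }
    return 1;
}
```

FULL_FENCE is defined for completeness and would be used where both earlier loads
and earlier stores must be ordered ahead of later accesses.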
The MTRRs were introduced in the P6 family processors to define the cache
characteristics for specified areas of physical memory. The following are two examples of
how memory types set up with MTRRs can be used to strengthen or weaken memory
ordering for the Pentium 4, Intel Xeon, and P6 family processors:
The strong uncached (UC) memory type forces a strong-ordering model on
memory accesses. Here, all reads and writes to the UC memory region appear on
the bus and out-of-order or speculative accesses are not performed. This
memory type can be applied to an address range dedicated to memory-mapped
I/O devices to force strong memory ordering.