as the XCHG instruction or the LOCK prefix to ensure that a read-modify-write operation
on memory is carried out atomically. Locking operations typically operate like
I/O operations in that they wait for all previous instructions to complete and for all
buffered writes to drain to memory (see Section 8.1.2, “Bus Locking”).
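As an illustration of an atomic read-modify-write operation, the following minimal sketch builds a spinlock on the C11 atomic_exchange operation. On x86 targets compilers typically implement atomic_exchange with XCHG and atomic_fetch_add with a LOCK-prefixed instruction, but that mapping is a property of the compiler, not of this code. The names demo_spinlock, demo_try_lock, demo_lock, demo_unlock, and demo_fetch_increment are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration: 0 = free, 1 = held. */
typedef struct {
    atomic_int state;
} demo_spinlock;

/* Try once to take the lock. atomic_exchange is a single atomic
   read-modify-write: it stores 1 and returns the previous value.
   The lock was acquired only if the previous value was 0. */
static inline bool demo_try_lock(demo_spinlock *l) {
    return atomic_exchange(&l->state, 1) == 0;
}

/* Spin until the lock is acquired. */
static inline void demo_lock(demo_spinlock *l) {
    while (!demo_try_lock(l)) {
        /* busy-wait until the current holder stores 0 */
    }
}

/* Release the lock. The caller must currently hold it. */
static inline void demo_unlock(demo_spinlock *l) {
    assert(atomic_load(&l->state) == 1);
    atomic_store(&l->state, 0);
}

/* Atomic increment. Returns the value held before the increment. */
static inline int demo_fetch_increment(atomic_int *counter) {
    return atomic_fetch_add(counter, 1);
}
```

A thread would call demo_lock before touching shared data and demo_unlock afterward. The default sequentially consistent ordering of these C11 operations is stronger than strictly required for a lock, and is used here for simplicity.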
Program synchronization can also be carried out with serializing instructions (see
Section 8.3). These instructions are typically used at critical procedure or task
boundaries to force completion of all previous instructions before a jump to a new
section of code or a context switch occurs. As with the I/O and locking instructions, the
processor waits until all previous instructions have been completed and all buffered
writes have been drained to memory before executing the serializing instruction.
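The following sketch shows one way user-level code might issue a serializing instruction. It uses CPUID, which can be executed at any privilege level, and assumes an x86 target and a compiler that accepts GCC-style extended inline assembly (GCC or Clang). The function name demo_serialize is hypothetical.

```c
#include <stdint.h>

/* Execute CPUID leaf 0 as a serializing instruction.
   Assumption: x86 target, GCC/Clang extended asm syntax.
   The "memory" clobber additionally keeps the compiler from moving
   memory accesses across this point; the hardware serialization comes
   from CPUID itself. Returns EAX, which for leaf 0 holds the highest
   basic CPUID leaf the processor supports. */
static inline uint32_t demo_serialize(void) {
    uint32_t eax = 0;   /* input: leaf 0 */
    uint32_t ebx;
    uint32_t ecx = 0;   /* input: sub-leaf 0 (ignored by leaf 0) */
    uint32_t edx;
    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)edx;
    return eax;
}
```

Code of this kind would call demo_serialize() immediately before the jump or switch whose preceding work must be complete. Because CPUID overwrites four registers and may be intercepted under virtualization, it is a comparatively expensive way to order memory accesses.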
The SFENCE, LFENCE, and MFENCE instructions provide a performance-efficient way
of ensuring load and store memory ordering between routines that produce
weakly-ordered results and routines that consume that data. The functions of these
instructions are as follows:
SFENCE — Serializes all store (write) operations that occurred prior to the
SFENCE instruction in the program instruction stream, but does not affect load
operations.
LFENCE — Serializes all load (read) operations that occurred prior to the LFENCE
instruction in the program instruction stream, but does not affect store
operations.
MFENCE — Serializes all store and load operations that occurred prior to the
MFENCE instruction in the program instruction stream.
Note that the SFENCE, LFENCE, and MFENCE instructions provide a more efficient
method of controlling memory ordering than the CPUID instruction.
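The producer/consumer pattern described above can be sketched with the compiler intrinsics for these instructions. The example assumes an x86 target with SSE2 and the <emmintrin.h> header; _mm_stream_si32 issues a weakly-ordered (non-temporal) store, and _mm_sfence and _mm_lfence emit SFENCE and LFENCE. The type demo_channel, the constant DEMO_CAP, and the functions demo_publish and demo_consume_sum are hypothetical names for this illustration.

```c
#include <assert.h>
#include <emmintrin.h>   /* SSE2 intrinsics; also provides _mm_sfence */
#include <stdatomic.h>
#include <stddef.h>

enum { DEMO_CAP = 16 };  /* hypothetical buffer capacity */

typedef struct {
    int data[DEMO_CAP];
    atomic_int ready;    /* 0 = no data yet, 1 = data published */
} demo_channel;

/* Producer: write the payload with weakly-ordered non-temporal stores,
   then issue SFENCE so those stores are globally visible before the
   flag store that announces them. */
static inline void demo_publish(demo_channel *ch, const int *src, size_t n) {
    assert(n <= DEMO_CAP);
    for (size_t i = 0; i < n; i++) {
        _mm_stream_si32(&ch->data[i], src[i]);
    }
    _mm_sfence();
    atomic_store_explicit(&ch->ready, 1, memory_order_release);
}

/* Consumer: wait for the flag, then read the payload. The LFENCE is
   shown for illustration; for ordinary loads from write-back memory
   the acquire load of the flag already orders the later reads. */
static inline int demo_consume_sum(demo_channel *ch, size_t n) {
    assert(n <= DEMO_CAP);
    while (atomic_load_explicit(&ch->ready, memory_order_acquire) == 0) {
        /* spin until the producer publishes */
    }
    _mm_lfence();
    int sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += ch->data[i];
    }
    return sum;
}
```

Where both loads and stores must be ordered at a single point, the _mm_mfence intrinsic emits MFENCE and could be used in place of the separate fences.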
The MTRRs were introduced in the P6 family processors to define the cache charac-
teristics for specified areas of physical memory. The following are two examples of
how memory types set up with MTRRs can be used to strengthen or weaken memory
ordering for the Pentium 4, Intel Xeon, and P6 family processors:
The strong uncached (UC) memory type forces a strong-ordering model on
memory accesses. Here, all reads and writes to the UC memory region appear on
the bus and out-of-order or speculative accesses are not performed. This
memory type can be applied to an address range dedicated to memory-mapped
I/O devices to force strong memory ordering.
For areas of memory where weak ordering is acceptable, the write-back (WB)
memory type can be chosen. Here, reads can be performed speculatively and
writes can be buffered and combined. For this type of memory, cache locking is
performed on atomic (locked) operations that do not split across cache lines,
which helps to reduce the performance penalty associated with the use of the
typical synchronization instructions, such as XCHG, that lock the bus during the
entire read-modify-write operation. With the WB memory type, the XCHG
instruction locks the cache instead of the bus if the memory access is contained
within a cache line.
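Whether a locked access can be satisfied by a cache lock therefore depends on whether the operand is contained within one cache line. The following sketch shows how software might check that condition for a given address and operand size. The 64-byte line size is an assumption for illustration only, since the actual line size is implementation-specific; the names DEMO_LINE_SIZE and demo_within_one_line are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size in bytes; the real value is
   implementation-specific and should be queried on the target. */
enum { DEMO_LINE_SIZE = 64 };

/* Return true if the `size` bytes starting at `addr` lie within a
   single cache line of `line_size` bytes. `line_size` must be a
   power of two and `size` must be nonzero. */
static inline bool demo_within_one_line(uintptr_t addr, size_t size,
                                        size_t line_size) {
    assert(size > 0);
    assert(line_size > 0 && (line_size & (line_size - 1)) == 0);
    uintptr_t mask  = ~(uintptr_t)(line_size - 1);
    uintptr_t first = addr & mask;               /* line of first byte */
    uintptr_t last  = (addr + size - 1) & mask;  /* line of last byte  */
    return first == last;
}
```

For example, a 4-byte operand at offset 60 of a 64-byte line occupies bytes 60 through 63 and stays within the line, while the same operand at offset 62 spans two lines. Keeping lock variables naturally aligned to their size avoids the split case.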