Intel 64 and IA-32 Architectures Software Developers Manual Volume 3A, System Programming Guide, Part 1

7-12 Vol. 3A

MULTIPLE-PROCESSOR MANAGEMENT

Memory mapped devices and other I/O devices on the bus are often sensitive to the

order of writes to their I/O buffers. I/O instructions (the IN and OUT

instructions) can be used to impose strong write ordering on such accesses as follows. Prior to

executing an I/O instruction, the processor waits for all previous instructions in the

program to complete and for all buffered writes to drain to memory. Only instruction

fetch and page table walks can pass I/O instructions. Execution of subsequent

instructions does not begin until the processor determines that the I/O instruction has

been completed.

Synchronization mechanisms in multiple-processor systems may depend upon a

strong memory-ordering model. Here, a program can use a locking instruction such

as the XCHG instruction or the LOCK prefix to ensure that a read-modify-write

operation on memory is carried out atomically. Locking operations typically operate like

I/O operations in that they wait for all previous instructions to complete and for all

buffered writes to drain to memory (see Section 7.1.2, “Bus Locking”).
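
As an illustration of the locking idea above, here is a minimal sketch of a spinlock and an atomic counter written with C11 atomics. The names demo_spinlock, demo_spin_try_lock, demo_spin_lock, demo_spin_unlock, and demo_counter_add are hypothetical and do not come from this manual. On x86 targets, compilers typically implement the exchange with XCHG and the fetch-and-add with a LOCK-prefixed instruction, but the exact code generated is an assumption that depends on the compiler.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical spinlock type for illustration: 0 = free, 1 = held. */
typedef struct {
    atomic_int state;
} demo_spinlock;

void demo_spin_init(demo_spinlock *l) {
    atomic_init(&l->state, 0);
}

/* A single atomic exchange (typically XCHG on x86): write 1 and read the
 * old value in one indivisible step. The caller owns the lock only if the
 * old value was 0. */
bool demo_spin_try_lock(demo_spinlock *l) {
    return atomic_exchange_explicit(&l->state, 1, memory_order_acquire) == 0;
}

/* Busy-wait until the exchange succeeds. */
void demo_spin_lock(demo_spinlock *l) {
    while (!demo_spin_try_lock(l)) {
        /* spin */
    }
}

/* Release the lock with a store that earlier writes cannot pass. */
void demo_spin_unlock(demo_spinlock *l) {
    atomic_store_explicit(&l->state, 0, memory_order_release);
}

/* Atomic read-modify-write (typically a LOCK-prefixed XADD on x86).
 * Returns the value the counter held before the addition. */
int demo_counter_add(atomic_int *counter, int n) {
    return atomic_fetch_add_explicit(counter, n, memory_order_seq_cst);
}
```

A caller would bracket a critical section with demo_spin_lock and demo_spin_unlock, or use demo_counter_add alone when the shared state is a single integer.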

Program synchronization can also be carried out with serializing instructions (see

Section 7.4). These instructions are typically used at critical procedure or task

boundaries to force completion of all previous instructions before a jump to a new

section of code or a context switch occurs. As with the I/O and locking instructions, the

processor waits until all previous instructions have been completed and all buffered

writes have been drained to memory before executing the serializing instruction.

The SFENCE, LFENCE, and MFENCE instructions provide a performance-efficient way

of ensuring load and store memory ordering between routines that produce

weakly-ordered results and routines that consume that data. The functions of these

instructions are as follows:

• SFENCE — Serializes all store (write) operations that occurred prior to the

SFENCE instruction in the program instruction stream, but does not affect load

operations.

• LFENCE — Serializes all load (read) operations that occurred prior to the LFENCE

instruction in the program instruction stream, but does not affect store

operations.

• MFENCE — Serializes all store and load operations that occurred prior to the

MFENCE instruction in the program instruction stream.

Note that the SFENCE, LFENCE, and MFENCE instructions provide a more efficient

method of controlling memory ordering than the CPUID instruction.
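
The producer/consumer pattern described above can be sketched in C using the compiler intrinsics that correspond to these instructions (_mm_sfence, _mm_lfence, and _mm_mfence, with _mm_stream_si32 as an example of a weakly-ordered store). The names demo_publish, demo_try_consume, and demo_full_fence are hypothetical. The sketch assumes a compiler that defines __SSE2__ on x86 targets; elsewhere it falls back to C11 fences, which is a substitution and not what the manual describes.

```c
#include <stdatomic.h>
#include <stdbool.h>

#if defined(__SSE2__)
#include <emmintrin.h>  /* _mm_sfence, _mm_lfence, _mm_mfence, _mm_stream_si32 */
#define DEMO_X86_FENCES 1
#else
#define DEMO_X86_FENCES 0  /* assumption: no x86 fence intrinsics available */
#endif

static int demo_payload;        /* data written with a weakly-ordered store */
static atomic_int demo_ready;   /* flag that tells the consumer data is valid */

/* Producer: write the data, fence the stores, then raise the flag. */
void demo_publish(int value) {
#if DEMO_X86_FENCES
    _mm_stream_si32(&demo_payload, value);  /* non-temporal (weakly-ordered) store */
    _mm_sfence();                           /* earlier stores become visible first */
#else
    demo_payload = value;
    atomic_thread_fence(memory_order_release);
#endif
    atomic_store_explicit(&demo_ready, 1, memory_order_release);
}

/* Consumer: read the payload only after the flag has been observed. */
bool demo_try_consume(int *out) {
    if (atomic_load_explicit(&demo_ready, memory_order_acquire) == 0) {
        return false;
    }
#if DEMO_X86_FENCES
    /* Shown for symmetry; for ordinary write-back memory the acquire load
     * above is generally sufficient by itself. */
    _mm_lfence();
#endif
    *out = demo_payload;
    return true;
}

/* Orders both earlier loads and earlier stores before later ones. */
void demo_full_fence(void) {
#if DEMO_X86_FENCES
    _mm_mfence();
#else
    atomic_thread_fence(memory_order_seq_cst);
#endif
}
```

In this sketch the store fence sits between the data write and the flag write, so a consumer that sees the flag set should also see the data.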

The MTRRs were introduced in the P6 family processors to define the cache

characteristics for specified areas of physical memory. The following are two examples of

how memory types set up with MTRRs can be used to strengthen or weaken memory

ordering for the Pentium 4, Intel Xeon, and P6 family processors:

• The strong uncached (UC) memory type forces a strong-ordering model on

memory accesses. Here, all reads and writes to the UC memory region appear on

the bus and out-of-order or speculative accesses are not performed. This

memory type can be applied to an address range dedicated to memory mapped

I/O devices to force strong memory ordering.