Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3A, System Programming Guide, Part 1

MULTIPLE-PROCESSOR MANAGEMENT
The following instructions are memory ordering instructions, not serializing
instructions. These drain the data memory subsystem. They do not affect the
instruction execution stream:
• Non-privileged memory ordering instructions: SFENCE, LFENCE, and MFENCE.
The SFENCE, LFENCE, and MFENCE instructions provide finer-grained control over
the serialization of memory loads and stores (see Section 7.2.4, “Strengthening
or Weakening the Memory Ordering Model”).
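As an illustration of how these instructions are typically reached from C, the following is a minimal sketch, not a definitive implementation. It assumes a compiler that provides the SSE2 intrinsics header `<emmintrin.h>` (`_mm_sfence`, `_mm_lfence`, `_mm_mfence`, `_mm_stream_si32`); on hosts without SSE2 it falls back to C11 fences so that it still builds. The `mailbox` structure and the `mailbox_publish` and `mailbox_try_consume` functions are hypothetical names introduced only for this example. On ordinary write-back memory the processor already keeps stores in program order, so the sketch uses a non-temporal (weakly ordered) store to give SFENCE something to order.

```c
#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics; also provides _mm_sfence */
static inline void store_fence(void) { _mm_sfence(); }   /* SFENCE */
static inline void load_fence(void)  { _mm_lfence(); }   /* LFENCE */
static inline void full_fence(void)  { _mm_mfence(); }   /* MFENCE */
/* Non-temporal 32-bit store (MOVNTI); weakly ordered with other stores. */
static inline void stream_store(int *p, int v) { _mm_stream_si32(p, v); }
#else
#include <stdatomic.h>
/* Portable stand-ins (assumption) so the sketch builds on non-x86 hosts. */
static inline void store_fence(void) { atomic_thread_fence(memory_order_release); }
static inline void load_fence(void)  { atomic_thread_fence(memory_order_acquire); }
static inline void full_fence(void)  { atomic_thread_fence(memory_order_seq_cst); }
static inline void stream_store(int *p, int v) { *p = v; }
#endif

/* Hypothetical one-slot mailbox shared between a producer and a consumer. */
struct mailbox {
    int payload;
    volatile int ready;
};

/* Producer: write the payload, fence, then raise the flag. */
void mailbox_publish(struct mailbox *m, int value)
{
    stream_store(&m->payload, value); /* weakly ordered store */
    store_fence();                    /* payload is visible before the flag */
    m->ready = 1;
}

/* Consumer: returns 1 and fills *out if a value was available, else 0. */
int mailbox_try_consume(struct mailbox *m, int *out)
{
    if (!m->ready)
        return 0;
    load_fence();                     /* payload load is not started early */
    *out = m->payload;
    m->ready = 0;
    return 1;
}
```

None of these fences serializes instruction execution; they order memory accesses only. `full_fence` is shown for completeness and would replace both of the other fences where loads and stores must be ordered together.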
The following additional information is worth noting regarding serializing
instructions:
• The processor does not write back the contents of modified data in its data
  cache to external memory when it serializes instruction execution. Software
  can force modified data to be written back by executing the WBINVD
  instruction, which is a serializing instruction. Frequent use of the WBINVD
  instruction seriously reduces system performance.
• When an instruction is executed that enables or disables paging (that is,
  changes the PG flag in control register CR0), the instruction should be
  followed by a jump instruction. The target instruction of the jump instruction
  is fetched with the new setting of the PG flag (that is, paging is enabled or
  disabled), but the jump instruction itself is fetched with the previous
  setting. The Pentium 4, Intel Xeon, and P6 family processors do not require
  the jump operation following the move to register CR0 (because any use of the
  MOV instruction in a Pentium 4, Intel Xeon, or P6 family processor to write to
  CR0 is completely serializing). However, to maintain backward and forward
  compatibility with code written to run on other IA-32 processors, it is
  recommended that the jump operation be performed.
• Whenever an instruction is executed to change the contents of CR3 while paging
  is enabled, the next instruction is fetched using the translation tables that
  correspond to the new value of CR3. Therefore, the next instruction and the
  sequentially following instructions should have a mapping based upon the new
  value of CR3. (Global entries in the TLBs are not invalidated; see Section
  10.9, “Invalidating the Translation Lookaside Buffers (TLBs)”.)
• The Pentium processor and more recent processor families use branch-prediction
  techniques to improve performance by prefetching the destination of a branch
  instruction before the branch instruction is executed. Consequently,
  instruction execution is not deterministically serialized when a branch
  instruction is executed.
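The recommendation above to follow a change of the PG flag with a jump can be sketched in C as follows. This is an illustrative sketch under stated assumptions, not code taken from this manual: it assumes GCC or Clang extended inline assembly (AT&T syntax), 32-bit protected mode, execution at privilege level 0, and that the executing code is identity-mapped so that the jump target is valid under both the old and the new PG setting. The names `cr0_set_paging`, `enable_paging_sketch`, and the `SKETCH_RING0` macro are hypothetical. The privileged routine is compiled only when `SKETCH_RING0` is defined, because MOV to CR0 or CR3 faults in user mode.

```c
#include <stdint.h>

#define CR0_PG (1u << 31)   /* paging flag, bit 31 of CR0 */

/* Pure helper: the CR0 value with the PG flag set or cleared. */
static inline uint32_t cr0_set_paging(uint32_t cr0, int enable)
{
    return enable ? (cr0 | CR0_PG) : (cr0 & ~CR0_PG);
}

#if defined(__i386__) && defined(__GNUC__) && defined(SKETCH_RING0)
/*
 * Privileged sketch: load CR3 with the physical address of the page
 * directory, set CR0.PG, then jump. The jump itself is fetched with the
 * previous PG setting; its target is fetched with the new one.
 */
static inline void enable_paging_sketch(uint32_t page_directory_phys)
{
    uint32_t cr0;
    __asm__ volatile(
        "movl %1, %%cr3\n\t"          /* page-directory base */
        "movl %%cr0, %0\n\t"
        "orl  $0x80000000, %0\n\t"    /* set PG (bit 31) */
        "movl %0, %%cr0\n\t"
        "jmp  1f\n"                   /* recommended jump after the CR0 write */
        "1:\n\t"
        : "=&r"(cr0)
        : "r"(page_directory_phys)
        : "memory");
    (void)cr0;
}
#endif
```

Setting PG requires that protected mode (the PE flag) is already enabled. The helper is kept separate from the privileged routine so that the bit manipulation can be checked in an ordinary user-mode build.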
7.5 MULTIPLE-PROCESSOR (MP) INITIALIZATION
The IA-32 architecture (beginning with the P6 family processors) defines a multiple-
processor (MP) initialization protocol called the Multiprocessor Specification Version
1.4. This specification defines the boot protocol to be used by IA-32 processors in