Intel 64 and IA-32 Architectures Software Developers Manual Volume 3A, System Programming Guide, Part 1

MULTIPLE-PROCESSOR MANAGEMENT
from an external perspective, the string in a cache-line-by-cache-line mode. The
processor then loops for the length of the string, issuing a cache-line read for the
source address and an invalidation on the external bus for the destination address,
because it knows that all bytes in the destination cache line will be modified. In this
mode, the processor accepts interrupts only on cache-line boundaries. It is also
possible in this mode for the destination line invalidations, and therefore the stores,
to be issued on the external bus out of order.
Code that depends on sequential store ordering should not use string operations to
store the entire data structure. Data and semaphores should be kept separate.
Order-dependent code should write to a discrete semaphore variable after any string
operations, so that all processors see correctly ordered data.
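
The recommendation above can be sketched in C. This is a minimal illustration, not code from the manual: the `message` structure, `publish`, and `try_consume` are hypothetical names, and `memcpy` stands in for a string operation (a compiler or C library may or may not implement it with REP MOVS). The point shown is only that the flag is a separate object, written by one discrete store after the bulk copy.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical shared record: the payload and its "ready" flag are separate
 * objects, so the flag is never written by the bulk copy itself. */
struct message {
    unsigned char payload[256];
    atomic_int ready;              /* 0 = not published, 1 = published */
};

/* Producer: bulk-copy the payload first, then publish it with a single
 * discrete store to the flag. memcpy is a stand-in for a string operation. */
static void publish(struct message *m, const void *src, size_t len)
{
    if (len > sizeof m->payload)
        len = sizeof m->payload;
    memcpy(m->payload, src, len);
    atomic_store_explicit(&m->ready, 1, memory_order_release);
}

/* Consumer: read the payload only after observing the flag as set. */
static bool try_consume(struct message *m, void *dst, size_t len)
{
    if (atomic_load_explicit(&m->ready, memory_order_acquire) == 0)
        return false;
    if (len > sizeof m->payload)
        len = sizeof m->payload;
    memcpy(dst, m->payload, len);
    return true;
}
```

A consumer that polls `try_consume` sees either no data or the complete payload, because the flag is stored only after the copy has finished and is never part of the copied region.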
Initial conditions for “fast string” operations:
• EDI and ESI must be 8-byte aligned for the Pentium III processor. EDI must be
  8-byte aligned for the Pentium 4 processor.
• String operation must be performed in ascending address order.
• The initial operation counter (ECX) must be equal to or greater than 64.
• Source and destination must not overlap by less than a cache line (64 bytes for
  Intel Core 2 Duo, Intel Core, Pentium M, and Pentium 4 processors; 32 bytes for
  P6 family and Pentium processors).
• The memory type for both source and destination addresses must be either WB
  or WC.
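
The initial conditions listed above can be collected into a single predicate. The sketch below is illustrative only: `fast_string_eligible`, `enum mem_type`, and `CACHE_LINE` are hypothetical names, the alignment rule is the Pentium 4 one (destination only), and the overlap rule is read as "the start addresses must be at least one cache line apart", which is an assumption about the manual's wording. Software cannot normally query the memory type of an address this way, so it is passed in as a parameter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache line (Pentium 4 and later in the list above). */
#define CACHE_LINE 64u

/* Hypothetical tags for the memory types; only WB and WC qualify. */
enum mem_type { MEM_UC, MEM_WC, MEM_WT, MEM_WP, MEM_WB };

/* Returns true when the listed initial conditions hold, using the
 * Pentium 4 alignment rule (only the destination must be 8-byte aligned). */
static bool fast_string_eligible(uintptr_t src, uintptr_t dst, size_t count,
                                 bool ascending,
                                 enum mem_type src_type,
                                 enum mem_type dst_type)
{
    if (dst % 8 != 0)              /* EDI must be 8-byte aligned */
        return false;
    if (!ascending)                /* ascending address order only */
        return false;
    if (count < 64)                /* initial ECX must be >= 64 */
        return false;

    /* Assumed reading of the overlap rule: start addresses closer than
     * one cache line are rejected. */
    uintptr_t distance = src > dst ? src - dst : dst - src;
    if (distance < CACHE_LINE)
        return false;

    /* Both source and destination must be WB or WC memory. */
    bool src_ok = (src_type == MEM_WB || src_type == MEM_WC);
    bool dst_ok = (dst_type == MEM_WB || dst_type == MEM_WC);
    return src_ok && dst_ok;
}
```

For example, `fast_string_eligible(0x1000, 0x2000, 128, true, MEM_WB, MEM_WB)` satisfies every check, while the same call with a count of 63 or a destination of `0x2004` does not.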
7.2.4 Strengthening or Weakening the Memory Ordering Model
The Intel 64 and IA-32 architectures provide several mechanisms for strengthening
or weakening the memory ordering model to handle special programming situations.
These mechanisms include:
• The I/O instructions, locking instructions, the LOCK prefix, and serializing
  instructions force stronger ordering on the processor.
• The SFENCE instruction (introduced to the IA-32 architecture in the Pentium III
  processor) and the LFENCE and MFENCE instructions (introduced in the Pentium
  4 processor) provide memory ordering and serialization capability for specific
  types of memory operations.
• The memory type range registers (MTRRs) can be used to strengthen or weaken
  memory ordering for specific areas of physical memory (see Section 10.11,
  “Memory Type Range Registers (MTRRs)”). MTRRs are available only in the
  Pentium 4, Intel Xeon, and P6 family processors.
• The page attribute table (PAT) can be used to strengthen memory ordering for a
  specific page or group of pages (see Section 10.12, “Page Attribute Table (PAT)”).
  The PAT is available only in the Pentium 4, Intel Xeon, and Pentium III processors.
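
From C, the first two mechanisms are usually reached through the C11 atomics rather than written by hand. The sketch below is an illustration under stated assumptions: the names `lock_acquire`, `lock_release`, `increment_shared`, and `full_barrier` are hypothetical, and the mapping to machine instructions is the compiler's choice. On x86 targets, a test-and-set is commonly compiled to a locked exchange and a sequentially consistent fence to MFENCE or a locked instruction, but that is typical code generation, not a guarantee.

```c
#include <stdatomic.h>

/* Hypothetical spinlock protecting a plain (non-atomic) counter. */
static atomic_flag lock = ATOMIC_FLAG_INIT;
static long shared_counter;

/* Acquire: spin on an atomic test-and-set. On x86 this is commonly
 * emitted as a locked exchange (assumption about code generation). */
static void lock_acquire(void)
{
    while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
        ;                          /* busy-wait until the flag was clear */
}

/* Release: clear the flag with release ordering. */
static void lock_release(void)
{
    atomic_flag_clear_explicit(&lock, memory_order_release);
}

/* Increment the counter under the lock and return the new value. */
static long increment_shared(void)
{
    lock_acquire();
    long value = ++shared_counter;
    lock_release();
    return value;
}

/* Full barrier: orders earlier loads and stores before later ones. */
static void full_barrier(void)
{
    atomic_thread_fence(memory_order_seq_cst);
}
```

The MTRR and PAT mechanisms have no equivalent at this level, since they are configured by system software through privileged registers and page-table attributes.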
These mechanisms can be used as follows.