Intel 64 and IA-32 Architectures Software Developers Manual Volume 3A, System Programming Guide, Part 1

ARCHITECTURE COMPATIBILITY
A general-protection exception (#GP) if the segment is a data segment (that is,
if the CS, DS, ES, FS, or GS register is being used to address the segment).
A stack-fault exception (#SS) if the segment is a stack segment (that is, if the SS
register is being used).
An exception to this behavior occurs when a stack access is data aligned and the
stack pointer points to the last aligned data item of that size at the top of the
stack (ESP is FFFFFFFCH). When this data is popped, no segment limit violation
occurs and the stack pointer wraps around to 0.
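
The wraparound described above can be illustrated with a small C model. This is
a sketch only, assuming a flat stack segment with a limit of FFFFFFFFH; the
function names esp_after_pop and is_last_aligned_slot are hypothetical and are
not part of any Intel interface.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: ESP after popping `size` bytes, computed modulo 2^32. */
static uint32_t esp_after_pop(uint32_t esp, uint32_t size)
{
    return (uint32_t)(esp + size); /* unsigned arithmetic wraps at 2^32 */
}

/* Hypothetical predicate: true when ESP addresses the last naturally aligned
   item of `size` bytes below 4 GBytes, so that popping it leaves ESP at 0. */
static bool is_last_aligned_slot(uint32_t esp, uint32_t size)
{
    return size != 0 && (esp % size) == 0 && esp_after_pop(esp, size) == 0;
}
```

With ESP equal to FFFFFFFCH and a 4-byte operand, is_last_aligned_slot returns
true and esp_after_pop returns 0, matching the case given in the text.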
The address space of the P6 family, Pentium, and Intel486 processors may wrap
around at 1 MByte in real-address mode. An external A20M# pin forces wraparound
when enabled. On Intel 8086 processors, it is possible to specify addresses greater
than 1 MByte. For example, with a selector value of FFFFH and an offset of FFFFH,
the effective address would be 10FFEFH (1 MByte plus 65519 bytes). The 8086
processor, which can form addresses only up to 20 bits long, truncates the uppermost
bit, which “wraps” this address to FFEFH. However, the P6 family, Pentium, and
Intel486 processors do not truncate this bit if A20M# is not enabled.
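
The arithmetic in the example above can be checked with the following C sketch.
It assumes the usual real-address-mode translation (selector shifted left by 4,
plus offset) and models the A20M# effect as clearing address bit 20. The names
real_mode_address and apply_a20_mask are hypothetical helpers introduced for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Real-address-mode translation: (selector * 16) + offset.
   The result can be up to 21 bits wide (maximum 10FFEFH). */
static uint32_t real_mode_address(uint16_t selector, uint16_t offset)
{
    return ((uint32_t)selector << 4) + (uint32_t)offset;
}

/* Model of the 1-MByte wraparound: when `wrap_at_1mb` is true (8086 behavior,
   or A20M# enabled on later processors), address bit 20 is forced to 0. */
static uint32_t apply_a20_mask(uint32_t address, bool wrap_at_1mb)
{
    return wrap_at_1mb ? (address & ~(UINT32_C(1) << 20)) : address;
}
```

For selector FFFFH and offset FFFFH, real_mode_address yields 10FFEFH. With the
mask applied the result is FFEFH; without it the address is left at 10FFEFH.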
If a stack operation wraps around the address limit, shutdown occurs. (The 8086
processor does not have a shutdown mode or a limit.)
The behavior when executing near the limit of a 4-GByte segment (limit=0xFFFFFFFF)
differs between the Pentium Pro and the Pentium 4 family of processors. On the
Pentium Pro, an instruction that crosses the limit faults with a segment limit
violation; for example, a two-byte instruction such as INC EAX, encoded as
0xFF 0xC0, that starts exactly at the limit. (A one-byte instruction at 0xFFFFFFFF
does not cause an exception.) On the Pentium 4 processor family, neither of these
situations causes a fault.
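
The difference described above can be summarized in a short C model. This is an
illustrative sketch, not a description of the processor's fetch logic; the
cpu_model enumeration and the functions crosses_4g_limit and fetch_faults are
hypothetical names, and only the two cases stated in the text are modeled.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { MODEL_PENTIUM_PRO, MODEL_PENTIUM_4 } cpu_model;

/* True when an instruction of `length` bytes starting at `offset` extends
   beyond offset 0xFFFFFFFF. 64-bit arithmetic avoids overflow in the sum. */
static bool crosses_4g_limit(uint32_t offset, uint32_t length)
{
    uint64_t end = (uint64_t)offset + length; /* one past the last byte */
    return end > UINT64_C(0x100000000);
}

/* Per the text: the Pentium Pro faults when the instruction crosses the
   limit; the Pentium 4 family does not fault in either situation. */
static bool fetch_faults(cpu_model model, uint32_t offset, uint32_t length)
{
    return model == MODEL_PENTIUM_PRO && crosses_4g_limit(offset, length);
}
```

A two-byte instruction at offset 0xFFFFFFFF is reported as faulting only for
MODEL_PENTIUM_PRO; a one-byte instruction at the same offset faults for neither
model.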
17.33 STORE BUFFERS AND MEMORY ORDERING
The Pentium 4, Intel Xeon, and P6 family processors provide a store buffer for
temporary storage of writes (stores) to memory (see Section 10.10, “Store Buffer”).
Writes stored in the store buffer(s) are always written to memory in program order,
with the exception of “fast string” store operations (see Section 7.2.3, “Out-of-Order
Stores For String Operations”).
The Pentium processor has two store buffers, one corresponding to each of the
pipelines. Writes in these buffers are always written to memory in the order they
were generated by the processor core.
Note that only memory writes are buffered; I/O writes are not. The Pentium 4,
Intel Xeon, P6 family, Pentium, and Intel486 processors do not synchronize the
completion of memory writes on the bus with instruction execution after a write.
An I/O, locked, or serializing instruction must be executed to synchronize
writes with the next instruction (see Section 7.4, “Serializing Instructions”).
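
The ordering behavior described in this section can be pictured with a toy
software model of a store buffer. The sketch below is an assumption-laden
illustration, not the hardware design: the buffer depth of 8, the store_buffer
type, and the functions store_buffer_init, buffered_write, and serialize are all
hypothetical. Store forwarding and "fast string" operations are not modeled.

```c
#include <stddef.h>
#include <stdint.h>

#define STORE_BUFFER_SLOTS 8 /* arbitrary depth chosen for the sketch */

typedef struct {
    size_t  index; /* destination byte in the modeled memory */
    uint8_t value; /* data to be written */
} pending_store;

typedef struct {
    pending_store slots[STORE_BUFFER_SLOTS];
    size_t        count;  /* number of buffered, not yet visible stores */
    uint8_t      *memory; /* modeled memory that the stores drain into */
} store_buffer;

static void store_buffer_init(store_buffer *sb, uint8_t *memory)
{
    sb->count = 0;
    sb->memory = memory;
}

/* Drain every pending store to memory, oldest first (program order). */
static void drain(store_buffer *sb)
{
    for (size_t i = 0; i < sb->count; i++)
        sb->memory[sb->slots[i].index] = sb->slots[i].value;
    sb->count = 0;
}

/* A memory write is held in the buffer; execution continues without
   waiting for it to reach memory. A full buffer is drained first. */
static void buffered_write(store_buffer *sb, size_t index, uint8_t value)
{
    if (sb->count == STORE_BUFFER_SLOTS)
        drain(sb);
    sb->slots[sb->count].index = index;
    sb->slots[sb->count].value = value;
    sb->count++;
}

/* Stand-in for an I/O, locked, or serializing instruction: all earlier
   writes are completed before the next instruction proceeds. */
static void serialize(store_buffer *sb)
{
    drain(sb);
}
```

In this model, a value passed to buffered_write is not visible in the memory
array until serialize (or a full buffer) drains it, and two writes to the same
location reach memory in the order they were issued.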