User's Manual

Vol. 3 8-5
MULTIPLE-PROCESSOR MANAGEMENT
8.1.2.2 Software Controlled Bus Locking
To explicitly force the LOCK semantics, software can use the LOCK prefix with the
following instructions when they are used to modify a memory location. An invalid-
opcode exception (#UD) is generated when the LOCK prefix is used with any other
instruction or when no write operation is made to memory (that is, when the destina
-
tion operand is in a register).
The bit test and modify instructions (BTS, BTR, and BTC).
The exchange instructions (XADD, CMPXCHG, and CMPXCHG8B).
The LOCK prefix is automatically assumed for XCHG instruction.
The following single-operand arithmetic and logical instructions: INC, DEC, NOT,
and NEG.
The following two-operand arithmetic and logical instructions: ADD, ADC, SUB,
SBB, AND, OR, and XOR.
A locked instruction is guaranteed to lock only the area of memory defined by the
destination operand, but may be interpreted by the system as a lock for a larger
memory area.
Software should access semaphores (shared memory used for signalling between
multiple processors) using identical addresses and operand lengths. For example, if
one processor accesses a semaphore using a word access, other processors should
not access the semaphore using a byte access.
NOTE
Do not implement semaphores using the WC memory type. Do not
perform non-temporal stores to a cache line containing a location
used to implement a semaphore.
The integrity of a bus lock is not affected by the alignment of the memory field. The
LOCK semantics are followed for as many bus cycles as necessary to update the
entire operand. However, it is recommend that locked accesses be aligned on their
natural boundaries for better system performance:
Any boundary for an 8-bit access (locked or otherwise).
16-bit boundary for locked word accesses.
32-bit boundary for locked doubleword accesses.
64-bit boundary for locked quadword accesses.
Locked operations are atomic with respect to all other memory operations and all
externally visible events. Only instruction fetch and page table accesses can pass
locked instructions. Locked instructions can be used to synchronize data written by
one processor and read by another processor.
For the P6 family processors, locked operations serialize all outstanding load and
store operations (that is, wait for them to complete). This rule is also true for the
Pentium 4 and Intel Xeon processors, with one exception. Load operations that refer-