Intel 64 and IA-32 Architectures Software Developers Manual Volume 3A, System Programming Guide, Part 1
MULTIPLE-PROCESSOR MANAGEMENT
• Timing loops cause problems when they are calibrated on an IA-32 processor
running at one clock speed and then executed on a processor running at another
clock speed.
• Routines for calibrating execution-based timing loops produce unpredictable
results when run on an IA-32 processor supporting Hyper-Threading Technology.
This is due to the sharing of execution resources between the logical processors
within a physical package.
To avoid the problems described, timing loop routines must use a timing mechanism
for the loop that does not depend on the execution speed of the logical processors in
the system. The following sources are generally available:
• A high-resolution system timer (for example, an Intel 8254).
• A high-resolution timer within the processor (such as the local APIC timer or the
time-stamp counter).
For additional information, see the Intel® 64 and IA-32 Architectures Optimization
Reference Manual.
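The following C fragment is a minimal sketch of a delay routine that measures
elapsed time against a clock instead of counting loop iterations. It assumes a
POSIX environment and uses clock_gettime(CLOCK_MONOTONIC) as a stand-in for
whichever platform timer is actually available; the helper names now_ns and
delay_at_least_ns are hypothetical and are not defined by this manual.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read a monotonic clock, in nanoseconds.
 * Assumption: CLOCK_MONOTONIC stands in for the platform timer. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Busy-wait until at least delay_ns nanoseconds have elapsed.
 * The exit condition depends only on the clock reading, not on how
 * many iterations the logical processor happens to execute.
 * Returns the elapsed time that was observed on exit. */
static uint64_t delay_at_least_ns(uint64_t delay_ns)
{
    uint64_t start = now_ns();
    uint64_t elapsed;

    do {
        elapsed = now_ns() - start;
    } while (elapsed < delay_ns);

    return elapsed;
}
```

Because the loop terminates on elapsed time, a call such as
delay_at_least_ns(2000000) should wait at least 2 ms regardless of clock speed
or of activity on a sibling logical processor; only the number of iterations
executed varies.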
7.11.6.7 Place Locks and Semaphores in Aligned, 128-Byte Blocks of
Memory
When software uses locks or semaphores to synchronize processes, threads, or other
code sections, Intel recommends that only one lock or semaphore be present within
a cache line (or 128-byte sector, if 128-byte sectors are supported). In processors
based on Intel NetBurst microarchitecture (which support a 128-byte sector consisting
of two cache lines), following this recommendation means that each lock or semaphore
should be contained in a 128-byte block of memory that begins on a 128-byte
boundary. This practice minimizes the bus traffic required to service locks.
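One way to follow this recommendation in C11 is sketched below. The type
padded_lock, the constant LOCK_BLOCK_SIZE, and the helper functions are
hypothetical names introduced for illustration; the sketch assumes a compiler
that honors alignas for 128-byte alignment and uses a simple try-lock built on
atomic_flag, not any particular operating system's lock implementation.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 128 bytes, matching the 128-byte sector described above. */
#define LOCK_BLOCK_SIZE 128

/* One lock per 128-byte block. Aligning the member raises the alignment
 * of the whole structure, and the compiler pads its size to a multiple
 * of that alignment, so array elements never share a block. */
typedef struct {
    alignas(LOCK_BLOCK_SIZE) atomic_flag flag;
} padded_lock;

static_assert(sizeof(padded_lock) == LOCK_BLOCK_SIZE,
              "each lock must occupy exactly one 128-byte block");
static_assert(alignof(padded_lock) == LOCK_BLOCK_SIZE,
              "each lock must begin on a 128-byte boundary");

/* Put the lock into the released state. */
static void padded_lock_init(padded_lock *l)
{
    atomic_flag_clear(&l->flag);
}

/* Returns 1 if the lock was acquired, 0 if it was already held. */
static int padded_lock_try_acquire(padded_lock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->flag,
                                              memory_order_acquire);
}

/* Release a lock previously acquired by the caller. */
static void padded_lock_release(padded_lock *l)
{
    atomic_flag_clear_explicit(&l->flag, memory_order_release);
}
```

With this layout, consecutive elements of an array of padded_lock lie exactly
128 bytes apart, so contention on one lock does not touch the block that holds
its neighbor. The cost is 128 bytes of memory per lock.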