Intel 64 and IA-32 Architectures Software Developer’s Manual Volume 1, Basic Architecture


Vol. 1 11-17

PROGRAMMING WITH STREAMING SIMD EXTENSIONS 2 (SSE2)

11.4.4.1 FLUSH Cache Line

The CLFLUSH (flush cache line) instruction writes back (if the line holds modified data) and invalidates the cache line associated with a specified linear address. The invalidation applies to all levels of the processor’s cache hierarchy, and it is broadcast throughout the cache coherency domain.
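CLFLUSH acts on a single line, so flushing a buffer means issuing it once per line that the buffer overlaps. The sketch below shows that loop using the SSE2 intrinsic _mm_clflush (which compiles to CLFLUSH) and follows it with MFENCE, a common way to order the flushes before later memory accesses. It assumes GCC or Clang on x86-64; the helper name flush_range and the 64-byte line size are illustrative assumptions, not part of this manual.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_clflush, _mm_mfence */

/* Assumption: 64-byte cache lines. Real code should query the CLFLUSH
   line size (for example via CPUID) instead of hard-coding it. */
#define ASSUMED_LINE_SIZE 64u

/* Hypothetical helper: write back (if dirty) and invalidate every cache
   line that overlaps the byte range [buf, buf + len). */
static void flush_range(const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    if (len == 0)
        return;
    /* Start at the line containing the first byte; CLFLUSH on any address
       inside a line flushes the whole line. */
    uintptr_t start = (uintptr_t)buf & ~(uintptr_t)(ASSUMED_LINE_SIZE - 1);
    uintptr_t end = (uintptr_t)buf + len;
    for (uintptr_t p = start; p < end; p += ASSUMED_LINE_SIZE)
        _mm_clflush((const void *)p);  /* CLFLUSH */
    /* CLFLUSH is ordered with respect to fence instructions, so MFENCE
       keeps later loads and stores from passing the flushes. */
    _mm_mfence();
}
```

Flushing does not change the data itself; a later read simply has to fetch the line from memory again.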

NOTE

CLFLUSH was introduced with the SSE2 extensions. However, the instruction can also be implemented in IA-32 processors that do not implement the SSE2 extensions, so detect CLFLUSH with its own feature flag: the instruction is supported if CPUID.01H:EDX.CLFSH[bit 19] = 1.
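The feature check in the note can be written with the __get_cpuid helper from GCC/Clang's <cpuid.h>, which returns 0 if the requested leaf is not available. The sketch also reads CPUID.01H:EBX[15:8], which reports the CLFLUSH line size in 8-byte units. The function names has_clflush and clflush_line_size are hypothetical, and the code assumes a GCC- or Clang-compatible compiler targeting x86.

```c
#include <assert.h>
#include <cpuid.h>   /* GCC/Clang: __get_cpuid */

/* CPUID.01H:EDX bit 19 is the CLFSH feature flag. */
#define CPUID_EDX_CLFSH (1u << 19)

/* Returns 1 if CPUID reports CLFLUSH support, 0 otherwise. */
static int has_clflush(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;  /* leaf 01H not available */
    return (edx & CPUID_EDX_CLFSH) != 0;
}

/* CPUID.01H:EBX[15:8] gives the CLFLUSH line size in 8-byte units.
   Returns the size in bytes, or 0 if CLFLUSH is not reported. */
static unsigned int clflush_line_size(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(edx & CPUID_EDX_CLFSH))
        return 0;
    return ((ebx >> 8) & 0xffu) * 8u;
}
```

Querying the line size this way avoids hard-coding it in a flush loop.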

11.4.4.2 Cacheability Control Instructions

The following four instructions store data from XMM and general-purpose registers to memory using a non-temporal hint. The non-temporal hint directs the processor to store data to memory without writing the data into the cache hierarchy whenever possible. See Section 10.4.6.2, “Caching of Temporal vs. Non-Temporal Data,” for more information about non-temporal stores and hints.

The MOVNTDQ (store double quadword using non-temporal hint) instruction stores packed integer data from an XMM register to memory, using a non-temporal hint.
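In C, MOVNTDQ is reached through the SSE2 intrinsic _mm_stream_si128, which requires a 16-byte-aligned destination. The sketch below fills a buffer with a repeated 32-bit value and ends with SFENCE, because non-temporal stores are weakly ordered. The helper stream_fill_u32 is hypothetical, and it assumes GCC or Clang on x86-64, a 16-byte-aligned destination, and a count that is a multiple of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* _mm_set1_epi32, _mm_stream_si128, _mm_sfence */

/* Hypothetical helper: set count 32-bit words at dst to value, four at a
   time, using MOVNTDQ. dst must be 16-byte aligned, count a multiple of 4. */
static void stream_fill_u32(uint32_t *dst, uint32_t value, size_t count)
{
    assert(((uintptr_t)dst & 15u) == 0);
    assert(count % 4 == 0);
    __m128i v = _mm_set1_epi32((int)value);  /* four copies of value */
    for (size_t i = 0; i < count; i += 4)
        _mm_stream_si128((__m128i *)(dst + i), v);  /* MOVNTDQ */
    /* SFENCE makes these stores globally visible before any later store. */
    _mm_sfence();
}
```

Large fills whose results will not be read back soon are the typical case where skipping the cache pays off.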

The MOVNTPD (store packed double-precision floating-point values using non-temporal hint) instruction stores packed double-precision floating-point data from an XMM register to memory, using a non-temporal hint.
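The corresponding intrinsic for MOVNTPD is _mm_stream_pd, which stores two doubles to a 16-byte-aligned address. The sketch below scales an array and writes the results with non-temporal stores, then issues SFENCE. The helper scale_stream_pd is hypothetical; it assumes GCC or Clang on x86-64, a 16-byte-aligned output array, and an even element count (the input may be unaligned because it is read with _mm_loadu_pd).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <emmintrin.h>  /* _mm_loadu_pd, _mm_mul_pd, _mm_set1_pd, _mm_stream_pd */

/* Hypothetical helper: out[i] = in[i] * k, stored with MOVNTPD.
   out must be 16-byte aligned and n must be even. */
static void scale_stream_pd(double *out, const double *in, double k, size_t n)
{
    assert(((uintptr_t)out & 15u) == 0);
    assert(n % 2 == 0);
    __m128d factor = _mm_set1_pd(k);
    for (size_t i = 0; i < n; i += 2) {
        __m128d x = _mm_loadu_pd(in + i);               /* load two doubles */
        _mm_stream_pd(out + i, _mm_mul_pd(x, factor));  /* MOVNTPD */
    }
    _mm_sfence();  /* order the weakly ordered stores before later stores */
}
```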

The MOVNTI (store doubleword using non-temporal hint) instruction stores integer data from a general-purpose register to memory, using a non-temporal hint.
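MOVNTI is exposed as _mm_stream_si32, which stores one 32-bit integer from a general-purpose register, so it suits element-by-element output where XMM-width stores are inconvenient. The sketch copies an int array this way. The helper stream_copy_i32 is hypothetical and assumes GCC or Clang on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* _mm_stream_si32, _mm_sfence */

/* Hypothetical helper: copy n ints from src to dst, one MOVNTI per element. */
static void stream_copy_i32(int *dst, const int *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32(dst + i, src[i]);  /* MOVNTI */
    _mm_sfence();  /* make the non-temporal stores visible before later stores */
}
```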

The MASKMOVDQU (store selected bytes of double quadword) instruction stores selected byte integers from an XMM register to memory, using a byte mask to selectively write the individual bytes. The memory location does not need to be aligned on a natural boundary. This instruction also uses a non-temporal hint.
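MASKMOVDQU writes byte i of the source only when the most significant bit of byte i of the mask is set; its intrinsic is _mm_maskmoveu_si128. The sketch writes a fill value into the even-numbered bytes of a 16-byte window and leaves the odd-numbered bytes untouched. The helper store_even_bytes is hypothetical and assumes GCC or Clang on x86-64; the destination pointer need not be 16-byte aligned.

```c
#include <assert.h>
#include <emmintrin.h>  /* _mm_maskmoveu_si128, _mm_set1_epi8, _mm_set_epi8 */

/* Hypothetical helper: store fill into bytes 0, 2, ..., 14 at p.
   Bytes 1, 3, ..., 15 at p are not written. */
static void store_even_bytes(char *p, char fill)
{
    assert(p != 0);
    __m128i data = _mm_set1_epi8(fill);
    /* _mm_set_epi8 takes bytes 15 down to 0; -128 (0x80) enables a byte. */
    __m128i mask = _mm_set_epi8(0, -128, 0, -128, 0, -128, 0, -128,
                                0, -128, 0, -128, 0, -128, 0, -128);
    _mm_maskmoveu_si128(data, mask, p);  /* MASKMOVDQU */
    _mm_sfence();  /* the store uses a non-temporal hint, so fence it */
}
```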

11.4.4.3 Memory Ordering Instructions

The SSE2 extensions introduce two new fence instructions (LFENCE and MFENCE) as companions to the SFENCE instruction introduced with the SSE extensions.

The LFENCE instruction establishes a memory fence for loads. It guarantees that loads specified before the fence are ordered ahead of loads specified after it, and it prevents speculative loads from passing the load fence (that is, no speculative loads are allowed until all loads specified before the load fence have been carried out).
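Because LFENCE keeps later loads from being issued speculatively, one commonly cited use is placing it after a bounds check so that the guarded array load cannot run ahead of the check. This relies on LFENCE acting as a speculation barrier as described in later Intel guidance, which goes beyond the text above, so treat the sketch as illustrative. The helper load_checked is hypothetical and assumes GCC or Clang on x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* _mm_lfence */

/* Hypothetical helper: return arr[idx] if idx is in range, else fallback.
   LFENCE keeps the array load from executing until the preceding
   instructions, including the bounds comparison, have completed. */
static int load_checked(const int *arr, size_t len, size_t idx, int fallback)
{
    assert(arr != NULL || len == 0);
    if (idx < len) {
        _mm_lfence();  /* LFENCE */
        return arr[idx];
    }
    return fallback;
}
```

The fence costs some throughput, so it is normally reserved for loads whose index comes from untrusted input.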

The MFENCE instruction combines the functions of LFENCE and SFENCE by establishing a memory fence for both loads and stores. It guarantees that all loads and stores specified before the fence are globally observable prior to any loads or stores being carried out after the fence.
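A case where MFENCE is needed, and neither LFENCE nor SFENCE alone suffices, is a store followed by a load from a different location, as in Dekker-style mutual exclusion: each thread stores to its own flag and then reads the other thread's flag. x86 allows such a load to complete before the earlier store is globally observable, so both threads could read 0 without the fence. The helper announce_and_check is hypothetical and assumes GCC or Clang on x86-64; production code would normally use C11 atomics, and volatile here only keeps the compiler from caching or dropping the flag accesses.

```c
#include <assert.h>
#include <emmintrin.h>  /* _mm_mfence */

/* Hypothetical helper: announce intent by setting my_flag, then read the
   other side's flag. Returns 0 if the other side had not yet announced. */
static int announce_and_check(volatile int *my_flag, const volatile int *other_flag)
{
    assert(my_flag && other_flag);
    *my_flag = 1;
    _mm_mfence();        /* MFENCE: the store is globally observable first */
    return *other_flag;  /* this load cannot pass the store above */
}
```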