User's Manual

Use the 3DNow! PREFETCH and PREFETCHW Instructions 47
22007E/0 November 1999 AMD Athlon Processor x86 Code Optimization
PREFETCH/W versus PREFETCHNTA/T0/T1/T2
The PREFETCHNTA/T0/T1/T2 instructions in the MMX
extensions are processor implementation dependent. To
maintain compatibility with the 25 million AMD-K6®-2 and
AMD-K6-III processors already sold, use the 3DNow!
PREFETCH/W instructions instead of the various prefetch
flavors in the new MMX extensions.
PREFETCHW Usage
Code that intends to modify the cache line brought in through
prefetching should use the PREFETCHW instruction. While
PREFETCHW works the same as a PREFETCH on the
AMD-K6-2 and AMD-K6-III processors, PREFETCHW gives a
hint to the AMD Athlon processor of an intent to modify the
cache line. The AMD Athlon processor will mark the cache line
being brought in by PREFETCHW as Modified. Using
PREFETCHW can save an additional 15-25 cycles compared to
a PREFETCH and the subsequent cache state change caused by
a write to the prefetched cache line.
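As a rough C-level illustration of the write-intent hint, the sketch below uses the GCC/Clang builtin `__builtin_prefetch`, whose second argument (1) signals that the line is about to be written. The builtin is not part of this manual, and whether the compiler actually emits PREFETCHW depends on the target and compiler flags. The function name `scale_in_place`, the 64-byte line size, and the prefetch distance are assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache-line size in bytes (illustrative; check the target CPU). */
#define LINE_BYTES 64
#define DOUBLES_PER_LINE (LINE_BYTES / sizeof(double))
/* Hypothetical prefetch distance in cache lines; tune by measurement. */
#define PREFETCH_LINES_AHEAD 4

/* Scale every element in place. Each cache line will be written, so it is
   requested with write intent (second argument = 1). */
void scale_in_place(double *data, size_t n, double factor)
{
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        /* Issue at most one prefetch per cache line of data. */
        if (i % DOUBLES_PER_LINE == 0) {
            size_t ahead = i + PREFETCH_LINES_AHEAD * DOUBLES_PER_LINE;
            /* Stay inside the array so the address computation is valid C. */
            if (ahead < n)
                __builtin_prefetch(&data[ahead], 1, 3);
        }
        data[i] *= factor;
    }
}
```

Calling `scale_in_place(buf, count, 2.0)` doubles every element of `buf`. The prefetch is only a hint, so the result is the same whether or not the hardware acts on it.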
Multiple Prefetches
Programmers can initiate multiple outstanding prefetches on
the AMD Athlon processor. While the AMD-K6-2 and
AMD-K6-III processors can have only one outstanding prefetch,
the AMD Athlon processor can have up to six outstanding
prefetches. When all six buffers are filled by various memory
read requests, the processor will simply ignore any new
prefetch requests until a buffer frees up. Multiple prefetch
requests are essentially handled in order, so data that is needed
first should be prefetched first.
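The ordering advice can also be sketched in C. The fragment below is a minimal illustration, not the manual's own code: it uses the GCC/Clang builtin `__builtin_prefetch`, and the function name `multiply_arrays`, the 64-byte line size, and the prefetch distance are assumptions. The two source arrays are read before the destination is written, so their prefetches are issued first. The three requests per cache line stay well under the six outstanding prefetches described above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumes 64-byte cache lines and 8-byte doubles (illustrative values). */
#define LINE_DOUBLES 8
/* Hypothetical prefetch distance in elements; tune by measurement. */
#define AHEAD (4 * LINE_DOUBLES)

/* Compute a[i] = b[i] * c[i], prefetching each stream in the order in
   which the loop body will need it: b and c (reads), then a (write). */
void multiply_arrays(double *a, const double *b, const double *c, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && c != NULL));
    for (size_t i = 0; i < n; i++) {
        /* One group of prefetches per cache line, kept inside the arrays. */
        if (i % LINE_DOUBLES == 0 && i + AHEAD < n) {
            __builtin_prefetch(&b[i + AHEAD], 0, 3); /* needed first: read  */
            __builtin_prefetch(&c[i + AHEAD], 0, 3); /* needed next: read   */
            __builtin_prefetch(&a[i + AHEAD], 1, 3); /* needed last: write  */
        }
        a[i] = b[i] * c[i];
    }
}
```

Because prefetches are hints, dropping any of them (for example when all buffers are busy) changes only the timing, never the computed values.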
The example below shows how to initiate multiple prefetches
when traversing more than one array.
Example (Multiple Prefetches):
.CODE
.K3D
; original C code
;
; #define LARGE_NUM 65536
;
; double array_a[LARGE_NUM];
; double array_b[LARGE_NUM];
; double array_c[LARGE_NUM];
; int i;
;
; for (i = 0; i < LARGE_NUM; i++) {
;     array_a[i] = array_b[i] * array_c[i];
; }