Repeated String Instruction Usage 85
22007E/0 November 1999 AMD Athlon Processor x86 Code Optimization

Ensure DF=0 (UP)
Always make sure that DF = 0 (UP), the state after execution of
CLD, for REP MOVS and REP STOS. DF = 1 (DOWN) is needed only for
certain cases of overlapping REP MOVS (for example, when the
source and destination overlap and the destination starts at a
higher address than the source).
String instructions with DF = 1 (DOWN) are slower, but only the
overhead part of the cycle equation is larger; the throughput
part is unchanged. See Table 1, "Latency of Repeated String
Instructions," on page 84 for additional latency numbers.
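
The choice of direction can be sketched in portable C. This is only an illustration of the decision, not code from this guide: the helper name copy_bytes_overlap_safe is hypothetical, the ascending loop stands in for REP MOVSB with DF = 0 (UP), and the descending loop stands in for REP MOVSB with DF = 1 (DOWN). A compiler is free to emit other instructions for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the guide): copies n bytes between
 * buffers that may overlap, picking the copy direction the way a
 * REP MOVS user would pick DF. */
void copy_bytes_overlap_safe(unsigned char *dst, const unsigned char *src,
                             size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    /* Compare addresses as integers so the test is well defined even
     * when the two pointers come from different objects. */
    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    if (d <= s || d - s >= n) {
        /* Destination is below the source, or the ranges do not
         * overlap: ascending copy, the DF = 0 (UP) case. */
        for (size_t i = 0; i < n; i++) {
            dst[i] = src[i];
        }
    } else {
        /* Destination starts inside the source range: descending
         * copy, the DF = 1 (DOWN) case, so that source bytes are
         * read before they are overwritten. */
        for (size_t i = n; i > 0; i--) {
            dst[i - 1] = src[i - 1];
        }
    }
}
```

Only the second branch needs the descending direction, which matches the advice to leave DF = 0 for every other case.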

Align Source and Destination with Operand Size
For REP MOVS, make sure that both source and destination are
aligned with regard to the operand size. Handle the end case
separately, if necessary. If either source or destination cannot
be aligned, make the destination aligned and the source
misaligned. For REP STOS, make the destination aligned.
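
A minimal C sketch of the "align the destination, handle the end case separately" pattern follows. It assumes a 4-byte operand size and non-overlapping buffers; the name copy_dst_aligned is hypothetical. The middle loop stands in for the doubleword REP MOVS that would run once the destination is aligned, and the source is allowed to stay misaligned, as recommended above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the guide): copies n bytes so that all
 * 4-byte transfers land on a 4-byte-aligned destination address.
 * Assumes the buffers do not overlap. */
void copy_dst_aligned(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    /* Head: single bytes until the destination is 4-byte aligned. */
    while (n > 0 && ((uintptr_t)dst & 3u) != 0) {
        *dst++ = *src++;
        n--;
    }

    /* Body: one doubleword per iteration. memcpy into a local keeps
     * the possibly misaligned source read well defined in C. */
    while (n >= 4) {
        uint32_t word;
        memcpy(&word, src, sizeof word);
        memcpy(dst, &word, sizeof word);
        dst += 4;
        src += 4;
        n -= 4;
    }

    /* Tail: the end case, at most three remaining bytes. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
}
```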

Inline REP String with Low Counts
Expand REP string instructions into equivalent sequences of
simple x86 instructions, if the repeat count is constant and less
than eight. Use an inline sequence of loads and stores to
accomplish the move. Use a sequence of stores to emulate REP
STOS. This technique eliminates the setup overhead of REP
instructions and increases instruction throughput.
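
As an illustration, here is a C sketch of the expansion for a constant count of four doublewords. The function names fill4_u32 and copy4_u32 are hypothetical. Each assignment corresponds to one simple store (or one load followed by one store) that would replace a REP STOS or REP MOVS with a count of four; the instructions actually generated depend on the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: four explicit stores in place of a REP STOS
 * with a constant count of 4. */
void fill4_u32(uint32_t *dst, uint32_t value)
{
    assert(dst != NULL);
    dst[0] = value;
    dst[1] = value;
    dst[2] = value;
    dst[3] = value;
}

/* Hypothetical helper: four loads followed by four stores in place of
 * a REP MOVS with a constant count of 4. */
void copy4_u32(uint32_t *dst, const uint32_t *src)
{
    assert(dst != NULL && src != NULL);
    uint32_t a = src[0];
    uint32_t b = src[1];
    uint32_t c = src[2];
    uint32_t d = src[3];
    dst[0] = a;
    dst[1] = b;
    dst[2] = c;
    dst[3] = d;
}
```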

Use Loop for REP String with Low Variable Counts
If the repeat count is variable, but is likely to be less than
eight, use a simple loop to move/store the data. This technique
avoids the overhead of REP MOVS and REP STOS.
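
One way to sketch this in C is shown below, assuming doubleword elements and non-overlapping buffers. The name copy_u32_small_or_bulk and the constant SMALL_COUNT_LIMIT are hypothetical; the limit of eight is taken from the text above. The memcpy call is only a stand-in for whatever bulk path (for example, REP MOVS) handles larger counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the threshold of eight elements mirrors the guideline. */
enum { SMALL_COUNT_LIMIT = 8 };

/* Hypothetical helper: copies count doublewords between
 * non-overlapping buffers. */
void copy_u32_small_or_bulk(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(count == 0 || (dst != NULL && src != NULL));

    if (count < SMALL_COUNT_LIMIT) {
        /* Low variable count: a simple loop has no REP setup cost. */
        for (size_t i = 0; i < count; i++) {
            dst[i] = src[i];
        }
    } else {
        /* Larger count: defer to a bulk copy (stand-in for the
         * REP MOVS path). */
        memcpy(dst, src, count * sizeof *dst);
    }
}
```

The branch costs one compare per call, so this form pays off only when small counts are in fact the common case.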

Using MOVQ and MOVNTQ for Block Copy/Fill
To fill or copy blocks of data that are larger than 512 bytes, or
where the destination is in uncacheable memory, use the MMX
instructions MOVQ/MOVNTQ instead of REP STOS and REP MOVS to
achieve maximum performance. (See the guideline "Use MMX
Instructions for Block Copies and Block Fills" on page 115.)