Repeated String Instruction Usage 85

22007E/0—November 1999 AMD Athlon™ Processor x86 Code Optimization

Ensure DF=0 (UP)

Always make sure that DF = 0 (UP), which is the state after executing CLD, for REP MOVS and REP STOS. DF = 1 (DOWN) is needed only for certain cases of overlapping REP MOVS, for example, when the destination begins inside the source block, so that an ascending copy would overwrite source data before it is read.

String instructions with DF = 1 (DOWN) are slower, but only the overhead part of the cycle equation is larger, not the throughput part. See Table 1, “Latency of Repeated String Instructions,” on page 84 for additional latency numbers.
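The direction choice can be sketched in C. This is only an illustration of when a descending copy (the DF = 1 case) is required, not code from this guide; the function name copy_overlap_safe is hypothetical, and plain byte loops stand in for REP MOVSB.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n bytes between possibly overlapping blocks,
 * picking the direction the way a REP MOVS caller would pick DF. */
void copy_overlap_safe(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    uintptr_t d = (uintptr_t)dst;
    uintptr_t s = (uintptr_t)src;

    if (d > s && d - s < n) {
        /* Destination begins inside the source block: copy from the top
         * down (the DF = 1 case) so no byte is overwritten before it is read. */
        while (n > 0) {
            n--;
            dst[n] = src[n];
        }
    } else {
        /* Every other case: ascending copy (the DF = 0 case). */
        for (size_t i = 0; i < n; i++)
            dst[i] = src[i];
    }
}
```

Only the first branch corresponds to DF = 1; disjoint blocks, and overlaps where the destination lies below the source, all take the ascending path.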

Align Source and Destination with Operand Size

For REP MOVS, make sure that both source and destination are aligned with regard to the operand size. Handle the end case separately, if necessary. If source and destination cannot both be aligned, align the destination and leave the source misaligned. For REP STOS, align the destination.
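The alignment rule can be sketched in C as a head/body/tail split. This is a minimal sketch under stated assumptions: a 4-byte operand size, non-overlapping blocks, and a hypothetical function name copy_dst_aligned. The body loop stands in for REP MOVSD.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy n bytes so that the bulk of the transfer
 * writes to a 4-byte-aligned destination. Blocks must not overlap. */
void copy_dst_aligned(unsigned char *dst, const unsigned char *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));

    /* Head: single bytes until the destination reaches a 4-byte boundary. */
    while (n > 0 && ((uintptr_t)dst & 3u) != 0) {
        *dst++ = *src++;
        n--;
    }

    /* Body: 4-byte units. The destination is aligned here; the source may
     * still be misaligned, so memcpy expresses the load portably. */
    while (n >= 4) {
        uint32_t w;
        memcpy(&w, src, sizeof w);
        memcpy(dst, &w, sizeof w);
        dst += 4;
        src += 4;
        n -= 4;
    }

    /* Tail: the end case, handled separately. */
    while (n > 0) {
        *dst++ = *src++;
        n--;
    }
}
```

When both pointers have the same offset from a 4-byte boundary, the head loop aligns both at once; otherwise only the destination ends up aligned, as the guideline prefers.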

Inline REP String with Low Counts

If the repeat count is constant and less than eight, expand REP string instructions into equivalent sequences of simple x86 instructions. Use an inline sequence of loads and stores to emulate REP MOVS, and a sequence of stores to emulate REP STOS. This technique eliminates the setup overhead of REP instructions and increases instruction throughput.
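As an illustration, the expansion for a constant count of four doublewords might look like the following C sketch. The function names copy4_dwords and fill4_dwords are hypothetical, and a C compiler is free to generate different instructions; the sketch only shows the shape of the unrolled sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for REP MOVSD with a constant count of 4:
 * four loads followed by four stores, with no loop and no REP setup. */
static inline void copy4_dwords(uint32_t *dst, const uint32_t *src)
{
    assert(dst != NULL && src != NULL);

    uint32_t a = src[0];
    uint32_t b = src[1];
    uint32_t c = src[2];
    uint32_t d = src[3];

    dst[0] = a;
    dst[1] = b;
    dst[2] = c;
    dst[3] = d;
}

/* Hypothetical stand-in for REP STOSD with a constant count of 4:
 * four stores of the same value. */
static inline void fill4_dwords(uint32_t *dst, uint32_t value)
{
    assert(dst != NULL);

    dst[0] = value;
    dst[1] = value;
    dst[2] = value;
    dst[3] = value;
}
```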

Use Loop for REP String with Low Variable Counts

If the repeat count is variable, but is likely less than eight, use a simple loop to move or store the data. This technique avoids the overhead of REP MOVS and REP STOS.
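A simple loop of this kind might be sketched in C as follows. The function names copy_small and fill_small are hypothetical, doubleword elements are an assumption, and an optimizing compiler may replace such loops with a library call, so treat this as a picture of the intended instruction pattern rather than a guarantee about generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: move `count` doublewords with a plain loop,
 * intended for counts that are usually below eight. Blocks must not overlap. */
void copy_small(uint32_t *dst, const uint32_t *src, size_t count)
{
    assert(count == 0 || (dst != NULL && src != NULL));

    for (size_t i = 0; i < count; i++)
        dst[i] = src[i];
}

/* Hypothetical helper: store `value` into `count` doublewords with a plain loop. */
void fill_small(uint32_t *dst, uint32_t value, size_t count)
{
    assert(count == 0 || dst != NULL);

    for (size_t i = 0; i < count; i++)
        dst[i] = value;
}
```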

Using MOVQ and MOVNTQ for Block Copy/Fill

To fill or copy blocks of data that are larger than 512 bytes, or where the destination is in uncacheable memory, use the MMX instructions MOVQ/MOVNTQ instead of REP STOS and REP MOVS to achieve maximum performance. (See the guideline, “Use MMX™ Instructions for Block Copies and Block Fills” on page 115.)