User's Manual

22007E/0 November 1999 AMD Athlon Processor x86 Code Optimization
7 Scheduling Optimizations
This chapter describes how to code instructions for efficient
scheduling. Guidelines are listed in order of importance.
Schedule Instructions According to their Latency
The AMD Athlon processor can execute up to three x86
instructions per cycle, and each x86 instruction may have a
different latency. Although the processor schedules instructions
flexibly, for maximum performance schedule instructions,
especially FPU and 3DNow! instructions, according to their
latency, so that dependent instructions do not have to wait on
instructions with longer latencies.
See Appendix F, "Instruction Dispatch and Execution
Resources," on page 187 for a list of latency numbers.
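
As an illustration of the idea, the following C sketch contrasts a
reduction in which every add depends on the previous add with one
that keeps several independent add chains, so that an add from one
chain need not wait for a long-latency add in another. The function
names sum_serial and sum_interleaved are hypothetical, and the choice
of four chains is an assumption made for illustration, not a figure
taken from Appendix F. A compiler may reorder or transform this code;
in hand-written assembly the same interleaving would be applied
directly to the instruction stream. Note also that splitting a
floating-point sum changes the order of additions and can therefore
change rounding.

```c
#include <assert.h>
#include <stddef.h>

/* Single dependency chain: each add must wait for the result of the
   previous add, so the loop is limited by the add latency. */
float sum_serial(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Four independent chains (illustrative count): the adds inside one
   iteration do not depend on each other, so they can overlap. */
float sum_interleaved(const float *x, size_t n)
{
    assert(x != NULL || n == 0);
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    /* Handle the remaining 0 to 3 elements. */
    for (; i < n; i++)
        s0 += x[i];

    /* Combine the partial sums once, outside the hot loop. */
    return (s0 + s1) + (s2 + s3);
}
```

Whether the interleaved form is actually faster depends on the
latencies involved and on what the compiler generates, so it should
be measured rather than assumed.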
Unrolling Loops
Complete Loop Unrolling
Take advantage of the AMD Athlon processor's large 64-Kbyte
instruction cache and unroll loops to gain parallelism and
reduce loop overhead, even with branch prediction. Complete