AMD Athlon Processor x86 Code Optimization
22007E/0 November 1999
Select DirectPath Over VectorPath Instructions
Use DirectPath instructions rather than VectorPath
instructions. DirectPath instructions are optimized for efficient
decode and execution: they minimize the number of operations
per x86 instruction. They include the "register ← register op
memory" as well as the "register ← register op register" forms of
instructions. Up to three DirectPath instructions can be
decoded per cycle. VectorPath instructions block the
decoding of DirectPath instructions.
The vast majority of instructions used by a compiler have
been implemented as DirectPath instructions in the
AMD Athlon processor. Assembly writers must still take into
consideration the usage of DirectPath versus VectorPath
instructions.
See Appendix F, "Instruction Dispatch and Execution
Resources" on page 187 and Appendix G, "DirectPath versus
VectorPath Instructions" on page 219 for tables of DirectPath
and VectorPath instructions.
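
As an illustration of the recommendation above, the following sketch replaces a single complex instruction with simple ones. It assumes that LOOP is listed as a VectorPath instruction in the Appendix G tables and that DEC and JNZ are DirectPath; verify both against the tables before relying on this. The labels and the summing loop body are hypothetical and serve only as context.

```asm
; Avoid (LOOP is assumed to be VectorPath; confirm in Appendix G)
sum_loop_a:
    add   eax, [esi]      ; accumulate one doubleword
    add   esi, 4          ; advance the pointer
    loop  sum_loop_a      ; decrement ECX, branch if ECX != 0

; Prefer (simple instructions with the same loop control)
sum_loop_b:
    add   eax, [esi]      ; accumulate one doubleword
    add   esi, 4          ; advance the pointer
    dec   ecx             ; decrement the counter (sets ZF when ECX reaches 0)
    jnz   sum_loop_b      ; branch while ECX != 0
```

The two forms are not identical in every respect: LOOP leaves the flags untouched, while DEC modifies all arithmetic flags except CF. The replacement is safe only if no later code depends on flags set before the loop control.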
Load-Execute Instruction Usage
Use Load-Execute Integer Instructions
Most load-execute integer instructions are DirectPath
decodable and can be decoded at the rate of three per cycle.
Splitting a load-execute integer instruction into two separate
instructions (a load instruction and a "reg, reg" instruction)
reduces decoding bandwidth and increases register pressure,
which results in lower performance. The split-instruction form
can still be useful to avoid scheduler stalls for longer-executing
instructions and to explicitly schedule the load and execute
operations.
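
A minimal sketch of the two forms described above, using ADD as the integer operation. The memory operand `count` is a hypothetical doubleword variable, and the choice of EBX as the scratch register is arbitrary.

```asm
; Split form: two instructions to decode, plus a scratch register (EBX)
    mov   ebx, [count]    ; load
    add   eax, ebx        ; "reg, reg" operation

; Load-execute form: one instruction, no scratch register
    add   eax, [count]    ; load and add in a single instruction
```

The split form becomes attractive mainly when the load can be hoisted well ahead of the operation that consumes it, for example above a long-latency instruction, so that the data is already in a register when it is needed.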