Contents vii
22007E/0 November 1999 AMD Athlon Processor x86 Code Optimization
Signed Derivation for Algorithm, Multiplier, and
Shift Factor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
9 Floating-Point Optimizations 97
Ensure All FPU Data is Aligned . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Use Multiplies Rather than Divides . . . . . . . . . . . . . . . . . . . . . . . . . . 97
Use FFREEP Macro to Pop One Register from the FPU Stack . . . . 98
Floating-Point Compare Instructions . . . . . . . . . . . . . . . . . . . . . . . . . 98
Use the FXCH Instruction Rather than FST/FLD Pairs . . . . . . . . . . 99
Avoid Using Extended-Precision Data . . . . . . . . . . . . . . . . . . . . . . . . 99
Minimize Floating-Point-to-Integer Conversions . . . . . . . . . . . . . . . 100
Floating-Point Subexpression Elimination . . . . . . . . . . . . . . . . . . . . 103
Check Argument Range of Trigonometric Instructions
Efficiently . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Take Advantage of the FSINCOS Instruction . . . . . . . . . . . . . . . . . 105
10 3DNow!™ and MMX™ Optimizations 107
Use 3DNow! Instructions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Use FEMMS Instruction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
Use 3DNow! Instructions for Fast Division . . . . . . . . . . . . . . . . . . . 108
    Optimized 14-Bit Precision Divide . . . . . . . . . . . . . . . . . . . . . 108
    Optimized Full 24-Bit Precision Divide . . . . . . . . . . . . . . . . . 108
    Pipelined Pair of 24-Bit Precision Divides . . . . . . . . . . . . . . . 109
    Newton-Raphson Reciprocal . . . . . . . . . . . . . . . . . . . . . . . . . . 109
Use 3DNow! Instructions for Fast Square Root and
Reciprocal Square Root . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
    Optimized 15-Bit Precision Square Root . . . . . . . . . . . . . . . . 110
    Optimized 24-Bit Precision Square Root . . . . . . . . . . . . . . . . 110
    Newton-Raphson Reciprocal Square Root . . . . . . . . . . . . . . . 111
Use MMX PMADDWD Instruction to Perform
Two 32-Bit Multiplies in Parallel . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
3DNow! and MMX Intra-Operand Swapping . . . . . . . . . . . . . . . . . . 112