AMD Athlon Processor x86 Code Optimization
22007E/0 November 1999
The FEMMS instruction is supported for backward compatibility
with AMD-K6 family processors, and is aliased to the EMMS
instruction.
3DNow! and MMX instructions are designed to be used
concurrently with no switching issues. Likewise, enhanced
3DNow! instructions can be used simultaneously with MMX
instructions. However, x87 and 3DNow! instructions share the
same architectural registers, so they cannot easily be used
concurrently: the register file must be cleaned up with
FEMMS/EMMS when switching between the two.
Use 3DNow! Instructions for Fast Division
3DNow! instructions can be used to compute a very fast, highly
accurate reciprocal or quotient.
Optimized 14-Bit Precision Divide
This divide operation executes with a total latency of seven
cycles, assuming that the program hides the latency of the first
MOVD/MOVQ instructions within preceding code.
Example:
MOVD      MM0, [MEM]    ; 0   | W
PFRCP     MM0, MM0      ; 1/W | 1/W (approximate)
MOVQ      MM2, [MEM]    ; Y   | X
PFMUL     MM2, MM0      ; Y/W | X/W
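
The arithmetic behind this sequence can be modeled in scalar C: one approximate reciprocal of W, then one multiply per lane. The sketch below is only an illustration of the precision involved, not AMD's implementation. The names approx_recip14, divide14, and rel_error are hypothetical, and the 14-bit estimate is imitated by truncating the significand of an exact reciprocal, which is an assumption standing in for the hardware lookup performed by PFRCP.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PFRCP: a reciprocal whose significand is
   truncated to 14 bits (1 implicit + 13 stored). The real instruction
   uses a hardware table; this only mimics its precision. */
float approx_recip14(float w)
{
    assert(w != 0.0f);                /* division by zero is out of scope */
    float r = 1.0f / w;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);   /* reinterpret the float as raw bits */
    bits &= 0xFFFFFC00u;              /* clear the low 10 significand bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* Model of the MOVD / PFRCP / MOVQ / PFMUL sequence:
   one reciprocal, then one multiply for each of X and Y. */
void divide14(float x, float y, float w, float *x_over_w, float *y_over_w)
{
    float r = approx_recip14(w);
    *x_over_w = x * r;
    *y_over_w = y * r;
}

/* Relative error of approx against a nonzero exact value. */
double rel_error(double approx, double exact)
{
    double d = (approx - exact) / exact;
    return d < 0 ? -d : d;
}
```

Under these assumptions, each quotient carries roughly the relative error of the reciprocal estimate, on the order of 2^-13, which is why this form suits cases where full single precision is not required.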
Optimized Full 24-Bit Precision Divide
This divide operation executes with a total latency of 15 cycles,
assuming that the program hides the latency of the first
MOVD/MOVQ instructions within preceding code.
Example:
MOVD      MM0, [W]      ; 0   | W
PFRCP     MM1, MM0      ; 1/W | 1/W (approximate)
PUNPCKLDQ MM0, MM0      ; W   | W   (MMX instr.)
PFRCPIT1  MM0, MM1      ; 1/W | 1/W (refine)
MOVQ      MM2, [X_Y]    ; Y   | X
PFRCPIT2  MM0, MM1      ; 1/W | 1/W (final)
PFMUL     MM2, MM0      ; Y/W | X/W
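
The step from a 14-bit estimate to full 24-bit precision can be illustrated with a textbook Newton-Raphson iteration for the reciprocal, r1 = r0 * (2 - W * r0), which roughly squares the relative error of r0. The scalar C sketch below assumes the PFRCPIT1/PFRCPIT2 pair behaves like one such step. It does not reproduce how the hardware splits the work between the two instructions, and the names seed_recip14, refine_recip, and recip24 are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for PFRCP: exact reciprocal with the significand
   truncated to 14 bits. This imitates the precision of the estimate only. */
float seed_recip14(float w)
{
    assert(w != 0.0f);                /* division by zero is out of scope */
    float r = 1.0f / w;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);   /* reinterpret the float as raw bits */
    bits &= 0xFFFFFC00u;              /* keep 13 stored significand bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step for the reciprocal of w.
   If r0 has relative error e, the result has relative error near e*e,
   limited by single-precision rounding. */
float refine_recip(float w, float r0)
{
    return r0 * (2.0f - w * r0);
}

/* Scalar model of PFRCP followed by the PFRCPIT1/PFRCPIT2 refinement. */
float recip24(float w)
{
    return refine_recip(w, seed_recip14(w));
}
```

Multiplying X and Y by recip24(W) then corresponds to the final PFMUL. Starting from an error near 2^-13, a single step is enough to reach the limit of single precision, which is consistent with the sequence needing only one refinement pass.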