
AMD Athlon Processor x86 Code Optimization
22007E/0 November 1999
Example:
PXOR MM2, MM2 ; 0 | 0
MOVD MM0, [ab] ; 0 0 | b a
MOVD MM1, [cd] ; 0 0 | d c
PUNPCKLWD MM0, MM2 ; 0 b | 0 a
PUNPCKLWD MM1, MM2 ; 0 d | 0 c
PMADDWD MM0, MM1 ; b*d | a*c
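The listing above can be checked with a small scalar model. The following is a minimal sketch in C, not AMD's code: the type `mmx_words` and the helpers `punpcklwd`, `pmaddwd`, and `mul_pairs` are hypothetical names introduced here, and the model assumes `a`, `b`, `c`, and `d` are signed 16-bit values, as PMADDWD treats them. It shows why interleaving each operand with zero words makes PMADDWD produce the two separate 32-bit products.

```c
#include <stdint.h>

/* Scalar model of a 64-bit MMX register viewed as four 16-bit words.
   w[0] is the least significant word. (Hypothetical helper type.) */
typedef struct { uint16_t w[4]; } mmx_words;

/* Model of PUNPCKLWD dst, src: interleave the two low words of dst
   and src, giving (low to high) dst0, src0, dst1, src1. */
static mmx_words punpcklwd(mmx_words dst, mmx_words src)
{
    mmx_words r = {{ dst.w[0], src.w[0], dst.w[1], src.w[1] }};
    return r;
}

/* Model of PMADDWD: four signed 16x16 multiplies; adjacent products
   are summed into two 32-bit results. *lo gets words 0 and 1,
   *hi gets words 2 and 3. The sums are formed in unsigned arithmetic
   so the model wraps like the instruction instead of overflowing. */
static void pmaddwd(mmx_words x, mmx_words y, int32_t *lo, int32_t *hi)
{
    int32_t p0 = (int32_t)(int16_t)x.w[0] * (int16_t)y.w[0];
    int32_t p1 = (int32_t)(int16_t)x.w[1] * (int16_t)y.w[1];
    int32_t p2 = (int32_t)(int16_t)x.w[2] * (int16_t)y.w[2];
    int32_t p3 = (int32_t)(int16_t)x.w[3] * (int16_t)y.w[3];
    *lo = (int32_t)((uint32_t)p0 + (uint32_t)p1);
    *hi = (int32_t)((uint32_t)p2 + (uint32_t)p3);
}

/* Mirrors the assembly listing step by step:
   *ac receives a*c (low doubleword), *bd receives b*d (high). */
static void mul_pairs(int16_t a, int16_t b, int16_t c, int16_t d,
                      int32_t *ac, int32_t *bd)
{
    mmx_words zero = {{ 0, 0, 0, 0 }};                      /* PXOR MM2, MM2  */
    mmx_words mm0  = {{ (uint16_t)a, (uint16_t)b, 0, 0 }};  /* MOVD MM0, [ab] */
    mmx_words mm1  = {{ (uint16_t)c, (uint16_t)d, 0, 0 }};  /* MOVD MM1, [cd] */
    mm0 = punpcklwd(mm0, zero);                             /* 0 b | 0 a      */
    mm1 = punpcklwd(mm1, zero);                             /* 0 d | 0 c      */
    pmaddwd(mm0, mm1, ac, bd);                              /* b*d | a*c      */
}
```

Because the odd words of both operands are zero after the unpack, the second product in each PMADDWD sum is zero, so each doubleword holds a single full 32-bit product.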
3DNow! and MMX Intra-Operand Swapping
AMD Athlon Specific Code
If the halves of an MMX register must be swapped, use the
PSWAPD instruction, which is a new AMD Athlon 3DNow! DSP
extension. Use this instruction only in AMD Athlon specific
code. PSWAPD mmreg1, mmreg2 performs the following
operation:
mmreg1[63:32] = mmreg2[31:0]
mmreg1[31:0] = mmreg2[63:32]
See the AMD Extensions to the 3DNow! and MMX Instruction Set
Manual, order #22466, for more usage information.
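The operation defined above can be expressed as a short scalar model. This is a minimal sketch in C that treats a 64-bit MMX register as a `uint64_t`. The function name `pswapd` is a hypothetical helper introduced here, not an intrinsic or a library call.

```c
#include <stdint.h>

/* Scalar model of PSWAPD mmreg1, mmreg2.
   Takes the value of mmreg2 and returns the new value of mmreg1.
   (Hypothetical helper; a 64-bit MMX register is modeled as uint64_t.) */
static uint64_t pswapd(uint64_t mmreg2)
{
    uint64_t lo = mmreg2 & 0xFFFFFFFFu;  /* mmreg2[31:0]  */
    uint64_t hi = mmreg2 >> 32;          /* mmreg2[63:32] */
    return (lo << 32) | hi;              /* mmreg1[63:32] = lo, mmreg1[31:0] = hi */
}
```

Applying the model twice returns the original value, as expected of a swap of two halves.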
Blended Code
For blended code, which needs to run well on both AMD-K6 and
AMD Athlon family processors, the following code is
recommended:
Example 1 (Preferred, faster):
;MM1 = SWAP (MM0), MM0 destroyed
MOVQ MM1, MM0 ;make a copy
PUNPCKLDQ MM0, MM0 ;duplicate lower half
PUNPCKHDQ MM1, MM0 ;combine lower halves
Example 2 (Preferred, fast):
;MM1 = SWAP (MM0), MM0 preserved
MOVQ MM1, MM0 ;make a copy
PUNPCKHDQ MM1, MM1 ;duplicate upper half
PUNPCKLDQ MM1, MM0 ;combine upper halves
Both examples accomplish the swap, but the first example
should be used if the original contents of the register do not
need to be preserved. The first example is faster because its
MOVQ and PUNPCKLDQ instructions both depend only on the
original contents of MM0 and can therefore execute in parallel.
In the second example, each instruction depends on the result
of the previous one, so the sequence takes longer to execute.
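Both sequences can be traced with a scalar model to confirm that they leave the swapped value in MM1. The following is a minimal sketch in C. The helpers `punpckldq`, `punpckhdq`, `swap_destroying`, and `swap_preserving` are hypothetical names introduced here, and a 64-bit MMX register is modeled as a `uint64_t`. The model checks results only and says nothing about timing.

```c
#include <stdint.h>

/* Model of PUNPCKLDQ dst, src:
   result[31:0] = dst[31:0], result[63:32] = src[31:0]. */
static uint64_t punpckldq(uint64_t dst, uint64_t src)
{
    return (dst & 0xFFFFFFFFu) | (src << 32);
}

/* Model of PUNPCKHDQ dst, src:
   result[31:0] = dst[63:32], result[63:32] = src[63:32]. */
static uint64_t punpckhdq(uint64_t dst, uint64_t src)
{
    return (dst >> 32) | (src & 0xFFFFFFFF00000000ULL);
}

/* Example 1: returns MM1 = SWAP(MM0); *mm0 is overwritten. */
static uint64_t swap_destroying(uint64_t *mm0)
{
    uint64_t mm1 = *mm0;            /* MOVQ MM1, MM0      make a copy          */
    *mm0 = punpckldq(*mm0, *mm0);   /* PUNPCKLDQ MM0, MM0 duplicate lower half */
    mm1 = punpckhdq(mm1, *mm0);     /* PUNPCKHDQ MM1, MM0 combine the halves   */
    return mm1;
}

/* Example 2: returns MM1 = SWAP(MM0); mm0 is left unchanged. */
static uint64_t swap_preserving(uint64_t mm0)
{
    uint64_t mm1 = mm0;             /* MOVQ MM1, MM0      make a copy          */
    mm1 = punpckhdq(mm1, mm1);      /* PUNPCKHDQ MM1, MM1 duplicate upper half */
    mm1 = punpckldq(mm1, mm0);      /* PUNPCKLDQ MM1, MM0 combine the halves   */
    return mm1;
}
```

In `swap_destroying`, the first two statements read only the original `*mm0`, which mirrors the independence of MOVQ and PUNPCKLDQ in the first example. In `swap_preserving`, every statement consumes the `mm1` produced by the one before it.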