user manual
112 3DNow!™ and MMX™ Intra-Operand Swapping
AMD Athlon™ Processor x86 Code Optimization
22007E/0—November 1999
Example:
PXOR MM2, MM2 ; 0 | 0
MOVD MM0, [ab] ; 0 0 | b a
MOVD MM1, [cd] ; 0 0 | d c
PUNPCKLWD MM0, MM2 ; 0 b | 0 a
PUNPCKLWD MM1, MM2 ; 0 d | 0 c
PMADDWD MM0, MM1 ; b*d | a*c
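The register contents shown in the comments above can be checked with a small portable C model. This is only a sketch under stated assumptions: a 64-bit integer stands in for each MMX register, and the function names (word_at, punpcklwd, pmaddwd, mul_word_pairs, mul_word_pairs_selfcheck) are hypothetical helpers introduced for illustration, not part of any AMD library.

```c
#include <assert.h>
#include <stdint.h>

/* Return word number `index` (0 = lowest) of reg as a signed 16-bit value. */
static int64_t word_at(uint64_t reg, int index)
{
    int64_t w = (int64_t)((reg >> (16 * index)) & 0xFFFF);
    return w >= 0x8000 ? w - 0x10000 : w;
}

/* Model of PUNPCKLWD dst, src: interleave the two low words of dst and src.
   Result words, lowest first: dst.w0, src.w0, dst.w1, src.w1. */
uint64_t punpcklwd(uint64_t dst, uint64_t src)
{
    uint64_t d0 = dst & 0xFFFF, d1 = (dst >> 16) & 0xFFFF;
    uint64_t s0 = src & 0xFFFF, s1 = (src >> 16) & 0xFFFF;
    return d0 | (s0 << 16) | (d1 << 32) | (s1 << 48);
}

/* Model of PMADDWD dst, src: multiply signed words pairwise and add each
   adjacent pair of products into one 32-bit result. */
uint64_t pmaddwd(uint64_t dst, uint64_t src)
{
    uint64_t result = 0;
    for (int i = 0; i < 2; i++) {
        int64_t sum = word_at(dst, 2 * i) * word_at(src, 2 * i)
                    + word_at(dst, 2 * i + 1) * word_at(src, 2 * i + 1);
        result |= (uint64_t)(uint32_t)sum << (32 * i);
    }
    return result;
}

/* Model of the six-instruction sequence: b*d in the upper doubleword,
   a*c in the lower doubleword. */
uint64_t mul_word_pairs(int16_t a, int16_t b, int16_t c, int16_t d)
{
    uint64_t mm2 = 0;                                            /* PXOR MM2, MM2   */
    uint64_t mm0 = (uint16_t)a | ((uint64_t)(uint16_t)b << 16);  /* MOVD MM0, [ab]  */
    uint64_t mm1 = (uint16_t)c | ((uint64_t)(uint16_t)d << 16);  /* MOVD MM1, [cd]  */
    mm0 = punpcklwd(mm0, mm2);                                   /* 0 b | 0 a       */
    mm1 = punpcklwd(mm1, mm2);                                   /* 0 d | 0 c       */
    return pmaddwd(mm0, mm1);                                    /* b*d | a*c       */
}

/* Self-check: a=3, b=-4, c=5, d=6 gives -24 (upper) and 15 (lower). */
void mul_word_pairs_selfcheck(void)
{
    assert(mul_word_pairs(3, -4, 5, 6) == 0xFFFFFFE80000000FULL);
}
```

Because the interleaved word is zero, the second product in each PMADDWD pair contributes nothing, so each doubleword holds a single signed 16 x 16 product.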
3DNow!™ and MMX™ Intra-Operand Swapping
AMD Athlon™ Specific Code
If the halves of an MMX register must be swapped, use the
PSWAPD instruction, a new AMD Athlon 3DNow! DSP
extension. Use this instruction only in
AMD Athlon specific code. “PSWAPD MMreg1, MMreg2”
performs the following operation:
mmreg1[63:32] = mmreg2[31:0]
mmreg1[31:0] = mmreg2[63:32]
See the AMD Extensions to the 3DNow! and MMX Instruction Set
Manual, order #22466 for more usage information.
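The operation defined above can be expressed as a one-line portable C model, which is handy as a reference when testing other swap sequences. This is a minimal sketch, assuming a 64-bit integer stands in for the MMX register; pswapd_model and pswapd_model_selfcheck are hypothetical names, not an AMD-supplied API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of PSWAPD: returns the source value with its
   upper and lower 32-bit halves exchanged. */
uint64_t pswapd_model(uint64_t src)
{
    return (src << 32) | (src >> 32);
}

/* Self-check: the halves trade places, and swapping twice restores
   the original value. */
void pswapd_model_selfcheck(void)
{
    uint64_t v = 0x1122334455667788ULL;
    assert(pswapd_model(v) == 0x5566778811223344ULL);
    assert(pswapd_model(pswapd_model(v)) == v);
}
```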
Blended Code
Otherwise, for blended code, which needs to run well on
AMD-K6 and AMD Athlon family processors, the following code
is recommended:
Example 1 (Preferred, faster):
;MM1 = SWAP (MM0), MM0 destroyed
MOVQ MM1, MM0 ;make a copy
PUNPCKLDQ MM0, MM0 ;duplicate lower half
PUNPCKHDQ MM1, MM0 ;MM1 = original low | original high
Example 2 (Preferred, fast):
;MM1 = SWAP (MM0), MM0 preserved
MOVQ MM1, MM0 ;make a copy
PUNPCKHDQ MM1, MM1 ;duplicate upper half
PUNPCKLDQ MM1, MM0 ;MM1 = original low | original high
Both examples swap the register halves, but the first example
should be used if the original contents of the register do not
need to be preserved. The first example is faster because
its MOVQ and PUNPCKLDQ instructions are independent of each
other and can execute in parallel. In the second example, each
instruction depends on the result of the previous one, so the
sequence takes longer to execute.
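Both blended-code sequences can be traced in portable C to confirm that they produce the same swapped value. The following is a sketch only, assuming a 64-bit integer stands in for each MMX register; the function names (punpckldq, punpckhdq, swap_example1, swap_example2, swap_examples_selfcheck) are hypothetical helpers. The model checks results, not timing, so it says nothing about which sequence is faster.

```c
#include <assert.h>
#include <stdint.h>

/* Model of PUNPCKLDQ dst, src: low dword = dst low, high dword = src low. */
uint64_t punpckldq(uint64_t dst, uint64_t src)
{
    return (dst & 0xFFFFFFFFULL) | (src << 32);
}

/* Model of PUNPCKHDQ dst, src: low dword = dst high, high dword = src high. */
uint64_t punpckhdq(uint64_t dst, uint64_t src)
{
    return (dst >> 32) | (src & 0xFFFFFFFF00000000ULL);
}

/* Example 1: returns the swapped value (MM1); *mm0 is overwritten. */
uint64_t swap_example1(uint64_t *mm0)
{
    uint64_t mm1 = *mm0;              /* MOVQ MM1, MM0      */
    *mm0 = punpckldq(*mm0, *mm0);     /* PUNPCKLDQ MM0, MM0 */
    mm1 = punpckhdq(mm1, *mm0);       /* PUNPCKHDQ MM1, MM0 */
    return mm1;
}

/* Example 2: returns the swapped value (MM1); mm0 is left unchanged. */
uint64_t swap_example2(uint64_t mm0)
{
    uint64_t mm1 = mm0;               /* MOVQ MM1, MM0      */
    mm1 = punpckhdq(mm1, mm1);        /* PUNPCKHDQ MM1, MM1 */
    mm1 = punpckldq(mm1, mm0);        /* PUNPCKLDQ MM1, MM0 */
    return mm1;
}

/* Self-check: both sequences swap the halves; Example 1 leaves MM0
   holding two copies of the original lower half. */
void swap_examples_selfcheck(void)
{
    uint64_t original = 0xAAAAAAAABBBBBBBBULL;
    uint64_t expected = 0xBBBBBBBBAAAAAAAAULL;
    uint64_t mm0 = original;
    assert(swap_example1(&mm0) == expected);
    assert(mm0 == 0xBBBBBBBBBBBBBBBBULL);
    assert(swap_example2(original) == expected);
}
```

In the model, as in the assembly, the second and third statements of swap_example2 each consume the result of the statement before, whereas the first two statements of swap_example1 read only the original value.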