AMD Athlon Processor x86 Code Optimization
22007E/0 November 1999
Replace Branches with Computation in 3DNow! Code
Branches negatively impact the performance of 3DNow! code.
Branches can operate only on one data item at a time, i.e., they
are inherently scalar and inhibit the SIMD processing that
makes 3DNow! code superior. Also, branches based on 3DNow!
comparisons require data to be passed to the integer units,
which requires either transport through memory, or the use of
MOVD reg, MMreg instructions. If the body of the branch is
small, one can achieve higher performance by replacing the
branch with computation. The computation simulates
predicated execution or conditional moves. The principal tools
for this are the following instructions: PCMPGT, PFCMPGT,
PFCMPGE, PFMIN, PFMAX, PAND, PANDN, POR, PXOR.
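As a rough illustration of replacing a small branch body with computation, the following portable C sketch models one 32-bit lane at a time. The function names sum_above_branchy and sum_above_masked are hypothetical and chosen only for this example. The mask is built the way a packed compare builds it (all ones or all zeros), so a rejected element contributes 0 to the sum instead of being skipped by a branch.

```c
#include <stddef.h>
#include <stdint.h>

/* Branchy reference: add every element that is greater than limit. */
int32_t sum_above_branchy(const int32_t *v, size_t n, int32_t limit)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (v[i] > limit)
            sum += v[i];
    }
    return sum;
}

/* Branch-free model: the comparison yields an all-ones or all-zeros
   mask (as a packed compare does per lane), and ANDing the value with
   the mask turns rejected elements into 0. */
int32_t sum_above_masked(const int32_t *v, size_t n, int32_t limit)
{
    int32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t mask = (uint32_t)0 - (uint32_t)(v[i] > limit);
        sum += (int32_t)((uint32_t)v[i] & mask);
    }
    return sum;
}
```

Both functions return the same result for the same input. In packed code the masked form processes all lanes of a register at once, which a branch cannot do.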
Muxing Constructs
The most important construct for avoiding branches in
3DNow! and MMX code is a 2-way muxing construct that is
equivalent to the ternary operator ?: in C and C++. It is
implemented using the PCMP/PFCMP, PAND, PANDN, and
POR instructions. To maximize performance, it is important to
apply the PAND and PANDN instructions in the proper order.
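The muxing construct can be sketched for a single 32-bit lane in portable C. This is a model of the bit logic only, not of instruction scheduling. The helper names mask_lt and mux32 are hypothetical.

```c
#include <stdint.h>

/* All ones when x < y (signed), else zero. This is the value a packed
   signed compare of y > x would leave in one 32-bit lane. */
uint32_t mask_lt(int32_t x, int32_t y)
{
    return (uint32_t)0 - (uint32_t)(y > x);
}

/* 2-way mux for one lane: t where the mask is set, f elsewhere.
   (t & mask) corresponds to PAND, (f & ~mask) to PANDN, | to POR. */
uint32_t mux32(uint32_t mask, uint32_t t, uint32_t f)
{
    return (t & mask) | (f & ~mask);
}
```

For example, mux32(mask_lt(x, y), t, f) gives the same value as (x < y) ? t : f without a branch.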
Example 1 (Avoid):
; r = (x < y) ? b : a
;
; in:  mm0 a
;      mm1 b
;      mm2 x
;      mm3 y
; out: mm1 r
PCMPGTD MM3, MM2    ; y > x ? 0xffffffff : 0
MOVQ    MM4, MM3    ; duplicate mask
PANDN   MM3, MM0    ; y > x ? 0 : a
PAND    MM1, MM4    ; y > x ? b : 0
POR     MM1, MM3    ; r = y > x ? b : a
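The instruction sequence above can be traced with a small C model, assuming a hypothetical mmreg type that holds two 32-bit lanes and helper functions that mimic the two-operand form, in which the destination register is overwritten. The model covers only the data flow. It shows why the mask has to be copied before PANDN is applied to it.

```c
#include <stdint.h>

/* Hypothetical model of one MMX register: two 32-bit lanes. */
typedef struct { uint32_t lane[2]; } mmreg;

/* Each helper overwrites dst, like the two-operand instruction it models. */
void pcmpgtd(mmreg *dst, const mmreg *src)
{
    for (int i = 0; i < 2; i++)
        dst->lane[i] = ((int32_t)dst->lane[i] > (int32_t)src->lane[i])
                           ? 0xFFFFFFFFu : 0u;
}

void pand(mmreg *dst, const mmreg *src)
{
    for (int i = 0; i < 2; i++)
        dst->lane[i] &= src->lane[i];
}

void pandn(mmreg *dst, const mmreg *src)
{
    for (int i = 0; i < 2; i++)
        dst->lane[i] = ~dst->lane[i] & src->lane[i];
}

void por(mmreg *dst, const mmreg *src)
{
    for (int i = 0; i < 2; i++)
        dst->lane[i] |= src->lane[i];
}

void movq(mmreg *dst, const mmreg *src)
{
    *dst = *src;
}

/* Same order of operations as the listing above. */
mmreg mux_example1(mmreg a, mmreg b, mmreg x, mmreg y)
{
    mmreg mm0 = a, mm1 = b, mm2 = x, mm3 = y, mm4;
    pcmpgtd(&mm3, &mm2);  /* mm3 = y > x ? all ones : 0           */
    movq(&mm4, &mm3);     /* copy: the next step overwrites mm3   */
    pandn(&mm3, &mm0);    /* mm3 = y > x ? 0 : a (mask is gone)   */
    pand(&mm1, &mm4);     /* mm1 = y > x ? b : 0                  */
    por(&mm1, &mm3);      /* mm1 = y > x ? b : a                  */
    return mm1;
}
```

In this model each lane of the result holds b where y > x and a elsewhere, which matches the per-instruction comments in the listing.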
Because PANDN overwrites its destination operand, applying it
first destroys the mask created by PCMP. The mask must be
copied beforehand, which requires an additional register. This
adds an instruction, lengthens the dependency chain, and
increases register pressure. Therefore, 2-way muxing
constructs should be written as follows.