70 Unrolling Loops
AMD Athlon Processor x86 Code Optimization
22007E/0 November 1999
no faster than three iterations in 10 cycles, or 6/10
floating-point adds per cycle, or 1.4 times as fast as the original
loop.
Deriving Loop Control For Partially Unrolled Loops
A frequently used loop construct is a counting loop. In a typical
case, the loop count starts at some lower bound lo, increases by
some fixed, positive increment inc for each iteration of the
loop, and may not exceed some upper bound hi. The following
example shows how to partially unroll such a loop by an
unrolling factor of fac, and how to derive the loop control for
the partially unrolled version of the loop.
Example 1 (rolled loop):
for (k = lo; k <= hi; k += inc) {
   x[k] = ...
}
Example 2 (partially unrolled loop):
for (k = lo; k <= (hi - (fac-1)*inc); k += fac*inc) {
   x[k] = ...
   x[k+inc] = ...
   ...
   x[k+(fac-1)*inc] = ...
}
/* handle end cases */
for (k = k; k <= hi; k += inc) {
   x[k] = ...
}
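The upper bound of the unrolled loop follows from its last store: the body touches x[k+(fac-1)*inc], so the loop may run only while k+(fac-1)*inc <= hi, that is, while k <= hi - (fac-1)*inc. The following is a minimal sketch of Examples 1 and 2 with fac fixed at 4, not code from this guide. The loop body x[k] = 2*k, the buffer size N, and the names fill_rolled, fill_unrolled4 and loops_agree are assumptions made for illustration. The sketch also assumes signed loop variables, so that hi - 3*inc can go negative safely when the range is shorter than one unrolled iteration.

```c
#include <string.h>

#define N 32   /* hypothetical buffer size for this illustration */

/* Example 1 (rolled loop) with an assumed body: x[k] = 2*k. */
static void fill_rolled(int *x, int lo, int hi, int inc)
{
    for (int k = lo; k <= hi; k += inc)
        x[k] = 2 * k;
}

/* Example 2 (partially unrolled loop) with fac = 4.
   The main loop runs only while its last store, x[k + 3*inc],
   still lies within the upper bound hi. */
static void fill_unrolled4(int *x, int lo, int hi, int inc)
{
    int k;

    for (k = lo; k <= hi - 3 * inc; k += 4 * inc) {
        x[k]           = 2 * k;
        x[k + inc]     = 2 * (k + inc);
        x[k + 2 * inc] = 2 * (k + 2 * inc);
        x[k + 3 * inc] = 2 * (k + 3 * inc);
    }

    /* handle end cases: at most fac-1 = 3 leftover iterations,
       continuing from the k where the unrolled loop stopped */
    for (; k <= hi; k += inc)
        x[k] = 2 * k;
}

/* Runs both versions on zeroed buffers and returns 1 if the results match.
   Assumes 0 <= lo, hi < N, and inc > 0. */
static int loops_agree(int lo, int hi, int inc)
{
    int a[N] = {0};
    int b[N] = {0};

    fill_rolled(a, lo, hi, inc);
    fill_unrolled4(b, lo, hi, inc);
    return memcmp(a, b, sizeof a) == 0;
}
```

For instance, with lo = 3, hi = 29 and inc = 2 the rolled loop runs 14 iterations. The unrolled loop covers 12 of them in three passes (k = 3, 11, 19), and the end-case loop handles the remaining two (k = 27, 29).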