Technical data
Algorithm Optimization Examples
Figure A–2 Cooley-Tukey Butterfly Graph, One-Dimensional Fast Fourier Transform
for N = 16
Refer to the printed version of this book, EK–60VAA–PG.
is performed as close to the center as possible, when the stage number
= LOG(N)/2. To complete the algorithm, the butterfly distances now
increase again. Second, this process entirely eliminates the need for short
butterflies.
Another optimization made to the FFT algorithm is the use of a table
lookup method to access the sine and cosine factors, which reduces
repetitive calls to the computationally intensive trigonometric functions.
The initialization of this trigonometric table has been fully vectorized
but shows only a modest factor of 2 performance gain. To build the
table, a first order linear recurrence loop is formed that severely limits
vector speedup. Because this calculation is only done once, it becomes
negligible for multiple calls to the one-dimensional FFTs and for all
higher dimensional FFTs. The benchmark shown in Figure A–4 was
looped and includes the calculation of the trigonometric table performed
once for each FFT data length.
A–8