Technical data

Algorithm Optimization Examples

When the value of N is relatively small (that is, N < 256), the two-

dimensional FFT can be computed by calling a one-dimensional FFT of

length N**2. The small two-dimensional FFT can achieve performance

equal to that of the aggregate size one-dimensional FFT by linearizing the

data array. Figure A–7 shows the tradeoff between using the linearized

two-dimensional routine (for small N) and the transposed method (for

large N) to maintain high performance across all data sizes.

The optimization of an algorithm that vectorizes poorly in its original form

has been shown. The resulting algorithm yields much higher performance

on the VAX 6000 Model 400 processor. High performance is due to the

unique way the algorithm touches contiguous memory locations and its

effort to maximize the vector length. The implementation described above

always uses unity stride vectors and always results in a vector length of

64 for FFT lengths greater than 2K (2 x 1024).

Figure A–7 Two-Dimensional Fast Fourier Transform Performance Graph,

Optimized Single-Precision Complex Transforms

Refer to the printed version of this book, EK–60VAA–PG.

A–12