Technical data

Algorithm Optimization Examples
On VAX vector systems, a matrix transpose can dramatically increase
the vector processing performance of two-dimensional FFTs for large
values of N (that is, N > 256). The key performance issue is the
difference between unity stride and nonunity stride memory access.
As Figure A–6 shows, a vectorized matrix transpose can be performed
after each set of N one-dimensional FFTs. The computation is
equivalent to the one in Figure A–2, but because of the transposes
every one-dimensional FFT accesses a column, which is unity stride.
The overhead of transposing the matrix becomes negligible for large
values of N.
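
The scheme described above can be sketched in a few lines. The
following Python is only an illustration, not code from this book:
the function names (fft1d, transpose, fft_columns, fft2d_transpose)
are hypothetical, the matrix is assumed to be N by N with N a power
of two, and it is assumed to be stored column by column in a flat
list so that each column is a contiguous (unity stride) slice. Python
itself gains nothing from the stride; the sketch only shows the order
of operations.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 FFT of a list whose length is a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def transpose(a, n):
    """Transpose an n-by-n matrix stored column-major in a flat list.

    Element (i, j) lives at index i + j*n.
    """
    return [a[j + i * n] for j in range(n) for i in range(n)]


def fft_columns(a, n):
    """Apply a 1-D FFT to each column; every column is a contiguous slice."""
    out = []
    for j in range(n):
        out.extend(fft1d(a[j * n:(j + 1) * n]))
    return out


def fft2d_transpose(a, n):
    """2-D FFT using only column (unity stride) 1-D FFTs."""
    a = fft_columns(a, n)   # first set of n column FFTs
    a = transpose(a, n)     # rows become columns
    a = fft_columns(a, n)   # second set of n column FFTs
    return transpose(a, n)  # restore the original orientation
```

A call such as fft2d_transpose(data, 4) on a 16-element list returns
the transform in the same column-by-column layout. The second
transpose only restores the orientation of the result; a caller that
can work with the transposed result could omit it.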
Figure A–6 Two-Dimensional Fast Fourier Transforms Using a Matrix
Transpose Between Each Set of N Column One-Dimensional Fast Fourier
Transforms
Refer to the printed version of this book, EK–60VAA–PG.