Technical data

Algorithm Optimization Examples
Two groups of algorithms that have high vector processing potential are
equation solvers and signal processing routines. For example,
computational fluid dynamics, finite element analysis, molecular
dynamics, circuit simulation, quantum chromodynamics, and economic
modeling applications use various types of simultaneous or differential
equation solvers. Applications such as air pollution modeling, seismic
analysis, weather forecasting, radar imaging, speech and image
processing, and many other scientific and engineering applications use
signal processing routines, such as fast Fourier transforms (FFT), to
obtain solutions.
A.1 EQUATION SOLVERS
Equation solvers generally fall into four categories: general rectangular,
symmetric, Hermitian, and tridiagonal. The most common benchmark
used to measure a computer system’s ability to solve a general rectangular
system of linear equations is Linpack. The Linpack benchmarks,
developed at Argonne National Laboratory, measure the performance
across different computer systems while solving dense systems of 100,
300, and 1000 linear equations.
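Linpack results are conventionally quoted as a floating-point rate. For an n-by-n dense system, the benchmark credits a nominal 2n^3/3 operations for the factorization plus 2n^2 for the triangular solves, whatever the machine actually executes. The minimal sketch below shows that bookkeeping. The helper names `linpack_ops` and `mflops` are hypothetical, and the rate simply divides the nominal count by the measured solve time.

```python
def linpack_ops(n: int) -> float:
    """Nominal operation count credited for an n-by-n dense solve:
    2n^3/3 for the LU factorization plus 2n^2 for the two triangular solves."""
    return 2.0 * n ** 3 / 3.0 + 2.0 * n ** 2


def mflops(n: int, seconds: float) -> float:
    """Hypothetical helper: millions of floating-point operations per second
    for a solve of size n that took `seconds` of elapsed time."""
    if seconds <= 0.0:
        raise ValueError("elapsed time must be positive")
    return linpack_ops(n) / seconds / 1.0e6


# Share of the credited work that comes from the factorization step.
for n in (100, 300, 1000):
    share = (2.0 * n ** 3 / 3.0) / linpack_ops(n)
    print(f"n={n:5d}  ops={linpack_ops(n):.4g}  factorization share={share:.1%}")
```

The factorization share works out to n/(n + 3), about 97% for n = 100, which is why optimizing the factorization routine dominates any Linpack tuning effort.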
These benchmarks are currently written to call subroutines from the
Linpack library. The subroutines, in turn, call the basic linear algebra
subroutines (BLAS) at the lowest level. For each benchmark size,
different optimization rules govern the types of changes permitted
in the Linpack report. Optimizations to the BLAS routines are always
allowed; they can be made by modifying the FORTRAN source or by
supplying the routine in macrocode. Algorithm changes are allowed only
for the largest problem size, the solution to a system of 1000
linear equations.
The smallest problem size uses a two-dimensional array that is 100
by 100. The benchmarks are written to use Gaussian elimination for
solving 100 simultaneous equations. This two-step method features a
factorization routine, xGEFA, and a solver, xGESL. Both are
column-oriented algorithms and use vector-vector (level 1) BLAS routines.
Column orientation increases program efficiency because FORTRAN stores
arrays in column-major order: the elements of a column are contiguous in
memory, so column-oriented loops access data with unit stride and good
locality.
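The two-step method can be sketched in Python by following the loop structure of the double-precision LINPACK routines DGEFA and DGESL. The matrix is held as a list of columns to imitate FORTRAN's column-major layout, and every inner loop is a level 1 operation (find the pivot, scale a column, add a multiple of one column to another). The names `iamax`, `scal`, `axpy`, `gefa`, and `gesl` are hypothetical stand-ins rather than the library's interface, and this sketch raises an exception on a zero pivot instead of setting LINPACK's INFO flag.

```python
from typing import List

Vector = List[float]


def iamax(n: int, x: Vector, off: int) -> int:
    """Level 1 style: offset (from `off`) of the largest |x[i]| in x[off:off+n]."""
    best = 0
    for i in range(1, n):
        if abs(x[off + i]) > abs(x[off + best]):
            best = i
    return best


def scal(n: int, alpha: float, x: Vector, off: int) -> None:
    """Level 1 style: x[off:off+n] *= alpha, in place."""
    for i in range(n):
        x[off + i] *= alpha


def axpy(n: int, alpha: float, x: Vector, xoff: int, y: Vector, yoff: int) -> None:
    """Level 1 style: y[yoff:yoff+n] += alpha * x[xoff:xoff+n], in place."""
    for i in range(n):
        y[yoff + i] += alpha * x[xoff + i]


def gefa(a: List[Vector]) -> List[int]:
    """Factor PA = LU in place; a[j] holds column j (column-major storage).
    Below the diagonal, a keeps the negated multipliers, as LINPACK does."""
    n = len(a)
    ipvt = list(range(n))
    for k in range(n - 1):
        ck = a[k]
        l = k + iamax(n - k, ck, k)               # partial pivoting: pivot row
        ipvt[k] = l
        if ck[l] == 0.0:
            raise ZeroDivisionError(f"zero pivot in column {k}")
        ck[l], ck[k] = ck[k], ck[l]               # move pivot onto the diagonal
        scal(n - k - 1, -1.0 / ck[k], ck, k + 1)  # negated multipliers
        for j in range(k + 1, n):                 # update each trailing column
            cj = a[j]
            t = cj[l]
            cj[l], cj[k] = cj[k], t               # same row interchange
            axpy(n - k - 1, t, ck, k + 1, cj, k + 1)
    if a[n - 1][n - 1] == 0.0:
        raise ZeroDivisionError("zero pivot in last column")
    return ipvt


def gesl(a: List[Vector], ipvt: List[int], b: Vector) -> Vector:
    """Solve A x = b with the factors from gefa; returns a new list."""
    n = len(a)
    x = list(b)
    for k in range(n - 1):                        # forward: apply P and L^-1
        l = ipvt[k]
        t = x[l]
        x[l], x[k] = x[k], t
        axpy(n - k - 1, t, a[k], k + 1, x, k + 1)
    for k in range(n - 1, -1, -1):                # backward: U x = y, by columns
        x[k] /= a[k][k]
        axpy(k, -x[k], a[k], 0, x, 0)
    return x
```

For example, storing the columns of the matrix with rows (2, 1, 1), (4, -6, 0), (-2, 7, 2) as `a = [[2.0, 4.0, -2.0], [1.0, -6.0, 7.0], [1.0, 0.0, 2.0]]`, then calling `ipvt = gefa(a)` followed by `gesl(a, ipvt, [5.0, -2.0, 9.0])`, returns `[1.0, 1.0, 2.0]`. Both routines touch only whole columns in their inner loops, which is what gives them unit-stride access under column-major storage.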
As shown in Example A–1, the BLAS level 1 routines allow the user to
schedule the instructions optimally in vector macrocode. Deficiencies
of the level 1 BLAS routines include frequent synchronization, a large
calling overhead, and a high ratio of vector load and store operations
to vector arithmetic operations.
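The calling overhead and the load/store imbalance can be quantified with a rough tally. The sketch below assumes a straightforward column-oriented factorization in which every trailing-column update is one AXPY call (y = alpha*x + y). Each such call reads two vectors, writes one, and performs one multiply and one add per element. The function name `gefa_axpy_profile` is hypothetical, and the counts are element operations, not measured instruction counts; the pivot search and scaling calls are not included.

```python
def gefa_axpy_profile(n: int) -> dict:
    """Tally the AXPY calls made by the column-update loop of a
    column-oriented LU factorization of an n-by-n matrix, plus the
    element loads, stores, and floating-point operations they perform."""
    calls = loads = stores = flops = 0
    for k in range(n - 1):
        length = n - k - 1              # rows below the pivot
        for _ in range(k + 1, n):       # one AXPY per column to the right
            calls += 1
            loads += 2 * length         # read x (multipliers) and y (column j)
            stores += length            # write y back
            flops += 2 * length         # one multiply and one add per element
    return {"calls": calls, "loads": loads, "stores": stores, "flops": flops}
```

For n = 100 the tally gives n(n - 1)/2 = 4,950 separate calls that each process only about 66 elements on average, and 1.5 memory operations for every arithmetic operation (3 loads and stores against 2 arithmetic operations per element).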