Vector Processing Concepts
ability to overlap vector and scalar operations can give better performance
than those that do not.
1.2.2 Memory vs. Register Integrated Vector Processors
There are two types of integrated vector processor architectures: memory-
to-memory and register-to-register.
In a memory-to-memory architecture, vector data is fetched directly from
memory into the function units of the vector processing unit. Once the
data is operated on, the results are returned directly to memory.
With a register-to-register (or load/store) architecture, vector data is first
loaded from memory into a set of high-speed registers. From there it is
moved into the function units and operated on. The resulting data is
returned to the registers and is not stored back in memory until all
operations on it are complete.
For applications that use very long vectors (on the order of thousands of
elements), a memory-to-memory architecture works quite well. Once the
overhead involved in starting the vector operation is completed, results
can be produced at the rate of one element per cycle. On the other hand,
with a register-to-register architecture, only a limited segment of the
array can be processed at once, and the load/store overhead (or latency)
must be paid over and over. With long vectors, this overhead can reduce
the performance advantage of high-speed registers.
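The trade-off described above can be sketched with a rough cycle-count
model. This is only an illustration: the startup cost, register length,
and per-segment overhead below are hypothetical values chosen for the
example, not figures for any real machine, and the model assumes one
result per cycle once an operation is under way.

```python
import math

# All three constants are hypothetical, chosen only for illustration.
STARTUP = 50    # cycles to start one memory-to-memory vector operation
REG_LEN = 64    # elements held by one vector register
OVERHEAD = 20   # load/store overhead paid for each register-sized segment


def mem_to_mem_cycles(n, startup):
    """Pay the startup overhead once, then produce one element per cycle."""
    return startup + n


def reg_to_reg_cycles(n, reg_len, overhead):
    """Process the array in segments of at most reg_len elements.

    Every segment pays the load/store overhead again.
    """
    segments = math.ceil(n / reg_len)
    return segments * overhead + n


for n in (64, 1024, 8192):
    print(n,
          mem_to_mem_cycles(n, STARTUP),
          reg_to_reg_cycles(n, REG_LEN, OVERHEAD))
```

Under these assumed numbers the register-to-register model is ahead for
a 64-element vector (84 cycles against 114), but falls behind as the
vector grows (10752 cycles against 8242 for 8192 elements), because the
per-segment overhead is paid 128 times.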
However, a register-to-register architecture can implement several
hardware techniques that help amortize this load/store overhead. With
techniques such as chaining and instruction overlap, multiple
operations can be executed concurrently on the same set of vector data
while that data is still in the vector registers. Intermediate (temporary)
values need not be returned to memory. Such techniques are not possible
with a memory-to-memory architecture.
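The point about intermediate values can be illustrated with a small
simulation that counts element transfers to and from memory while
computing d = (a * b) + c. It is a sketch only: the CountingMemory
class, the array names, and the register length of 64 are all
hypothetical, and the code models memory traffic, not the cycle-level
timing of chaining or instruction overlap.

```python
class CountingMemory:
    """Toy memory: named arrays plus a count of element transfers."""

    def __init__(self, **arrays):
        self.arrays = {name: list(vals) for name, vals in arrays.items()}
        self.transfers = 0

    def load(self, name, start, count):
        self.transfers += count
        return self.arrays[name][start:start + count]

    def store(self, name, start, values):
        self.transfers += len(values)
        self.arrays[name][start:start + len(values)] = values


def make_memory(n):
    return CountingMemory(a=range(n), b=[2] * n, c=[1] * n,
                          t=[0] * n, d=[0] * n)


def mem_to_mem(mem, n):
    """Each operation reads its operands from memory and writes back."""
    a, b = mem.load("a", 0, n), mem.load("b", 0, n)
    mem.store("t", 0, [x * y for x, y in zip(a, b)])   # temporary in memory
    t, c = mem.load("t", 0, n), mem.load("c", 0, n)
    mem.store("d", 0, [x + y for x, y in zip(t, c)])


def reg_to_reg(mem, n, reg_len):
    """Process one register-sized segment at a time."""
    for start in range(0, n, reg_len):
        count = min(reg_len, n - start)
        v0 = mem.load("a", start, count)
        v1 = mem.load("b", start, count)
        v2 = mem.load("c", start, count)
        v3 = [x * y for x, y in zip(v0, v1)]   # intermediate stays in a register
        v4 = [x + y for x, y in zip(v3, v2)]
        mem.store("d", start, v4)              # only the final result is stored


N = 200
m2m = make_memory(N)
mem_to_mem(m2m, N)
r2r = make_memory(N)
reg_to_reg(r2r, N, reg_len=64)
print(m2m.transfers, r2r.transfers)
```

Both versions produce the same d, but in this toy model the
memory-to-memory version moves 1200 elements and the
register-to-register version moves 800, because the product a * b never
leaves the registers.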
1.3 VECTORIZING COMPILERS
Developing programs to take maximum advantage of a specific vector
processor requires a great deal of knowledge of, and attention to, the
particular vector computer hardware. Fortunately, most applications that
benefit from vector processing can be written in a high-level programming
language, such as FORTRAN, and submitted to a vectorizing compiler
for that language. The primary function of a vectorizing compiler is to
analyze the source program for combinations of arithmetic operations