Technical data

Vector Processing Concepts

ability to overlap vector and scalar operations can give better performance than those that do not.

1.2.2 Memory vs. Register Integrated Vector Processors

There are two types of integrated vector processor architectures: memory-to-memory and register-to-register.

In a memory-to-memory architecture, vector data is fetched directly from memory into the function units of the vector processing unit. Once the data is operated on, the results are returned directly to memory.

With a register-to-register (or load/store) architecture, vector data is first loaded from memory into a set of high-speed registers. From there it is moved into the function units and operated on. The results are returned to the registers and are not written to memory until all operations are complete, at which point the vector data is stored back in memory.

For applications that use very long vectors (on the order of thousands of elements), a memory-to-memory architecture works quite well. Once the overhead involved in starting the vector operation is completed, results can be produced at the rate of one element per cycle. On the other hand, with a register-to-register architecture, only a limited segment of the array can be processed at once, and the load/store overhead (or latency) must be paid over and over. With long vectors, this overhead can reduce the performance advantage of high-speed registers.
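
The tradeoff described above can be made concrete with a rough cycle-count model. This is only a sketch under stated assumptions: the startup cost, register length, and per-segment overhead are hypothetical values chosen for illustration, not figures for any particular machine, and both models assume one result per cycle once an operation is under way.

```python
import math


def memory_to_memory_cycles(n, startup):
    """Cycles for one vector operation on n elements.

    The startup overhead is paid once; after that, one result per cycle.
    """
    return startup + n


def register_to_register_cycles(n, register_length, segment_overhead):
    """Cycles when the array is processed in register-sized segments.

    Every segment pays the load/store overhead again, then produces
    one result per cycle for each element it holds.
    """
    segments = math.ceil(n / register_length)
    return segments * segment_overhead + n


# Hypothetical parameters, chosen only to illustrate the trend.
STARTUP = 50           # one-time startup cost of a memory-to-memory operation
REGISTER_LENGTH = 64   # elements held by one vector register
SEGMENT_OVERHEAD = 10  # load/store overhead paid for every segment

for n in (64, 1024, 8192):
    m2m = memory_to_memory_cycles(n, STARTUP)
    r2r = register_to_register_cycles(n, REGISTER_LENGTH, SEGMENT_OVERHEAD)
    print(f"n={n:5d}  memory-to-memory={m2m:5d}  register-to-register={r2r:5d}")
```

With these assumed values, the register-to-register model is ahead for a vector that fits in one register, and the repeated per-segment overhead puts it behind once the vector spans many segments. The crossover point depends entirely on the parameters chosen.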

However, a register-to-register architecture can implement several hardware techniques that help amortize this load/store overhead. With techniques such as chaining and instruction overlap, multiple operations can be executed concurrently on the same set of vector data while that data is still in the vector registers, so intermediate (temporary) values need not be returned to memory. Such techniques are not possible with a memory-to-memory architecture.
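
The saving from keeping intermediate values in registers can be sketched by counting memory transfers for a two-operation computation, a*b + c. The function names, the 64-element register length, and the traffic counters below are assumptions made for illustration. The sketch models only memory traffic, not the timing benefit of chaining or instruction overlap.

```python
def multiply_add_memory_to_memory(a, b, c):
    """Compute a*b + c with every operand and result passing through memory.

    Returns the result and the number of element transfers to or from memory.
    """
    n = len(a)
    traffic = 0
    # First operation: read a and b, write the temporary product to memory.
    temp = [x * y for x, y in zip(a, b)]
    traffic += 3 * n
    # Second operation: read the temporary and c, write the final result.
    result = [t + z for t, z in zip(temp, c)]
    traffic += 3 * n
    return result, traffic


def multiply_add_register_to_register(a, b, c, register_length=64):
    """Compute a*b + c one register-sized segment at a time.

    The temporary product stays in a (simulated) vector register, so only
    the three inputs and the final result move between memory and registers.
    """
    result = []
    traffic = 0
    for start in range(0, len(a), register_length):
        end = start + register_length  # slicing clamps at the end of the list
        va, vb, vc = a[start:end], b[start:end], c[start:end]
        traffic += 3 * len(va)                      # three vector loads
        vt = [x * y for x, y in zip(va, vb)]        # temporary, never stored
        vd = [t + z for t, z in zip(vt, vc)]
        result.extend(vd)
        traffic += len(vd)                          # one vector store
    return result, traffic


a = list(range(100))
b = [2] * 100
c = [1] * 100
m2m_result, m2m_traffic = multiply_add_memory_to_memory(a, b, c)
r2r_result, r2r_traffic = multiply_add_register_to_register(a, b, c)
print(m2m_result == r2r_result, m2m_traffic, r2r_traffic)
```

In this model the memory-to-memory version moves six elements per result and the register-to-register version moves four, because the temporary product is never written out and read back.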

1.3 VECTORIZING COMPILERS

Developing programs to take maximum advantage of a specific vector processor requires a great deal of knowledge of, and attention to, the particular vector computer hardware. Fortunately, most applications that benefit from vector processing can be written in a high-level programming language, such as FORTRAN, and submitted to a vectorizing compiler for that language. The primary function of a vectorizing compiler is to analyze the source program for combinations of arithmetic operations
