Technical data

Vector Processing Concepts

A scalar processor operates on single quantities of data. Consider the
following operation: A(I) = B(I) + 50. As illustrated in Figure 1–1, five
separate instructions must be performed, using one instruction per unit
of time, for each value of I from 1 to Imax (some CPUs may combine steps
and use fewer units of time):

1 Load first element from location B.

2 Add 50 to the first element.

3 Store the result in location A.

4 Increment the counter.

5 Test the counter for index Imax.

If Imax is reached, the operation is complete. If not, steps 1 through 5
are repeated. Calculating A(I) for 100 elements with these instructions
requires 5 × 100 = 500 scalar instructions and takes 500 units of
computer time.
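
The five-step loop above can be sketched in Python as a way to check the
instruction count. This is only an illustrative model, not code for any
particular processor: the function name scalar_add_50 and the steps
counter are hypothetical, and each step is assumed to cost exactly one
unit of time, as in the text.

```python
def scalar_add_50(b, constant=50):
    """Model A(I) = B(I) + constant one element at a time.

    Returns the result list and the number of steps taken, assuming
    (as the text does) that every step costs one unit of time.
    """
    a = [0] * len(b)
    steps = 0               # hypothetical counter of units of time
    i = 0
    done = len(b) == 0      # nothing to do for an empty array
    while not done:
        element = b[i]                  # 1. load one element from B
        steps += 1
        result = element + constant     # 2. add 50 to that element
        steps += 1
        a[i] = result                   # 3. store the result in A
        steps += 1
        i += 1                          # 4. increment the counter
        steps += 1
        done = i == len(b)              # 5. test the counter against Imax
        steps += 1
    return a, steps


a, steps = scalar_add_50(list(range(1, 101)))
print(steps)  # 500
```

For 100 elements the counter reaches 500, matching the figure given
above. A CPU that combines steps would reach a smaller count.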

Because a vector processor operates on complete vectors of independent
data at the same time, it needs only three instructions to perform the
same operation A(I):

1 Load the array from memory into the vector processor register B.

2 Add 50 to all elements in the array, placing the results in register A.

3 Store the entire vector back into memory.
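
The three vector steps above can be modeled the same way. The sketch
below is a rough illustration under stated assumptions: the function name
vector_add_50 and the instruction counter are hypothetical, each
whole-array statement stands in for one vector instruction, and the
entire array is assumed to fit in a single vector register.

```python
def vector_add_50(memory_b, constant=50):
    """Model A(I) = B(I) + constant as three whole-vector instructions.

    Assumes the whole array fits in one vector register; each statement
    below stands in for a single vector instruction.
    """
    instructions = 0        # hypothetical instruction counter

    register_b = list(memory_b)                         # 1. load array into register B
    instructions += 1

    register_a = [x + constant for x in register_b]     # 2. add 50 to all elements
    instructions += 1

    memory_a = list(register_a)                         # 3. store the vector to memory
    instructions += 1

    return memory_a, instructions


result, instructions = vector_add_50(list(range(1, 101)))
print(instructions)  # 3
```

The instruction count stays at three regardless of the array length in
this model. The count says nothing about how many units of time each
vector instruction takes.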

The flow of data optimizes the use of memory and reduces the overhead of
performing each operation. Within the vector processor, much the same
processing may occur as in the scalar processor, but the vector processor
is optimized to do it faster. It is important to remember that vector
operations generate the same result as scalar operations.
