Technical data

Vector Processing Concepts

Consider a scalar program that spends 70% of its execution time in 20% of its code. If this 20% of the code is converted to vector processing code, the program is considered to have a vectorization factor of 70%. If the time for a scalar operation is set to 1 and a vector operation takes 10% of that time, the time (T) to perform N operations is:

T = N * (.30 * 1 + .70 * .1)

T = N * .37

If performance (P) equals operations performed (N) per unit time (T), then, with T = N * .37:

P = N / T = N / (N * .37) = 1 / .37 = 2.7

The improved performance, shown in Figure 1–9, is about 2.7 times that of a scalar processor. Vectorization factors above 70% yield still greater performance relative to the same computer using scalar processing. The speedup ratio is defined as the vector performance divided by the scalar performance.
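
The calculation above can be sketched in Python to show how the speedup ratio changes with the vectorization factor. This is a minimal illustration of the simple timing model in this section only; the function and parameter names (relative_time, speedup, vector_time_ratio) are hypothetical and do not come from any vendor tool, and the 10% vector operation time is the assumption used in the text.

```python
def relative_time(vectorization_factor, vector_time_ratio=0.1):
    """Time per operation, relative to a fully scalar run (scalar op = 1).

    vectorization_factor: fraction of scalar run time spent in code that
        is converted to vector code (0.70 in the example in the text).
    vector_time_ratio: time of a vector operation relative to a scalar
        operation (assumed to be 0.1, as in the text).
    """
    scalar_fraction = 1.0 - vectorization_factor
    return scalar_fraction * 1.0 + vectorization_factor * vector_time_ratio


def speedup(vectorization_factor, vector_time_ratio=0.1):
    """Speedup ratio: vector performance divided by scalar performance."""
    return 1.0 / relative_time(vectorization_factor, vector_time_ratio)


# The example from the text: a vectorization factor of 70%.
print(round(relative_time(0.70), 2))  # 0.37
print(round(speedup(0.70), 1))        # 2.7

# Other vectorization factors under the same assumed 10% vector time.
for factor in (0.50, 0.70, 0.90):
    print(factor, round(speedup(factor), 2))
```

Under these assumptions, a vectorization factor of 90% gives a speedup ratio of roughly 5.3, while a factor of 50% gives roughly 1.8. The scalar portion of the program quickly dominates the total time as the factor falls.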

1.10.3 Crossover Point

The crossover point is the vector length (number of elements) at which the performance of the vector unit exceeds that of the scalar unit for a particular instruction or sequence. To achieve a performance improvement on a given vector processor, a vectorized application should have an average vector length larger than the crossover point for that processor and the vector operations used.

The smaller the crossover point, the better. A crossover point of 11 means that DO loops of fewer than 11 elements run faster on the scalar processor than on the vector processor. The crossover point results from the overhead instructions and time required to set up the vector processor, process the data, and return the result. It varies from computer to computer.

Vector operations add some startup overhead, which sets a practical lower limit on the number of array elements worth vectorizing. For small arrays, the time to set up the vector operation and process the data is usually longer than the time to perform the same operation on a scalar processor.
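
The effect of startup overhead on the crossover point can be sketched with a simple linear timing model. This is an illustrative assumption, not a measurement of any particular processor: the startup cost of 9.5 time units and the per-element times are hypothetical values chosen so that the model reproduces the crossover point of 11 used as an example above, and all function names are invented for this sketch.

```python
import math


def scalar_time(n, scalar_per_element=1.0):
    """Time for the scalar unit to process n elements (no startup cost)."""
    return n * scalar_per_element


def vector_time(n, startup=9.5, vector_per_element=0.1):
    """Time for the vector unit: fixed startup overhead plus per-element cost.

    The startup and per-element values are hypothetical.
    """
    return startup + n * vector_per_element


def crossover_point(startup=9.5, scalar_per_element=1.0, vector_per_element=0.1):
    """Smallest vector length at which the vector unit is strictly faster."""
    if vector_per_element >= scalar_per_element:
        raise ValueError("vector unit never overtakes the scalar unit")
    # Solve startup + n * v < n * s for the smallest integer n.
    break_even = startup / (scalar_per_element - vector_per_element)
    return math.floor(break_even) + 1


print(crossover_point())                  # 11
print(scalar_time(10), vector_time(10))   # scalar unit is faster at 10 elements
print(scalar_time(11), vector_time(11))   # vector unit is faster at 11 elements
```

In this model, reducing the startup overhead lowers the crossover point, which is why a smaller crossover point indicates a vector processor that benefits shorter loops.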
