Technical data
Vector Processing Concepts
Consider a scalar program that spends 70% of its execution time in 20% of
its code. If this 20% portion of the code is converted to vector processing
code, the program is considered to have a vectorization factor of 70%. If
the time for a scalar operation is normalized to 1 and a vector operation
takes 10% of that time (0.1), we have:
T = N * (.30 * 1 + .70 * .1)
T = N * .37
If performance (P) equals the number of operations performed (N) per unit
time (T), then, with T = N * .37:
P = N / T = N / (N * .37) = 1 / .37 ≈ 2.7
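The arithmetic above can be reproduced with a short sketch. It assumes, as the text does, that a scalar operation takes 1 time unit and a vectorized operation takes 0.1; the function names and the choice of N = 1000 are illustrative only.

```python
def execution_time(n_ops, vector_fraction, vector_time_ratio):
    """T for n_ops operations, with the scalar operation time normalized to 1."""
    scalar_fraction = 1.0 - vector_fraction
    return n_ops * (scalar_fraction * 1.0 + vector_fraction * vector_time_ratio)


def performance(n_ops, vector_fraction, vector_time_ratio):
    """P = N / T: operations performed per unit time."""
    return n_ops / execution_time(n_ops, vector_fraction, vector_time_ratio)


N = 1000  # illustrative operation count; N cancels out of P
T = execution_time(N, vector_fraction=0.70, vector_time_ratio=0.1)
P = performance(N, vector_fraction=0.70, vector_time_ratio=0.1)
print(f"T = {T:.1f} time units for N = {N}")
print(f"P = {P:.2f} (scalar-only performance is 1.00)")
```

Because N appears in both the numerator and T, the result is independent of the operation count, which is why the text can state P without choosing a value for N.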
The improved performance, shown in Figure 1–9, would be about 2.7
times that of a scalar processor. Vectorization factors above 70% yield
progressively larger gains over the same computer using scalar processing.
The speedup ratio is defined as the vector performance divided by the
scalar performance.
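To see why high vectorization factors matter, the speedup ratio can be evaluated over a range of factors. This sketch assumes the same 0.1 vector-to-scalar time ratio used in the example; the sample factors are arbitrary. With scalar-only performance equal to 1, the ratio reduces to 1 / ((1 - f) + f * 0.1), which can never exceed 1 / 0.1 = 10.

```python
def speedup_ratio(vector_fraction, vector_time_ratio=0.1):
    """Vector performance divided by scalar performance.

    Scalar-only time per operation is normalized to 1, so scalar
    performance is 1 and the ratio is the reciprocal of the mixed time.
    """
    mixed_time = (1.0 - vector_fraction) + vector_fraction * vector_time_ratio
    scalar_time = 1.0
    return scalar_time / mixed_time


# Speedup rises slowly at low factors and steeply as the factor nears 100%.
for f in (0.0, 0.5, 0.7, 0.9, 0.99):
    print(f"vectorization factor {f:4.0%}: speedup {speedup_ratio(f):5.2f}")
```

The 70% case reproduces the 2.7 figure above, while moving from 90% to 99% nearly doubles the speedup again.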
1.10.3 Crossover Point
The crossover point is the vector length or number of elements at which
the vector unit exceeds the performance of the scalar unit for a particular
instruction or sequence. To achieve a performance improvement on a
given vector processor, a vectorized application should have an average
vector length that is larger than the crossover point for that processor and
the vector operations used.
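One simple way to reason about the crossover point is a linear cost model: the scalar unit spends a fixed time per element, while the vector unit pays a one-time startup cost plus a smaller time per element. The cost values below (90 units of startup, 1 unit per vector element, 10 units per scalar element) are hypothetical, chosen only so the result matches the crossover point of 11 used in the text; real costs depend on the processor and the instruction sequence.

```python
def crossover_point(startup, vector_per_element, scalar_per_element,
                    max_length=100_000):
    """Smallest vector length at which the vector unit is strictly faster.

    Assumes a linear cost model (hypothetical, for illustration):
      scalar time = scalar_per_element * n
      vector time = startup + vector_per_element * n
    Returns None if the vector unit never wins up to max_length.
    """
    for n in range(1, max_length + 1):
        scalar_time = scalar_per_element * n
        vector_time = startup + vector_per_element * n
        if vector_time < scalar_time:
            return n
    return None


print(crossover_point(startup=90, vector_per_element=1, scalar_per_element=10))
```

Under this model the same answer follows from the inequality n > startup / (scalar_per_element - vector_per_element); at n = 10 the two units tie at 100 units, so 11 is the first length where the vector unit is faster.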
The smaller the crossover point, the better. A crossover point of 11 means
that DO loops with fewer than 11 elements execute faster on a scalar
processor than on a vector processor. This point results from the overhead
instructions and time required to set up the vector processor, process the
data, and return the results. The crossover point varies from computer to
computer.
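Given a known crossover point, a program can choose between scalar and vector execution by loop length, similar in spirit to the trip-count checks that vectorizing compilers can emit. This is a minimal sketch: the constant 11 is the example value from the text, and `choose_unit` is a hypothetical helper, not part of any real library.

```python
CROSSOVER_POINT = 11  # example value from the text; differs on each machine


def choose_unit(loop_length, crossover=CROSSOVER_POINT):
    """Return the unit expected to run a loop of loop_length elements faster.

    Loops shorter than the crossover point go to the scalar unit, because
    the vector startup overhead would outweigh the per-element savings.
    """
    return "vector" if loop_length >= crossover else "scalar"


for length in (4, 10, 11, 64):
    print(f"{length:3d} elements -> {choose_unit(length)} unit")
```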
Vector operations add some startup overhead, which sets a minimum array
length below which vector processing does not pay off. For small arrays,
the time to set up and process the data on the vector unit is usually
longer than the time to perform the same operation on a scalar processor.