Technical data

VAX 6000 Series Vector Processor

2.10 INSTRUCTION EXECUTION

The vector pipeline is made up of a varying number of segments,
depending on the type of instruction being executed. Once an instruction
is issued, the pipeline is under the control of either the load/store
unit or the arithmetic unit. The interaction between the different
function units of the vector module can greatly affect the execution
performance of vector instructions.

The execution time of a vector instruction can be calculated using the
following equation:

FC + IC * round_up [ VL / NPP ]

where FC is the fixed cost, IC is the incremental cost per vector
element, NPP is the number of parallel pipelines, and VL is the length
(number of elements) of the vector operand. This can be rewritten in
terms of the data as:

Startup_latency + Execution_time

where Startup_latency corresponds to the fixed cost and Execution_time
is a function of vector length.
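
As an illustration only, the equation above can be evaluated with a
short Python sketch. The function name and the sample values for FC, IC,
and NPP are hypothetical; this section does not give actual cost figures
for any instruction, and the result is assumed to be in cycles.

```python
def vector_exec_cycles(fc, ic, vl, npp):
    """Return FC + IC * round_up(VL / NPP).

    fc  -- fixed cost (startup latency) of the instruction
    ic  -- incremental cost per vector element
    vl  -- vector length (number of elements)
    npp -- number of parallel pipelines
    """
    if npp <= 0:
        raise ValueError("npp must be positive")
    if vl < 0:
        raise ValueError("vl must not be negative")
    # Integer round-up of vl / npp without going through floating point.
    passes = (vl + npp - 1) // npp
    return fc + ic * passes


# Hypothetical values, chosen only to show the arithmetic.
print(vector_exec_cycles(fc=10, ic=1, vl=64, npp=4))  # 10 + 1 * 16
print(vector_exec_cycles(fc=10, ic=1, vl=65, npp=4))  # 10 + 1 * 17
```

With these made-up inputs, a 65-element operand costs one more
incremental step than a 64-element operand, because the round-up counts
a partially filled pass as a full one.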

Note that D_floating and G_floating (64-bit data) arithmetic
instructions (except divide) can produce a result only every two cycles,
because of the bandwidth of the interconnect between the register file
and the vector FPU, whereas F_floating arithmetic instructions (except
divide) produce a result every cycle.
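
The effect of the result rates above can be sketched as follows. This is
a rough model, not a description of the hardware: it assumes the stated
rate applies to one stream of results, it ignores startup latency, and
the table and function names are hypothetical.

```python
# Cycles between successive results for non-divide arithmetic
# instructions, as stated in the note above.
RESULT_INTERVAL_CYCLES = {
    "F_floating": 1,
    "D_floating": 2,
    "G_floating": 2,
}


def streaming_cycles(data_type, results):
    """Cycles spent producing `results` results, excluding startup latency."""
    return RESULT_INTERVAL_CYCLES[data_type] * results


# Producing 16 results takes twice as long for 64-bit data in this model.
print(streaming_cycles("F_floating", 16))
print(streaming_cycles("D_floating", 16))
```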

The execution time of a sequence of instructions is not necessarily
equal to the sum of the execution times of the individual instructions.
Overlap can occur between arithmetic instructions and load/store
instructions, as well as between individual arithmetic instructions. For
example, a sequence of two arithmetic instructions followed by a load or
store can have a total execution time that is either just slightly
longer than the execution time of the load or store, or equal to the
total execution time of the arithmetic instructions, whichever is
longer.
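
The best case described above can be written as a small Python sketch.
The function names are hypothetical, and so is the `slack` parameter: it
stands in for the "just slightly longer" margin, which this section does
not quantify.

```python
def serial_cycles(arith_cycles, load_store_cycles):
    """Total time if nothing overlaps: the plain sum of all instructions."""
    return sum(arith_cycles) + load_store_cycles


def best_case_overlapped_cycles(arith_cycles, load_store_cycles, slack=1):
    """Best-case total for arithmetic instructions followed by a load/store.

    arith_cycles      -- execution times of the arithmetic instructions
    load_store_cycles -- execution time of the load or store
    slack             -- hypothetical margin added to the load/store time
    """
    return max(load_store_cycles + slack, sum(arith_cycles))


# Hypothetical timings, in cycles.
print(serial_cycles([30, 30], 50))                # no overlap
print(best_case_overlapped_cycles([30, 30], 50))  # arithmetic dominates
print(best_case_overlapped_cycles([10, 10], 50))  # load/store dominates
```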

In the case of overlap between individual arithmetic instructions, a
minimum of one cycle must elapse between the final result of the first
instruction and the first result of the following instruction. In other
words, when overlap occurs the total execution time decreases: for every
overlapping arithmetic instruction other than the first instruction to
enter the empty pipeline, the effective fixed cost (or startup latency)
is reduced to a minimum of one cycle.
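
A minimal sketch of this effect, assuming every instruction after the
first overlaps fully and so pays the minimum effective fixed cost of one
cycle. All names and sample values are hypothetical, and real sequences
may overlap less than this best case.

```python
def passes(vl, npp):
    """round_up(VL / NPP) using integer arithmetic."""
    return (vl + npp - 1) // npp


def overlapped_arith_cycles(instrs, npp, min_fc=1):
    """Best-case total for back-to-back arithmetic instructions.

    instrs -- list of (fc, ic, vl) tuples, in issue order
    npp    -- number of parallel pipelines
    min_fc -- effective fixed cost of each overlapped instruction
    """
    total = 0
    for index, (fc, ic, vl) in enumerate(instrs):
        # Only the first instruction enters an empty pipeline and pays
        # its full fixed cost.
        effective_fc = fc if index == 0 else min(fc, min_fc)
        total += effective_fc + ic * passes(vl, npp)
    return total


# Two hypothetical instructions: FC = 10, IC = 1, VL = 64, with NPP = 4.
sequence = [(10, 1, 64), (10, 1, 64)]
print(overlapped_arith_cycles(sequence, npp=4))             # with overlap
print(sum(fc + ic * passes(vl, 4) for fc, ic, vl in sequence))  # without
```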
