Technical data
VAX 6000 Series Vector Processor
2.10 INSTRUCTION EXECUTION
The vector pipeline is made up of a varying number of segments
depending on the type of instruction being executed. Once an instruction
is issued, the pipeline is under the control of the load/store unit or the
arithmetic unit. The interaction between the different function units of
the vector module can greatly affect the execution time of vector
instructions.
The execution time of a vector instruction can be calculated using the
following equation:
FC + IC * round_up [ VL / NPP ]
where FC is the fixed cost, IC is the incremental cost per vector
element, NPP is the number of parallel pipelines, and VL is the length
(number of elements) of the vector operand. Equivalently, the execution
time can be expressed as:
Startup_latency + Execution_time
where Startup_latency is the fixed cost (FC) and Execution_time, the
remaining term, is a function of vector length.
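As an illustration, the equation above can be evaluated with a short
Python sketch. The function name and the sample values of FC, IC, and
NPP are hypothetical, chosen only to show the arithmetic; they are not
measured figures for the vector module.

```python
def vector_exec_cycles(fc, ic, vl, npp):
    """Return FC + IC * round_up(VL / NPP), in cycles."""
    if vl < 0 or npp <= 0:
        raise ValueError("vl must be >= 0 and npp must be > 0")
    groups = -(-vl // npp)  # integer ceiling of vl / npp
    return fc + ic * groups


# Hypothetical costs: FC = 10 cycles, IC = 1 cycle, NPP = 4 pipelines.
full = vector_exec_cycles(fc=10, ic=1, vl=64, npp=4)  # 10 + 1 * 16
odd = vector_exec_cycles(fc=10, ic=1, vl=65, npp=4)   # 10 + 1 * 17
print(full, odd)
```

With these assumed values, a single element beyond a multiple of NPP
costs one full extra incremental step, because of the round-up.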
Note that the execution of D_ and G_floating (64-bit data) type arithmetic
instructions (except divide) can only produce results every two cycles
due to the bandwidth of the interconnect between the register file and
the vector FPU, whereas F_floating type arithmetic instructions (except
divide) produce results each cycle.
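One way to read this note is as a difference in the interval between
successive results: two cycles for D_ and G_floating, one cycle for
F_floating. The sketch below assumes that reading. Its table and
function names are hypothetical, it covers non-divide arithmetic only,
and it ignores startup latency and how results map onto pipelines.

```python
# Cycles between successive results for non-divide arithmetic,
# as stated in the note above. Names here are illustrative only.
RESULT_INTERVAL = {"F_floating": 1, "D_floating": 2, "G_floating": 2}


def result_stream_cycles(data_type, n_results):
    """Cycles spent producing n_results, excluding any startup latency."""
    return RESULT_INTERVAL[data_type] * n_results


f_cycles = result_stream_cycles("F_floating", 16)
d_cycles = result_stream_cycles("D_floating", 16)
print(f_cycles, d_cycles)
```

Under this assumption, a 64-bit operation spends twice as many cycles
delivering the same number of results as its F_floating counterpart.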
The execution time of a sequence of instructions is not necessarily equal
to the sum of the execution times of the individual instructions. Overlap
can occur between arithmetic instructions and load/store instructions as
well as between individual arithmetic instructions. For example, a
sequence of two arithmetic instructions followed by a load or store can
have a total execution time only slightly longer than the execution time
of the load or store alone, or equal to the combined execution time of
the two arithmetic instructions, whichever is longer.
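The best case described above can be approximated as a lower bound and
compared with the fully serial sum. This is a rough sketch: the function
names and cycle counts are hypothetical, and the "slightly longer"
margin is left out because the text does not quantify it.

```python
def serial_cycles(arith_cycles, load_store_cycles):
    """Total time if nothing overlaps: a plain sum."""
    return sum(arith_cycles) + load_store_cycles


def overlapped_lower_bound(arith_cycles, load_store_cycles):
    """Best case: the longer of the arithmetic total and the load/store."""
    return max(sum(arith_cycles), load_store_cycles)


# Hypothetical times: two arithmetic instructions and one load.
arith = [30, 30]
load = 70
no_overlap = serial_cycles(arith, load)
best_case = overlapped_lower_bound(arith, load)
print(no_overlap, best_case)
```

Whether a real sequence reaches this bound depends on how the function
units interact for the particular instructions involved.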
In the case of overlap between individual arithmetic instructions, a
minimum of one cycle must elapse between the final result of the first
instruction and the first result of the following instruction. As a
result, when overlap occurs the total execution time decreases: for every
overlapping arithmetic instruction other than the first to enter the
empty pipeline, the effective fixed cost (or startup latency) can be
reduced to as little as one cycle.
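The effect of the reduced fixed cost can be sketched by applying the
execution-time equation to a chain of arithmetic instructions. The
sketch assumes the best case, in which every instruction after the first
pays a fixed cost of exactly one cycle. The function name and all
numeric values are hypothetical.

```python
def chained_arith_cycles(instructions, npp, overlap=True):
    """Sum FC + IC * round_up(VL / NPP) over (fc, ic, vl) tuples.

    With overlap=True, every instruction after the first is charged a
    fixed cost of one cycle (the best case described in the text).
    """
    total = 0
    for index, (fc, ic, vl) in enumerate(instructions):
        effective_fc = 1 if (overlap and index > 0) else fc
        total += effective_fc + ic * -(-vl // npp)  # ceiling division
    return total


# Three identical hypothetical instructions: FC = 10, IC = 1, VL = 64.
chain = [(10, 1, 64)] * 3
overlapped = chained_arith_cycles(chain, npp=4)                # 26 + 17 + 17
serial = chained_arith_cycles(chain, npp=4, overlap=False)     # 26 * 3
print(overlapped, serial)
```

The saving grows with the number of overlapped instructions, since each
one after the first avoids most of its startup latency.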