Optimizing with MACRO-32
Example 3–6 is another example of the use of a deferred arithmetic
instruction. In this case, a divide instruction is followed by an add
and then a load. The deferred instruction queue and the length of the
divide instruction combine to "hide" the load instruction (that is, the
execution time of the load instruction does not contribute to the total
execution time of the instruction sequence). Note also that the divide
instruction completes after the load completes; that is, instructions can
complete out of order.

Example 3–6 Use of the Deferred Arithmetic Instruction Queue

Instruction Sequence

    VVDIVL V1,V2,V3
    VVADDL V3,V1,V4
    VLDL base,#4,V5

Execution without Deferred Instruction Queue

    Issue VVDIVL IEEEEEEEEEEEEEEEEEEEE
    Issue VVADDL IEEEEEEEE
    Issue VLDL IEEEEEEEEEEEEEE

Execution with Deferred Instruction Queue

    Issue VVDIVL IEEEEEEEEEEEEEEEEEEEE
    Issue deferred VVADDL I...................EEEEEEEE
    Issue VLDL IEEEEEEEEEEEEEE

3.7 CHAINING

Vector operands are generally read from and written to the vector register
file. An exception to this process occurs when a store instruction is
waiting for the results of a currently executing arithmetic instruction.
(Divide instructions are not included in this exception because they do
not have the same degree of pipelining as the other instructions.) As
results are generated by the arithmetic instruction and are ready to be
written to the register file, they are also immediately available for input
to the waiting store instruction. Therefore, the store instruction can begin
processing the data before the arithmetic instruction has completed. This
process is called "chain into store." The store instruction will not overrun
the arithmetic instruction because the store instruction cannot process
data faster than the arithmetic unit can generate results.
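
As a minimal sketch, an instruction pair of the following shape could take advantage of chain into store. The register numbers, the symbolic address base, and the stride of #4 are illustrative assumptions, not values taken from this manual. The VSTL operand order (source register, base, stride) is likewise assumed to mirror the VLDL form used in Example 3–6.

```
VVADDL V1,V2,V3     ; arithmetic instruction generates the elements of V3
VSTL V3,base,#4     ; store waits on V3 and can consume each result as it is generated
```

If the arithmetic instruction were a divide (for example, VVDIVL), the exception described above would not apply, and the store would not chain to it.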