Technical data

Optimizing with MACRO-32

In the following examples, an I represents instruction issue time and an E
represents instruction execution time. A series of periods represents wait
time in the arithmetic unit for deferred instructions. Note that these
are not exact timing diagrams: they do not correspond to individual
instruction timings and are for illustration only.

In Example 3–1, execution of the VLDL instruction overlaps execution of
the VVADDL instruction because their destination vector registers do not
conflict: the add writes V3 and the load writes V1.

Example 3–1 Overlapped Load and Arithmetic Instructions

VVADDL  V1,V2,V3      IEEEEEEEE
VLDL    base,#4,V1     IEEEEEEEEEEEEEE

3.5.1 Maximizing Instruction Execution Overlap

Three important hardware features help to maximize instruction overlap
in the load/store unit. First, a load or store instruction can execute in
parallel with up to two arithmetic instructions, provided the arithmetic
instructions are issued first. Second, the chain-into-store sequence can
reduce the perceived execution time of a store instruction. Finally, early
detection that a load or store will cause no memory faults allows
scalar-to-vector communications to overlap with load or store instruction
execution.

In the first instruction sequence in Example 3–2 there is little overlapping
of instructions, whereas in the second sequence the VVMULL and the
second VLDL instructions overlap and require less total time to complete
execution. The only difference between the two sequences is the order in
which the instructions are issued. Because the VVMULL does not require the
result of the second VLDL, it can be issued ahead of that instruction,
which significantly reduces execution time.
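The effect of issue order can be sketched with a small timing model. The
Python sketch below is illustrative only and is not Example 3–2. It
assumes that instructions issue in order, that a load blocks further issue
until it completes (early fault detection is ignored), that arithmetic
instructions are queued and run in order on a single arithmetic unit, and
that a result is usable only after its instruction finishes (no chaining).
The names Instr and total_time, the base addresses A and B, the register
choices, and the cycle counts are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    unit: str      # "ls" (load/store unit) or "arith" (arithmetic unit)
    srcs: tuple    # source vector registers
    dst: str       # destination vector register
    cycles: int    # assumed execution time, not a real instruction timing


def total_time(program):
    """Return the cycle at which the last instruction finishes (toy model)."""
    issue_at = 0       # next cycle at which an instruction can issue
    arith_free = 0     # cycle at which the arithmetic unit becomes free
    ready = {}         # register -> cycle at which its new value is complete
    finish = 0
    for ins in program:
        issued = issue_at
        operands = max((ready.get(r, 0) for r in ins.srcs), default=0)
        if ins.unit == "ls":
            start = max(issued + 1, operands)
            end = start + ins.cycles
            issue_at = end            # assumption: issue stalls until the load ends
        else:
            start = max(issued + 1, operands, arith_free)
            end = start + ins.cycles
            arith_free = end
            issue_at = issued + 1     # queued in the arithmetic unit; issue continues
        ready[ins.dst] = end
        finish = max(finish, end)
    return finish


LD_A = Instr("VLDL A,#4,V1", "ls", (), "V1", 14)
ADD = Instr("VVADDL V1,V2,V3", "arith", ("V1", "V2"), "V3", 8)
LD_B = Instr("VLDL B,#4,V4", "ls", (), "V4", 14)
MUL = Instr("VVMULL V5,V6,V7", "arith", ("V5", "V6"), "V7", 8)

original = [LD_A, ADD, LD_B, MUL]     # VVMULL issued after the second load
reordered = [LD_A, ADD, MUL, LD_B]    # VVMULL issued before the second load

print(total_time(original), total_time(reordered))   # prints: 40 32
```

Under these assumptions the reordered sequence finishes sooner because the
VVMULL is already queued in the arithmetic unit when the second VLDL starts,
so the two execute in parallel. In the original order the VVMULL cannot
issue until the second load has completed.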

Another effective way to maximize the overlap of load/store instructions is,
wherever possible, to issue at least two arithmetic instructions ahead of
each load or store instruction. In this way both the load/store pipeline
and the arithmetic pipeline are kept in use.
