Technical data

Optimizing with MACRO-32

Example 3–2 Maximizing Instruction Execution Overlap

Instruction Sequence 1

VLDL    base1,#4,V1    IEEEEEEEEE
VLDL    base2,#4,V2    IEEEEEEEEE
VVMULL  V3,V1,V1       IEEEEE
VVADDL  V1,V2,V2       I....EEEEE
VSTL    V2,base,#4     IEEEEEEEEE

Instruction Sequence 2

VLDL    base1,#4,V1    IEEEEEEEEE
VVMULL  V3,V1,V1       IEEEEE
VLDL    base2,#4,V2    IEEEEEEEEE
VVADDL  V1,V2,V2       IEEEEE
VSTL    V2,base,#4     IEEEEEEEEE
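One way to see what the reordering in Example 3–2 changes is to look for an arithmetic instruction that immediately follows the arithmetic instruction producing one of its inputs. The Python sketch below is a simplified, hypothetical model, not the vector processor's actual issue logic. It assumes that `VVxxxL Va,Vb,Vc` reads Va and Vb and writes Vc, and it only flags back-to-back dependent arithmetic pairs; the names `VInstr` and `back_to_back_dependencies` are illustrative.

```python
from typing import List, NamedTuple, Optional, Tuple


class VInstr(NamedTuple):
    """One vector instruction: opcode, registers read, register written (or None)."""
    op: str
    reads: Tuple[str, ...]
    writes: Optional[str]


# Arithmetic opcodes that appear in Example 3-2 (assumed list for this sketch).
ARITH = {"VVADDL", "VVMULL"}


def back_to_back_dependencies(seq: List[VInstr]) -> List[Tuple[int, int]]:
    """Return (i, i+1) index pairs where an arithmetic instruction directly
    follows another arithmetic instruction and reads the register it writes."""
    pairs = []
    for i in range(len(seq) - 1):
        a, b = seq[i], seq[i + 1]
        if a.op in ARITH and b.op in ARITH and a.writes in b.reads:
            pairs.append((i, i + 1))
    return pairs


# Instruction Sequence 1 (VVxxxL Va,Vb,Vc read as Vc <- Va op Vb).
seq1 = [
    VInstr("VLDL", (), "V1"),              # VLDL   base1,#4,V1
    VInstr("VLDL", (), "V2"),              # VLDL   base2,#4,V2
    VInstr("VVMULL", ("V3", "V1"), "V1"),  # VVMULL V3,V1,V1
    VInstr("VVADDL", ("V1", "V2"), "V2"),  # VVADDL V1,V2,V2
    VInstr("VSTL", ("V2",), None),         # VSTL   V2,base,#4
]

# Instruction Sequence 2: the second VLDL moved between VVMULL and VVADDL.
seq2 = [seq1[0], seq1[2], seq1[1], seq1[3], seq1[4]]

print(back_to_back_dependencies(seq1))  # [(2, 3)]
print(back_to_back_dependencies(seq2))  # []
```

In Sequence 1 the flagged pair is VVMULL followed by VVADDL, which is the instruction shown waiting (`I....`) in the diagram; in Sequence 2 the independent VLDL sits between them.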

A load instruction cannot begin execution until the register to which it will write is free. A register conflict may occur if the destination register of a load instruction is still being used by a preceding arithmetic instruction. If choosing a different destination register for the load would allow instruction execution to overlap, then the load should use that register.

Example 3–3 shows the effects of register conflict. In the first instruction sequence, the VLDL instruction must wait until the VVADDL instruction completes and the VVMULL instruction begins, because VLDL will change the contents of one of the registers that provide input to the deferred VVMULL instruction. In the second instruction sequence, it is possible to take advantage of the deferred arithmetic instruction queue and overlap the VLDL and arithmetic instruction execution, because the VLDL instruction does not change the registers used by the arithmetic instructions. Simply changing the register to which VLDL writes reduces the total execution time of the instruction sequence.
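The register-conflict rule and its fix can be sketched as a small check-and-rename pass. This Python sketch is a hypothetical model, not a DEC tool: it conservatively treats every earlier arithmetic instruction as possibly still deferred, flags a VLDL whose destination is read by one of them, and renames the load's destination to a vector register not used elsewhere in the sequence. The sample sequence is invented to have the same shape as the one described above (VVADDL, then VVMULL, then a conflicting VLDL); it is not the manual's Example 3–3. The names `find_conflicts` and `rename_load` are illustrative.

```python
from typing import List, NamedTuple, Optional, Tuple


class VInstr(NamedTuple):
    """One vector instruction: opcode, registers read, register written (or None)."""
    op: str
    reads: Tuple[str, ...]
    writes: Optional[str]


ARITH = {"VVADDL", "VVMULL"}
ALL_VREGS = [f"V{n}" for n in range(16)]  # vector registers V0 through V15


def find_conflicts(seq: List[VInstr]) -> List[int]:
    """Indices of loads whose destination is read by an earlier arithmetic
    instruction (conservatively assumed to still be deferred)."""
    conflicts = []
    for i, ins in enumerate(seq):
        if ins.op != "VLDL":
            continue
        if any(p.op in ARITH and ins.writes in p.reads for p in seq[:i]):
            conflicts.append(i)
    return conflicts


def rename_load(seq: List[VInstr], i: int) -> List[VInstr]:
    """Copy of seq in which load i writes a register unused anywhere in seq;
    later reads of the old register, up to its next write, use the new one."""
    used = {r for ins in seq for r in ins.reads}
    used |= {ins.writes for ins in seq if ins.writes}
    new = next(r for r in ALL_VREGS if r not in used)
    old = seq[i].writes
    out = list(seq)
    out[i] = seq[i]._replace(writes=new)
    for j in range(i + 1, len(seq)):
        ins = seq[j]
        # Reads happen before the write, so rename them even if ins rewrites old.
        out[j] = ins._replace(reads=tuple(new if r == old else r for r in ins.reads))
        if ins.writes == old:
            break  # old register is redefined; later reads refer to the new value
    return out


# Hypothetical sequence: the VLDL overwrites V4, an input of the deferred VVMULL.
seq = [
    VInstr("VVADDL", ("V1", "V2"), "V3"),  # VVADDL V1,V2,V3
    VInstr("VVMULL", ("V3", "V4"), "V5"),  # VVMULL V3,V4,V5
    VInstr("VLDL", (), "V4"),              # VLDL   base,#4,V4  (conflict)
    VInstr("VVADDL", ("V4", "V5"), "V6"),  # VVADDL V4,V5,V6
]

print(find_conflicts(seq))  # [2]
fixed = rename_load(seq, 2)
print(find_conflicts(fixed))  # []
```

Renaming only the load's destination and the later reads of that value keeps the computation the same while removing the wait on the deferred VVMULL, which mirrors the one-register change described in the text.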

The locality of reference of data plays an important role in determining the performance of load/store operations. Unity stride load and store instructions, which access consecutive elements in memory, are the most efficient. For this reason, whenever possible, data should be stored in the sequential order in which it is usually referenced.
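The effect of storage order on stride can be illustrated with address arithmetic. This Python sketch assumes, as the `#4` operands in Example 3–2 suggest, that the stride operand of VLDL/VSTL is a byte distance and that a longword is 4 bytes, so a stride of 4 is unity stride. The 4 x 8 matrix, its base address, and the helper names are hypothetical.

```python
from typing import List

LONGWORD = 4  # bytes per longword element (VLDL/VSTL)


def element_addresses(base: int, stride: int, count: int) -> List[int]:
    """Byte addresses touched by a strided vector load of `count` elements."""
    return [base + k * stride for k in range(count)]


def is_unity_stride(stride: int, elem_size: int = LONGWORD) -> bool:
    """True when consecutive elements are adjacent in memory."""
    return stride == elem_size


# Hypothetical 4 x 8 matrix of longwords stored row by row at address 0x1000.
ROWS, COLS, BASE = 4, 8, 0x1000
row_stride = LONGWORD         # next element in the same row: adjacent longword
col_stride = COLS * LONGWORD  # next element in the same column: one full row away

row0 = element_addresses(BASE, row_stride, COLS)  # walk along row 0
col0 = element_addresses(BASE, col_stride, ROWS)  # walk down column 0

print(is_unity_stride(row_stride), is_unity_stride(col_stride))  # True False
print([hex(a) for a in col0])  # ['0x1000', '0x1020', '0x1040', '0x1060']
```

If a loop usually walks down columns, storing the matrix column by column instead would turn those strided column loads into unity-stride loads.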
