Technical data
Optimizing with MACRO-32
Example 3–2 Maximizing Instruction Execution Overlap
Instruction Sequence 1
VLDL base1,#4,V1 IEEEEEEEEE
VLDL base2,#4,V2 IEEEEEEEEE
VVMULL V3,V1,V1 IEEEEE
VVADDL V1,V2,V2 I....EEEEE
VSTL V2,base,#4 IEEEEEEEEE
Instruction Sequence 2
VLDL base1,#4,V1 IEEEEEEEEE
VVMULL V3,V1,V1 IEEEEE
VLDL base2,#4,V2 IEEEEEEEEE
VVADDL V1,V2,V2 IEEEEE
VSTL V2,base,#4 IEEEEEEEEE
A load instruction cannot begin execution until the register to which
it will write is free. A register conflict may occur if the destination
register of a load instruction is the same as one of the registers used
by a preceding arithmetic instruction. If the load instruction could
overlap with the arithmetic instruction when it writes to a different
register, the register conflict can be eliminated simply by changing the
register used.
Example 3–3 shows the effects of register conflict. In the first instruction
sequence, the VLDL instruction must wait until the VVADDL instruction
completes and the VVMULL instruction begins, because VLDL will change
the contents of one of the registers that provide input to the deferred
VVMULL instruction. In the second instruction sequence, it is possible
to take advantage of the deferred arithmetic instruction queue and
overlap the VLDL and arithmetic instruction execution, because the
VLDL instruction does not change the registers used by the arithmetic
instructions. Simply changing the register to which the VLDL will
write reduces the total execution time for the instruction sequence.
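
The register-conflict rule described above can be modeled with a short
Python sketch. This is a simplified illustration and not the vector
processor's actual issue logic: the function name load_conflicts, the
tuple layout, and the register choices are all assumptions made for the
example.

```python
# Toy model (an assumption for illustration, not the hardware's real logic):
# a load conflicts when its destination register is read or written by an
# arithmetic instruction that is still pending.

def load_conflicts(load_dest, pending_arithmetic):
    """Return True if load_dest is used by any pending arithmetic instruction.

    pending_arithmetic is a list of (source_registers, dest_register) tuples.
    """
    for sources, dest in pending_arithmetic:
        if load_dest in sources or load_dest == dest:
            return True
    return False

# Hypothetical deferred multiply that reads V1 and V2 and writes V3.
pending = [(("V1", "V2"), "V3")]

conflict_same = load_conflicts("V1", pending)   # load would overwrite an input
conflict_other = load_conflicts("V4", pending)  # load uses an unrelated register

print(conflict_same, conflict_other)
```

In this model, retargeting the load from V1 to V4 removes the conflict,
which corresponds to the register change described for the second
instruction sequence.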
The locality of reference of data plays an important role in determining
the performance of load/store operations. Unity stride load and store
instructions, in which successive elements are contiguous in memory,
are the most efficient. For this reason, whenever possible, data should
be stored in the sequential order in which it is usually referenced.
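
As a rough illustration of why contiguous access helps, the following
Python sketch counts how many aligned memory blocks a vector load touches
for a given stride. The 32-byte block size, the 64-element vector length,
and the function name blocks_touched are assumptions chosen for the
example; they are not taken from this manual.

```python
# Sketch under stated assumptions: memory is transferred in aligned blocks
# of block_bytes (32 here is an assumed value, not a documented one).

def blocks_touched(base, stride_bytes, count, block_bytes=32):
    """Count the distinct aligned blocks referenced by a strided access."""
    return len({(base + i * stride_bytes) // block_bytes for i in range(count)})

# 64 longword elements (4 bytes each), contiguous in memory.
unit = blocks_touched(base=0, stride_bytes=4, count=64)

# The same 64 elements spaced 256 bytes apart.
strided = blocks_touched(base=0, stride_bytes=256, count=64)

print(unit, strided)
```

Under these assumptions, the contiguous access touches 8 blocks while the
widely strided access touches 64, one block per element.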