Technical data
Optimizing with MACRO-32
In a load/store instruction, the Vector Length Register (VLR) determines
which elements of Vc are accessed. When the mask operate enable (MOE) bit
is set, the Vector Mask Register (VMR) and the match true/false (T/F) bit
restrict the access further, to the elements whose mask bit equals the
match value. For longwords, only bits <31:0> may be accessed. Because
there can be multiple load/store units and multiple paths to memory, the
elements can be loaded or stored out of order, which is a desirable
property of vector processors.
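
The element selection described above can be modeled in a few lines of
Python. This is only a sketch of the rule, not a description of the
hardware: the function name active_elements and its parameters are
hypothetical, and the VMR is assumed to be an integer whose bit i
corresponds to element i.

```python
def active_elements(vlr, vmr, moe, match):
    """Return the indices of the elements a load/store would access.

    vlr   -- Vector Length Register value (number of elements considered)
    vmr   -- Vector Mask Register as an integer; bit i masks element i
             (assumed layout for this sketch)
    moe   -- mask operate enable bit (True or False)
    match -- match true/false (T/F) value compared with each mask bit
    """
    selected = []
    for i in range(vlr):
        mask_bit = (vmr >> i) & 1
        # With MOE clear, the mask is ignored and every element below
        # VLR is accessed. With MOE set, only matching elements are.
        if not moe or mask_bit == int(match):
            selected.append(i)
    return selected


print(active_elements(4, 0b0101, moe=True, match=True))
print(active_elements(4, 0b0101, moe=True, match=False))
```

The model returns the indices in ascending order only for readability;
as noted above, the hardware may access the selected elements in any
order.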
A Modify Intent (MI) bit may be used with the VLD instruction to improve
performance for systems that use writeback caches. The MI bit is not
used for store or scatter instructions.
During a load operation, the first element in memory at the base address
loads into the destination vector register. The next element in memory
at the base address plus the stride loads into the next location in the
destination vector register. With a vector load/store operation, the stride
is constant, so that the third address in memory is the base address plus
two times the stride.
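
The constant-stride addressing can be written out as a short Python
sketch. The function name element_addresses is hypothetical, and the
base and stride values in the usage line are arbitrary illustrations.

```python
def element_addresses(base, stride, vlr):
    """Return the memory address of each vector element.

    Element i lives at base + i * stride, so the third element
    (i = 2) is at the base address plus two times the stride.
    """
    return [base + i * stride for i in range(vlr)]


# Three longword elements packed 4 bytes apart, starting at 0x1000.
print([hex(a) for a in element_addresses(0x1000, 4, 3)])
```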
3.4.2 Store Instruction
When the vector control unit receives a store instruction, the source
vector register is checked against outstanding arithmetic instructions. If
there are no conflicts, the instruction is dispatched to the load/store unit.
If the source for the store is the destination of the current arithmetic
instruction, and the deferred arithmetic instruction does not conflict with
the source vector register, and the arithmetic instruction is not a divide,
then the vector control unit waits for a signal from the arithmetic unit
to indicate that the store operation can start. The instruction is then
dispatched to the load/store unit.
During a store operation, the data moves in the opposite direction, from
the source vector register back to memory. The elements of the vector
are placed back into memory at the base address plus a multiple of the
stride, as shown in the following example:
    VLDL  base,#4,V3      ; Load vector V3 from memory, starting at
                          ; the "base" address and obtaining
                          ; subsequent elements 4 bytes apart
                          ; (stride = 4).
    VSTL  V1,base,#16     ; Store vector V1 into memory, starting at
                          ; the "base" address and placing
                          ; subsequent elements 16 bytes apart
                          ; (stride = 16).
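
The effect of the two strides can be modeled in Python, treating memory
as a byte array. This is a sketch under stated assumptions: the helper
names vldl and vstl merely echo the instruction mnemonics, the same
register contents are loaded and then stored, masking is ignored, and
the store uses a different base address (64) from the load (0) so that
the result is easy to inspect. Longwords are unpacked as 32-bit
little-endian values, matching VAX byte order.

```python
import struct


def vldl(memory, base, stride, vlr):
    """Model a longword vector load: element i comes from base + i*stride."""
    return [struct.unpack_from("<I", memory, base + i * stride)[0]
            for i in range(vlr)]


def vstl(memory, vreg, base, stride, vlr):
    """Model a longword vector store: element i goes to base + i*stride."""
    for i in range(vlr):
        struct.pack_into("<I", memory, base + i * stride, vreg[i])


memory = bytearray(128)
# Place four longwords contiguously (4 bytes apart) at address 0.
for i in range(4):
    struct.pack_into("<I", memory, i * 4, 10 * (i + 1))

v = vldl(memory, 0, 4, 4)      # gather with stride 4
vstl(memory, v, 64, 16, 4)     # spread out again with stride 16

# Addresses 64, 80, 96, and 112 now hold the four elements.
print([struct.unpack_from("<I", memory, 64 + i * 16)[0] for i in range(4)])
```

Loading with a stride of 4 and storing with a stride of 16 leaves a
12-byte gap after each stored longword, which the model leaves
untouched.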