Technical data
Optimizing with MACRO-32
When a gather or scatter instruction is received by the vector control unit,
the destination/source register is checked against outstanding arithmetic
instructions. If there are no conflicts, the instruction is dispatched
to the load/store unit. The load/store unit will then fetch the offset
vector register. When this is complete, the vector control unit reissues
the instruction and the gather/scatter operation takes place using the
previously stored offset vector register to generate the virtual addresses.
A gather instruction collects memory data into a vector register when
the memory data does not have a constant stride. The address of each
element is the base address plus the corresponding offset, taken from a
vector register that holds up to 64 offsets (depending on VL). The
elements are loaded nearly as fast as with a load instruction and are
placed sequentially in the destination register. (The scatter instruction
stores the result back to memory using the same offsets.)
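The addressing described above can be illustrated with a small behavioral sketch. It is a toy model only, not the hardware algorithm: memory is assumed to be a flat Python list, offsets are treated as element indexes rather than byte offsets, and the names gather, scatter, and MAX_VL are hypothetical.

```python
# Toy model of gather/scatter semantics (illustrative assumptions only).
# Memory is a flat list; offsets are element indexes, not byte offsets.
MAX_VL = 64  # a vector register holds at most 64 elements


def gather(memory, base, offsets, vl):
    """Read memory[base + offsets[i]] for the first vl offsets, in order."""
    assert 0 <= vl <= MAX_VL and vl <= len(offsets)
    return [memory[base + offsets[i]] for i in range(vl)]


def scatter(memory, base, offsets, values, vl):
    """Write values[i] to memory[base + offsets[i]] for the first vl elements."""
    assert 0 <= vl <= MAX_VL and vl <= len(offsets)
    for i in range(vl):
        memory[base + offsets[i]] = values[i]


memory = list(range(100, 120))       # 20 words holding 100, 101, ..., 119
offsets = [7, 0, 12, 3]              # irregular offsets, no constant stride
v = gather(memory, 2, offsets, 4)    # reads memory[9], [2], [14], [5]
doubled = [2 * x for x in v]         # some arithmetic on the gathered data
scatter(memory, 2, offsets, doubled, 4)  # results return to the same places
```

The gathered elements sit contiguously in the destination list even though their memory locations are scattered, which is the property that lets ordinary vector arithmetic operate on them.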
3.4.5 Masked Load/Store, Gather/Scatter Instructions
The operation of masked memory instructions is identical to that of the
unmasked versions, except that the following operations are performed first.
The vector controller checks if any outstanding arithmetic instructions
will modify the mask register. If not, the vector controller reads the mask
from the arithmetic unit and sends it to the load/store unit. The sequence
is then performed as above.
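The effect of the mask on a memory instruction can be sketched as follows. This is a simplified model under stated assumptions: the mask is a list of 0/1 values, elements whose mask bit does not match are assumed to leave the destination unchanged, and the name masked_gather and its match parameter are hypothetical.

```python
# Toy model of a masked gather (illustrative assumptions only).
def masked_gather(memory, base, offsets, mask, dest, vl, match=1):
    """Gather only elements whose mask bit equals `match`.

    Elements that are not selected keep their previous value in the
    destination (an assumption of this sketch).
    """
    result = list(dest)              # copy so the caller's list is untouched
    for i in range(vl):
        if mask[i] == match:
            result[i] = memory[base + offsets[i]]
    return result


memory = [10, 11, 12, 13, 14, 15, 16, 17]
dest = [0, 0, 0, 0]
mask = [1, 0, 1, 0]                  # select elements 0 and 2 only
out = masked_gather(memory, 1, [4, 0, 6, 2], mask, dest, 4)
```

Here only elements 0 and 2 are fetched (from memory[5] and memory[7]); elements 1 and 3 retain their old destination values.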
3.5 OVERLAP OF ARITHMETIC AND LOAD/STORE INSTRUCTIONS
Arithmetic instructions and load/store instructions may overlap because
the functional units are independent. To achieve this overlap, the
following conditions must be met:
• The arithmetic instruction must be issued before the load/store
instruction.
• There must be no register conflict between the arithmetic and
load/store instructions.
In the following example, while the result in vector register 2 (V2) is
being calculated, vector register 4 (V4) is being stored in memory. The
two instructions are therefore said to overlap.
VVADDL V1,V3,V2     ; arithmetic unit: V2 = V1 + V3
VSTL V4,base,#4     ; load/store unit: store V4 at base, stride 4 bytes
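The two overlap conditions can be expressed as a simple check. The sketch below is a deliberately conservative model, not a description of the hardware scoreboard: it assumes that any register named by both instructions counts as a conflict, and the function can_overlap and the tuple layout are hypothetical.

```python
# Toy check of the two overlap conditions (illustrative assumptions only).
ARITHMETIC = "arithmetic"
LOAD_STORE = "load/store"


def can_overlap(first, second):
    """first and second are (unit, registers) tuples in issue order."""
    first_unit, first_regs = first
    second_unit, second_regs = second
    # Condition 1: the arithmetic instruction is issued first.
    issued_in_order = first_unit == ARITHMETIC and second_unit == LOAD_STORE
    # Condition 2: no register is named by both instructions
    # (a conservative assumption of this sketch).
    no_conflict = first_regs.isdisjoint(second_regs)
    return issued_in_order and no_conflict


vvaddl = (ARITHMETIC, {"V1", "V3", "V2"})   # VVADDL V1,V3,V2
vstl_v4 = (LOAD_STORE, {"V4"})              # VSTL V4,base,#4
vstl_v2 = (LOAD_STORE, {"V2"})              # a store of the add's result

overlap_ok = can_overlap(vvaddl, vstl_v4)
overlap_conflict = can_overlap(vvaddl, vstl_v2)
overlap_wrong_order = can_overlap(vstl_v4, vvaddl)
```

Under this model the pair from the example qualifies for overlap, while storing V2 (the register still being computed) or issuing the store first does not.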