
Optimizing with MACRO-32
In Example 3–7, the VSTL instruction requires the result of the VVADDL
instruction. Without chain into store, the VSTL would have to wait for the
VVADDL to complete before beginning the store operation. With chain into
store, the VSTL operation can begin as soon as the first result of the add
is complete, while the VVADDL is still executing. The result is greater
overlap of instruction execution, so the instruction sequence completes in
a shorter period of time.
The coordination of the arithmetic operation and the VSTORE for a chain
into store is handled by the vector arithmetic unit and depends on a
number of factors such as vector length.
Example 3–7 Example of Chain Into Store
Instruction Sequence
VVADDL V1,V2,V3
VVMULL V1,V2,V4
VSTL V3,base,#4
Execution without Chain into Store:
Issue VVADDL IEEEEEEEE
Issue deferred VVMULL I.......EEEEEEEE
Issue VSTL IEEEEEEEEEEEEEE
Execution with Chain into Store:
Issue VVADDL IEEEEEEEE
Issue deferred VVMULL I.......EEEEEEEE
Issue VSTL IEEEEEEEEEEEEEE
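The benefit of chain into store can be approximated with a small timing
model. The Python sketch below is illustrative only and is not a description
of the hardware: the function name store_finish_cycle, the one-element-per-cycle
rates, and the add_latency parameter are all assumptions made for the example,
and the intervening VVMULL is ignored.

```python
def store_finish_cycle(vector_length, add_latency, chained):
    """Cycle at which the last element has been stored (toy model).

    Assumes vector_length >= 1 and that the add and the store each
    handle one element per cycle. These rates are assumptions, not
    figures taken from the hardware documentation.
    """
    # Assumption: the add delivers element i at cycle add_latency + i.
    ready = [add_latency + i for i in range(vector_length)]

    # With chaining, the store may start once the first result exists;
    # without it, the store waits until the last result exists.
    cycle = ready[0] if chained else ready[-1]

    for element_ready in ready:
        # Store one element per cycle, never before it is available.
        cycle = max(cycle, element_ready) + 1
    return cycle


if __name__ == "__main__":
    n = 8
    print("without chain into store:", store_finish_cycle(n, 1, chained=False))
    print("with chain into store:   ", store_finish_cycle(n, 1, chained=True))
```

Under these assumptions the chained store finishes vector_length - 1 cycles
earlier, which is one way to see why the benefit depends on vector length.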
3.8 CACHE
With the 1-Mbyte vector cache, up to four load operations with cache
misses can be queued at one time. The pipeline continues processing
vector element loads until a fourth cache miss occurs. At that point the
cache miss queue is full and the pipeline stalls. The pipeline remains
stalled until one of the cache misses is serviced. Cache misses on a load
instruction degrade the performance of the load/store pipeline.