Technical data

Optimizing with MACRO-32

In Example 3–7, the VSTL instruction needs the result of the VVADDL instruction. Without chain into store, the VSTL would have to wait for the VVADDL to complete before beginning the store operation. With chain into store, the VSTL can begin as soon as the first result of the add is available, while the VVADDL is still executing. The greater overlap of instruction execution lets the sequence complete in less time.

The vector arithmetic unit coordinates the arithmetic operation with the VSTORE in a chain into store. The timing of this coordination depends on several factors, such as vector length.
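The timing benefit of chain into store can be approximated with a toy cycle model. The Python sketch below rests on simplifying assumptions that do not come from this manual: the add writes one result per cycle starting at cycle `first_result_cycle`, the store handles one element per cycle, each element can be stored no earlier than the cycle after its result is written, and memory never stalls. The function `store_finish_cycle` and its parameters are hypothetical names used only for illustration.

```python
def store_finish_cycle(vector_length, first_result_cycle=1, chain_into_store=False):
    """Return the cycle in which the last element of a dependent VSTL is stored.

    Toy model (assumed, not hardware data):
      * VVADDL writes result i in cycle first_result_cycle + i.
      * VSTL stores one element per cycle, no earlier than the cycle
        after that element's result was written.
    """
    if vector_length < 1:
        raise ValueError("vector_length must be at least 1")
    result_cycles = [first_result_cycle + i for i in range(vector_length)]
    if chain_into_store:
        # The store may start as soon as the first add result exists.
        next_free = 0
    else:
        # The store must wait until the whole add has finished.
        next_free = result_cycles[-1] + 1
    last_store = 0
    for ready in result_cycles:
        store_cycle = max(ready + 1, next_free)
        next_free = store_cycle + 1  # the store handles one element per cycle
        last_store = store_cycle
    return last_store


for vl in (1, 8, 64):
    plain = store_finish_cycle(vl)
    chained = store_finish_cycle(vl, chain_into_store=True)
    print(f"VL={vl:2d}: without chain {plain:3d}, with chain {chained:3d}")
```

Under these assumptions, chaining saves vector_length - 1 cycles: a vector length of 1 gains nothing, while a vector length of 64 finishes the store 63 cycles sooner. This is one way to see why the benefit depends on vector length.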

Example 3–7 Example of Chain Into Store

Instruction Sequence
VVADDL V1,V2,V3
VVMULL V1,V2,V4
VSTL V3,base,#4

Execution without Chain into Store:
Issue VVADDL IEEEEEEEE
Issue deferred VVMULL I.......EEEEEEEE
Issue VSTL IEEEEEEEEEEEEEE

Execution with Chain into Store:
Issue VVADDL IEEEEEEEE
Issue deferred VVMULL I.......EEEEEEEE
Issue VSTL IEEEEEEEEEEEEEE

3.8 CACHE

With the 1-Mbyte vector cache, up to four load operations with cache misses can be queued at one time. The pipeline continues processing vector element loads until a fourth cache miss occurs. At that point the cache miss queue is full and the pipeline stalls. The pipeline remains stalled until one of the cache misses is serviced. Cache misses on a load instruction degrade the performance of the load/store pipeline.
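The stall behavior of the four-entry cache miss queue can be sketched with a small Python simulation. It is a toy model, not DEC timing data. It assumes that the pipeline processes one element load per cycle, that every miss is serviced a fixed `miss_latency` cycles after it is queued (the default of 10 is an arbitrary illustrative value), and, as described above, that the pipeline stalls whenever the queue holds four misses, even if the next element would hit. The function `simulate_loads` and its parameters are hypothetical names.

```python
import heapq


def simulate_loads(hits, miss_latency=10, queue_depth=4):
    """Return (finish_cycle, stall_cycles) for a sequence of element loads.

    hits: iterable of booleans, True for a cache hit, False for a miss.
    Toy model (assumed): one element load per cycle; a miss is serviced
    miss_latency cycles after it is queued; the pipeline stalls while
    the miss queue holds queue_depth entries.
    """
    pending = []  # min-heap of cycles at which queued misses are serviced
    cycle = 0  # cycle in which the next element load is processed
    stall_cycles = 0
    for hit in hits:
        # Free the queue slots of misses that have already been serviced.
        while pending and pending[0] <= cycle:
            heapq.heappop(pending)
        # Queue full: stall until the earliest outstanding miss is serviced.
        if len(pending) == queue_depth:
            serviced_at = heapq.heappop(pending)
            stall_cycles += serviced_at - cycle
            cycle = serviced_at
        if not hit:
            heapq.heappush(pending, cycle + miss_latency)
        cycle += 1
    # All loads are complete once the last queued miss has been serviced.
    finish_cycle = max([cycle] + pending)
    return finish_cycle, stall_cycles


print(simulate_loads([False] * 8))
```

In this model, eight consecutive misses fill the queue after four loads, so the fifth load waits 6 cycles for the first miss to be serviced, and the call returns (23, 6). Three misses followed by hits never fill the queue and cause no stall.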
