Technical data

Contents
CHAPTER 3 OPTIMIZING WITH MACRO-32 3–1
3.1 VECTORIZATION 3–2
3.1.1 Using Vectorization Alone 3–2
3.1.2 Combining Decomposition with Vectorization 3–3
3.1.3 Algorithms 3–5
3.2 CROSSOVER POINT 3–5
3.3 SCALAR/VECTOR SYNCHRONIZATION 3–6
3.3.1 Scalar/Vector Instruction Synchronization (SYNC) 3–6
3.3.2 Scalar/Vector Memory Synchronization 3–7
3.3.2.1 Memory Instruction Synchronization (MSYNC) 3–8
3.3.2.2 Memory Activity Completion Synchronization (VMAC) 3–9
3.3.3 Memory Synchronization Within the Vector Processor (VSYNC) 3–9
3.3.4 Exceptions 3–10
3.3.4.1 Imprecise Exceptions 3–10
3.3.4.2 Precise Exceptions 3–11
3.4 INSTRUCTION FLOW 3–11
3.4.1 Load Instruction 3–12
3.4.2 Store Instruction 3–13
3.4.3 Memory Management Okay (MMOK) 3–14
3.4.4 Gather/Scatter Instructions 3–14
3.4.5 Masked Load/Store, Gather/Scatter Instructions 3–15
3.5 OVERLAP OF ARITHMETIC AND LOAD/STORE INSTRUCTIONS 3–15
3.5.1 Maximizing Instruction Execution Overlap 3–16
3.6 OUT-OF-ORDER INSTRUCTION EXECUTION 3–18
3.7 CHAINING 3–20
3.8 CACHE 3–21
3.9 STRIDE/TRANSLATION BUFFER MISS 3–22