Technical data
Contents
CHAPTER 3 OPTIMIZING WITH MACRO-32 3–1
3.1 VECTORIZATION 3–2
3.1.1 Using Vectorization Alone 3–2
3.1.2 Combining Decomposition with Vectorization 3–3
3.1.3 Algorithms 3–5
3.2 CROSSOVER POINT 3–5
3.3 SCALAR/VECTOR SYNCHRONIZATION 3–6
3.3.1 Scalar/Vector Instruction Synchronization (SYNC) 3–6
3.3.2 Scalar/Vector Memory Synchronization 3–7
3.3.2.1 Memory Instruction Synchronization (MSYNC) 3–8
3.3.2.2 Memory Activity Completion Synchronization (VMAC) 3–9
3.3.3 Memory Synchronization Within the Vector Processor (VSYNC) 3–9
3.3.4 Exceptions 3–10
3.3.4.1 Imprecise Exceptions 3–10
3.3.4.2 Precise Exceptions 3–11
3.4 INSTRUCTION FLOW 3–11
3.4.1 Load Instruction 3–12
3.4.2 Store Instruction 3–13
3.4.3 Memory Management Okay (MMOK) 3–14
3.4.4 Gather/Scatter Instructions 3–14
3.4.5 Masked Load/Store, Gather/Scatter Instructions 3–15
3.5 OVERLAP OF ARITHMETIC AND LOAD/STORE INSTRUCTIONS 3–15
3.5.1 Maximizing Instruction Execution Overlap 3–16
3.6 OUT-OF-ORDER INSTRUCTION EXECUTION 3–18
3.7 CHAINING 3–20
3.8 CACHE 3–21
3.9 STRIDE/TRANSLATION BUFFER MISS 3–22