Technical data

Contents

CHAPTER 3 OPTIMIZING WITH MACRO-32 3–1

3.1 VECTORIZATION 3–2

3.1.1 Using Vectorization Alone 3–2

3.1.2 Combining Decomposition with Vectorization 3–3

3.1.3 Algorithms 3–5

3.2 CROSSOVER POINT 3–5

3.3 SCALAR/VECTOR SYNCHRONIZATION 3–6

3.3.1 Scalar/Vector Instruction Synchronization (SYNC) 3–6

3.3.2 Scalar/Vector Memory Synchronization 3–7

3.3.2.1 Memory Instruction Synchronization (MSYNC) 3–8

3.3.2.2 Memory Activity Completion Synchronization (VMAC) 3–9

3.3.3 Memory Synchronization Within the Vector Processor (VSYNC) 3–9

3.3.4 Exceptions 3–10

3.3.4.1 Imprecise Exceptions 3–10

3.3.4.2 Precise Exceptions 3–11

3.4 INSTRUCTION FLOW 3–11

3.4.1 Load Instruction 3–12

3.4.2 Store Instruction 3–13

3.4.3 Memory Management Okay (MMOK) 3–14

3.4.4 Gather/Scatter Instructions 3–14

3.4.5 Masked Load/Store, Gather/Scatter Instructions 3–15

3.5 OVERLAP OF ARITHMETIC AND LOAD/STORE INSTRUCTIONS 3–15

3.5.1 Maximizing Instruction Execution Overlap 3–16

3.6 OUT-OF-ORDER INSTRUCTION EXECUTION 3–18

3.7 CHAINING 3–20

3.8 CACHE 3–21

3.9 STRIDE/TRANSLATION BUFFER MISS 3–22