Technical data
Contents
FIGURES
1–1 Scalar vs. Vector Processing 1–5
1–2 Vector Registers 1–10
1–3 Vector Function Units 1–11
1–4 Pipelining a Process 1–12
1–5 Constant-Strided Vectors in Memory 1–16
1–6 Random-Strided Vectors in Memory 1–16
1–7 Vector Gather and Scatter Instructions 1–17
1–8 Computer Performance Dominated by Slowest Process 1–20
1–9 Computer Performance vs. Vectorized Code 1–21
2–1 Scalar/Vector Pair Block Diagram 2–3
2–2 FV64A Vector Processor Block Diagram 2–4
2–3 Vector Count, Vector Length, Vector Mask, and Vector Registers 2–10
2–4 Virtual Address Format 2–11
2–5 Address/Data Flow in Load/Store Pipeline 2–13
2–6 Cache Arrangement 2–14
2–7 Physical Address Division 2–14
2–8 Main Tag Memory Organization 2–15
2–9 Data Cache Logical Organization 2–15
2–10 Vector Processor Units 2–17
2–11 Vector Arithmetic Unit 2–20
A–1 Linpack Performance Graph, Double-Precision BLAS Algorithms A–4
A–2 Cooley-Tukey Butterfly Graph, One-Dimensional Fast Fourier Transform for N = 16 A–8
A–3 Optimized Cooley-Tukey Butterfly Graph, One-Dimensional Fast Fourier Transform for N = 16 A–9
A–4 One-Dimensional Fast Fourier Transform Performance Graph, Optimized Single-Precision Complex Transforms A–10
A–5 Two-Dimensional Fast Fourier Transforms Using N Column and N Row One-Dimensional Fast Fourier Transforms A–10
A–6 Two-Dimensional Fast Fourier Transforms Using a Matrix Transpose Between Each Set of N Column One-Dimensional Fast Fourier Transforms A–11
A–7 Two-Dimensional Fast Fourier Transform Performance Graph, Optimized Single-Precision Complex Transforms A–12