Technical data

Contents

FIGURES

1–1 Scalar vs. Vector Processing 1–5

1–2 Vector Registers 1–10

1–3 Vector Function Units 1–11

1–4 Pipelining a Process 1–12

1–5 Constant-Strided Vectors in Memory 1–16

1–6 Random-Strided Vectors in Memory 1–16

1–7 Vector Gather and Scatter Instructions 1–17

1–8 Computer Performance Dominated by Slowest Process 1–20

1–9 Computer Performance vs. Vectorized Code 1–21

2–1 Scalar/Vector Pair Block Diagram 2–3

2–2 FV64A Vector Processor Block Diagram 2–4

2–3 Vector Count, Vector Length, Vector Mask, and Vector Registers 2–10

2–4 Virtual Address Format 2–11

2–5 Address/Data Flow in Load/Store Pipeline 2–13

2–6 Cache Arrangement 2–14

2–7 Physical Address Division 2–14

2–8 Main Tag Memory Organization 2–15

2–9 Data Cache Logical Organization 2–15

2–10 Vector Processor Units 2–17

2–11 Vector Arithmetic Unit 2–20

A–1 Linpack Performance Graph, Double-Precision BLAS Algorithms A–4

A–2 Cooley-Tukey Butterfly Graph, One-Dimensional Fast Fourier Transform for N = 16 A–8

A–3 Optimized Cooley-Tukey Butterfly Graph, One-Dimensional Fast Fourier Transform for N = 16 A–9

A–4 One-Dimensional Fast Fourier Transform Performance Graph, Optimized Single-Precision Complex Transforms A–10

A–5 Two-Dimensional Fast Fourier Transforms Using N Column and N Row One-Dimensional Fast Fourier Transforms A–10

A–6 Two-Dimensional Fast Fourier Transforms Using a Matrix Transpose Between Each Set of N Column One-Dimensional Fast Fourier Transforms A–11

A–7 Two-Dimensional Fast Fourier Transform Performance Graph, Optimized Single-Precision Complex Transforms A–12
