Contents
3.10 REGISTER REUSE 3–25
APPENDIX A ALGORITHM OPTIMIZATION EXAMPLES A–1
A.1 EQUATION SOLVERS A–2
A.2 SIGNAL PROCESSING—FAST FOURIER TRANSFORMS A–7
A.2.1 Optimized One-Dimensional Fast Fourier Transforms A–7
A.2.2 Optimized Two-Dimensional Fast Fourier Transforms A–9
GLOSSARY
INDEX
EXAMPLES
3–1 Overlapped Load and Arithmetic Instructions 3–16
3–2 Maximizing Instruction Execution Overlap 3–17
3–3 Effects of Register Conflict 3–18
3–4 Deferred Arithmetic Instruction Queue 3–19
3–5 A Load Stalled due to an Arithmetic Instruction 3–19
3–6 Use of the Deferred Arithmetic Instruction Queue 3–20
3–7 Example of Chain Into Store 3–21
3–8 Matrix Multiply—Basic 3–24
3–9 Matrix Multiply—Improved 3–24
3–10 Matrix Multiply—Optimal 3–26
A–1 Core Loop of a BLAS 1 Routine Using Vector-Vector Operations A–3
A–2 Core Loop of a BLAS 2 Routine Using Matrix-Vector Operations A–5
A–3 Core Loop of a BLAS 3 Routine Using Matrix-Matrix Operations A–6