
AMD Athlon Processor Microarchitecture 131
22007E/0 November 1999 AMD Athlon Processor x86 Code Optimization
Figure 1. AMD Athlon Processor Block Diagram
Instruction Cache
The out-of-order execute engine of the AMD Athlon processor
contains a very large 64-Kbyte L1 instruction cache. The L1
instruction cache is organized as a 64-Kbyte, two-way,
set-associative array. Each line in the instruction array is 64
bytes long. Functions associated with the L1 instruction cache
are instruction loads, instruction prefetching, instruction
predecoding, and branch prediction. Requests that miss in the
L1 instruction cache are fetched from the backside L2 cache or,
if they also miss there, from local memory through the bus
interface unit (BIU).
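The cache geometry above fixes how an address is split into tag, set index, and byte offset. The sketch below works that arithmetic out under one stated assumption: conventional power-of-two, modulo set indexing. This section does not say whether the Athlon indexes the L1 instruction cache with linear or physical addresses, so the constant names and the `split_address` helper are hypothetical illustrations, not AMD's design.

```python
# Hypothetical sketch: address decomposition for a cache shaped like the
# AMD Athlon L1 instruction cache (64 Kbytes, two-way set-associative,
# 64-byte lines). Assumes conventional power-of-two, modulo set indexing.

CACHE_BYTES = 64 * 1024   # total capacity
WAYS = 2                  # associativity
LINE_BYTES = 64           # bytes per cache line

NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # 65536 / 128 = 512 sets
OFFSET_BITS = LINE_BYTES.bit_length() - 1       # log2(64)  = 6
INDEX_BITS = NUM_SETS.bit_length() - 1          # log2(512) = 9


def split_address(addr: int) -> tuple[int, int, int]:
    """Return (tag, set_index, byte_offset) for a byte address."""
    offset = addr & (LINE_BYTES - 1)                     # low 6 bits
    set_index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)   # next 9 bits
    tag = addr >> (OFFSET_BITS + INDEX_BITS)             # remaining bits
    return tag, set_index, offset


if __name__ == "__main__":
    print(NUM_SETS, OFFSET_BITS, INDEX_BITS)   # 512 6 9
    print(split_address(0x401A7C))             # (128, 105, 60)
```

With these numbers, one way covers 512 × 64 bytes = 32 Kbytes. Two code addresses exactly 32 Kbytes apart therefore map to the same set with different tags, so at most two such lines can be resident at once.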
The instruction cache fetches the naturally aligned 64-byte
line containing the requested instructions, together with the
next sequential 64-byte line (a prefetch). The principle of
program spatial locality makes this prefetching very effective:
it avoids or reduces execution stalls caused by the time spent
waiting for the needed instruction bytes to arrive. Cache line
[Figure 1 block diagram. Units shown: 2-Way, 64-Kbyte Instruction Cache with 24-Entry L1 TLB/256-Entry L2 TLB; Predecode Cache; Branch Prediction Table; Fetch/Decode Control; 3-Way x86 Instruction Decoders; Instruction Control Unit (72-Entry); Integer Scheduler (18-Entry); IEU0/AGU0, IEU1/AGU1, IEU2/AGU2; FPU Stack Map / Rename; FPU Scheduler (36-Entry); FPU Register File (88-Entry); FADD, FMUL, and FSTORE pipes with MMX and 3DNow! units; Load / Store Queue Unit; 2-Way, 64-Kbyte Data Cache with 32-Entry L1 TLB/256-Entry L2 TLB; L2 Cache Controller; L2 SRAMs; Bus Interface Unit; System Interface.]
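The instruction-fetch behavior described under Instruction Cache (the naturally aligned 64-byte line plus the next sequential 64-byte line) reduces to simple address arithmetic. The sketch below is a hypothetical model of that arithmetic only. The helper names `fetch_lines` and `covered_by_fetch` are assumptions for illustration. It does not model timing, the L2 path, or what happens when the prefetched line is already cached.

```python
# Hypothetical sketch: the two 64-byte lines touched by one instruction
# fetch, i.e. the naturally aligned line containing the fetch address
# (the demand line) plus the next sequential line (the prefetch).

LINE_BYTES = 64


def fetch_lines(fetch_addr: int) -> tuple[int, int]:
    """Return (demand_line_base, prefetch_line_base) for a fetch address."""
    demand = fetch_addr & ~(LINE_BYTES - 1)   # round down to a 64-byte boundary
    prefetch = demand + LINE_BYTES            # next sequential 64-byte line
    return demand, prefetch


def covered_by_fetch(start: int, length: int) -> bool:
    """True if bytes [start, start + length) all fall inside the
    demand line or the prefetched line brought in by a fetch at start."""
    _, prefetch = fetch_lines(start)
    end = start + length                      # one past the last byte
    return end <= prefetch + LINE_BYTES


if __name__ == "__main__":
    d, p = fetch_lines(0x401A7C)
    print(hex(d), hex(p))                     # 0x401a40 0x401a80
    print(covered_by_fetch(0x401A7C, 16))     # True: spills into the prefetched line
```

The 16-byte run in the usage line starts 4 bytes before a line boundary. Straight-line code that runs past the boundary is therefore already on its way from the prefetch, which is the spatial-locality effect the text describes.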