132 AMD Athlon™ Processor Microarchitecture

AMD Athlon™ Processor x86 Code Optimization

22007E/0—November 1999

replacement is based on a least-recently-used (LRU)
replacement algorithm.
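
As an illustration of LRU replacement, the following is a minimal
sketch of a single cache set. The class name LRUSet, the ways
parameter, and the use of an ordered map are assumptions made for
this example; this excerpt does not state the associativity, and
hardware normally tracks recency with a few status bits rather than
an ordered structure.

```python
from collections import OrderedDict


class LRUSet:
    """One set of a set-associative cache with LRU replacement (illustrative)."""

    def __init__(self, ways):
        self.ways = ways            # hypothetical parameter: lines per set
        self.lines = OrderedDict()  # tag -> None, least recently used first

    def access(self, tag):
        """Access a tag; return (hit, evicted_tag)."""
        if tag in self.lines:
            # Hit: mark this line as the most recently used.
            self.lines.move_to_end(tag)
            return True, None
        evicted = None
        if len(self.lines) == self.ways:
            # Set is full: evict the least recently used line.
            evicted, _ = self.lines.popitem(last=False)
        self.lines[tag] = None
        return False, evicted
```

With two ways, accessing tags A, B, A, C in that order evicts B,
because A was touched more recently than B when C arrives.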

The L1 instruction cache has an associated two-level translation
look-aside buffer (TLB) structure. The first-level TLB is fully
associative and contains 24 entries (16 that map 4-Kbyte pages
and eight that map 2-Mbyte or 4-Mbyte pages). The second-level
TLB is four-way set associative and contains 256 entries, which
can map 4-Kbyte pages.
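
The entry counts above translate directly into the amount of code
each TLB level can map at once (often called TLB reach, a term not
used in this guide). The short calculation below is a sketch of that
arithmetic only; the function and constant names are made up for the
example.

```python
KB = 1024
MB = 1024 * KB

# Entry counts quoted in the paragraph above.
L1_SMALL_PAGE_ENTRIES = 16   # map 4-Kbyte pages
L1_LARGE_PAGE_ENTRIES = 8    # map 2-Mbyte or 4-Mbyte pages
L2_ENTRIES = 256             # map 4-Kbyte pages


def reach(entries, page_size):
    """Bytes of address space covered when every entry is in use."""
    return entries * page_size


l1_small = reach(L1_SMALL_PAGE_ENTRIES, 4 * KB)     # 64 Kbytes
l1_large_2m = reach(L1_LARGE_PAGE_ENTRIES, 2 * MB)  # 16 Mbytes
l1_large_4m = reach(L1_LARGE_PAGE_ENTRIES, 4 * MB)  # 32 Mbytes
l2 = reach(L2_ENTRIES, 4 * KB)                      # 1 Mbyte
```

Under these figures, code built from 4-Kbyte pages that spans more
than 64 Kbytes cannot be covered by the first-level TLB alone, and
code spanning more than 1 Mbyte cannot be covered by the second
level either.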

Predecode

Predecoding begins as the L1 instruction cache is filled.
Predecode information is generated and stored alongside the
instruction cache. This information is used to help efficiently
identify the boundaries between variable-length x86
instructions, to distinguish DirectPath from VectorPath
early-decode instructions, and to locate the opcode byte in each
instruction. In addition, the predecode logic detects code
branches such as CALLs, RETURNs, and short unconditional
JMPs. When a branch is detected, predecoding begins at the
target of the branch.
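
The idea of storing boundary information alongside the cached bytes
can be pictured with a toy model. The sketch below assumes one
"start" bit per byte and takes instruction lengths as a given input;
both are simplifications for illustration. The actual predecode bit
layout is not described in this excerpt, and real x86 length
decoding is considerably more involved.

```python
def predecode_start_bits(lengths, line_size):
    """Return one bit per byte; 1 marks the first byte of an instruction.

    `lengths` is a hypothetical list of instruction lengths in bytes,
    laid out back to back from offset 0 of the cache line.
    """
    bits = [0] * line_size
    offset = 0
    for n in lengths:
        if n <= 0:
            raise ValueError("instruction length must be positive")
        if offset >= line_size:
            break  # remaining instructions fall in a later line
        bits[offset] = 1
        offset += n
    return bits


def instruction_starts(bits):
    """Recover the instruction start offsets from the stored bits."""
    return [i for i, b in enumerate(bits) if b]
```

Once the bits exist, a decoder can find every instruction boundary
in the line by scanning them, without re-parsing the instruction
bytes on each fetch.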

Branch Prediction

The fetch logic accesses the branch prediction table in parallel
with the instruction cache and uses the information stored in
the branch prediction table to predict the direction of branch
instructions.
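
A common way to hold direction information in such a table is a set
of two-bit saturating counters. The sketch below shows that general
technique only; the table size, the index function, and the initial
counter value are assumptions for the example and are not taken from
this guide.

```python
class TwoBitCounterTable:
    """Direction predictor built from two-bit saturating counters (illustrative)."""

    def __init__(self, entries=16):
        self.entries = entries          # hypothetical table size
        self.counters = [1] * entries   # 0..3; start at "weakly not taken"

    def _index(self, branch_addr):
        # Assumed index function: low-order address bits.
        return branch_addr % self.entries

    def predict(self, branch_addr):
        """Return True if the branch is predicted taken."""
        return self.counters[self._index(branch_addr)] >= 2

    def update(self, branch_addr, taken):
        """Move the counter one step toward the actual outcome."""
        i = self._index(branch_addr)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Because the counter saturates, a branch that is almost always taken
keeps its taken prediction after a single not-taken outcome, such as
a loop exit.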

The AMD Athlon processor combines a branch target address
buffer (BTB), a global history bimodal counter (GHBC) table,
and return address stack (RAS) hardware to predict and
accelerate branches. Predicted-taken branches incur only a
single-cycle delay to redirect the instruction fetcher to the
target instruction. In the event of a mispredict, the minimum
penalty is ten cycles.
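
The two cycle figures above allow a rough lower bound on branch
overhead. The sketch below is simple arithmetic under stated
assumptions: correctly predicted not-taken branches are treated as
free, and every mispredict is charged only the minimum penalty. The
function name and the branch counts are hypothetical.

```python
TAKEN_REDIRECT_CYCLES = 1    # single-cycle delay for a predicted-taken branch
MIN_MISPREDICT_CYCLES = 10   # minimum mispredict penalty


def min_branch_overhead(predicted_taken, mispredicted):
    """Lower bound, in cycles, on the overhead of the given branch counts."""
    return (predicted_taken * TAKEN_REDIRECT_CYCLES
            + mispredicted * MIN_MISPREDICT_CYCLES)


# Hypothetical profile: 570 predicted-taken branches, 50 mispredicts.
overhead = min_branch_overhead(570, 50)   # 570 + 500 = 1070 cycles
```

In this example the 50 mispredicts account for nearly half of the
total even though they are far fewer than the predicted-taken
branches, which is why reducing mispredicts is usually the more
valuable optimization.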

The BTB is a 2048-entry table that caches in each entry the
predicted target address of a branch.
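
The role of the BTB can be sketched as a small lookup table. Only
the entry count below comes from the text; the direct-mapped
organization, the modulo index, and the full-address tag are
assumptions chosen to keep the example short.

```python
BTB_ENTRIES = 2048  # entry count from the text


class BranchTargetBuffer:
    """Caches one predicted target address per entry (illustrative)."""

    def __init__(self, entries=BTB_ENTRIES):
        self.entries = entries
        # Each slot holds (branch_addr, target) or None when empty.
        self.table = [None] * entries

    def _index(self, branch_addr):
        # Assumed index function: low-order address bits.
        return branch_addr % self.entries

    def lookup(self, branch_addr):
        """Return the predicted target, or None if no entry matches."""
        slot = self.table[self._index(branch_addr)]
        if slot is not None and slot[0] == branch_addr:
            return slot[1]
        return None

    def record(self, branch_addr, target):
        """Store the resolved target, replacing any entry in the same slot."""
        self.table[self._index(branch_addr)] = (branch_addr, target)
```

In this model two branches whose addresses differ by a multiple of
the table size share a slot, so recording one displaces the other.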

In addition, the AMD Athlon processor implements a 12-entry
return address stack to predict return addresses from a near or
far call. As CALLs are fetched, the next EIP is pushed onto the