Alpha 21264/EV67 Microprocessor Hardware Reference Manual
Order Number: DS–0028B–TE
This manual is directly derived from the internal 21264/EV67 Specifications, Revision 1.4. You can access this hardware reference manual in PDF format from the following site: ftp://ftp.compaq.com/pub/products/alphaCPUdocs
Revision/Update Information: This is a revised document. It supersedes the Alpha 21264A Microprocessor Hardware Reference Manual (DS–0028A–TE).
September 2000 The information in this publication is subject to change without notice. COMPAQ COMPUTER CORPORATION SHALL NOT BE LIABLE FOR TECHNICAL OR EDITORIAL ERRORS OR OMISSIONS CONTAINED HEREIN, NOR FOR INCIDENTAL OR CONSEQUENTIAL DAMAGES RESULTING FROM THE FURNISHING, PERFORMANCE, OR USE OF THIS MATERIAL.
Table of Contents
Preface
1 Introduction
1.1 The Architecture
1.1.1 Addressing
1.1.2 Integer Data Types
1.1.3 Floating-Point Data Types
21264/EV67 Microprocessor Logic Symbol
21264/EV67 Signal Names and Functions
Pin Assignments
System Duplicate Tag Stores
Victim Data Buffer
6.5.2 Hardware Structure of Explicitly Written IPRs
6.5.3 Hardware Structure of Implicitly Written IPRs
6.5.4 IPR Access Ordering
6.5.5 Correct Ordering of Explicit Writers Followed by Implicit Readers
Electrical Characteristics
DC Characteristics
Power Supply Sequencing and Avoiding Potential Failure Mechanisms
AC Characteristics
11.5.2 SROM Initialization
11.5.2.1 Serial Instruction Cache Load Operation
11.6 Notes on IEEE 1149.1 Operation and Compliance
A Alpha Instruction Set
Alpha Instruction Summary
Restriction 30: HW_MTPR and HW_MFPR to the Cbox CSR
Restriction 31: I_CTL[VA_48] Update
Restriction 32: PCTR_CTL Update
Restriction 33: HW_LD Physical/Lock Use
Figures
2–1 21264/EV67 Block Diagram
2–2 Branch Predictor
Dcache Status Register
Cbox Data Register
Cbox Shift Register
Tables
1–1 Integer Data Types
2–1 Pipeline Abort Delay (GCLK Cycles)
Rules for System Control of Cache Status Update Order
Range of Maximum Bcache Clock Ratios
Effect on IPRs After Transition Through Sleep Mode
Signals and Constraints for the Sleep Mode Sequence
Effect on IPRs After Warm Reset
Preface
Audience
This manual is for system designers and programmers who use the Alpha 21264/EV67 microprocessor (referred to as the 21264/EV67).
Content
This manual contains the following chapters and appendixes:
Chapter 1, Introduction, introduces the 21264/EV67 and provides an overview of the Alpha architecture.
Chapter 2, Internal Architecture, describes the major hardware functions and the internal chip architecture. It describes performance measurement facilities, coding rules, and design examples.
Appendix C, Serial Icache Load Predecode Values, provides a pointer to the Alpha Motherboards Software Developer’s Kit (SDK), which contains this information.
Appendix D, PALcode Restrictions and Guidelines, lists restrictions and guidelines that must be adhered to when generating PALcode.
Appendix E, 21264/EV67-to-Bcache Pin Interconnections, provides the pin interface between the 21264/EV67 and Bcache SSRAMs.
The Glossary lists and defines terms associated with the 21264/EV67.
Terminology and Conventions
This section defines the abbreviations, terminology, and other conventions used throughout this document.
Abbreviations
• Binary Multiples
The abbreviations K, M, and G (kilo, mega, and giga) represent binary multiples and have the following values.
Abbreviation  Meaning
RW            Read/Write. Bits and fields can be read and written.
RW,n          Read/Write, and takes the value n at power-on reset. Bits and fields can be read and written.
W1C           Write One to Clear. If read operations are allowed to the register, then the value may be read by software. If it is a write-only register, then a read operation by software returns an UNPREDICTABLE result. Software write operations of a 1 cause the bit to be cleared by hardware.
Data Units
The following data unit terminology is used throughout this manual.
Term      Words  Bytes  Bits  Other
Byte      ½      1      8     —
Word      1      2      16    —
Longword  2      4      32    Dword
Quadword  4      8      64    2 longwords
Do Not Care (X)
A capital X represents any valid value.
External
Unless otherwise stated, external means not contained in the chip.
Field Notation
The names of single-bit and multiple-bit fields can be used rather than the actual bit numbers (see Bit Notation).
AlphaSignal[n:n] Boldface, mixed-case type denotes signal names that are assigned internal and external to the 21264/EV67 (that is, the signal traverses a chip interface pin). AlphaSignal_x[n:n] When a signal has high and low assertion states, a lowercase italic x represents the assertion states. For example, SignalName_x[3:0] represents SignalName_H[3:0] and SignalName_L[3:0].
X Do not care. A capital X represents any valid value.
1 Introduction This chapter provides a brief introduction to the Alpha architecture, Compaq’s RISC (reduced instruction set computing) architecture designed for high performance. The chapter then summarizes the specific features of the Alpha 21264/EV67 microprocessor (hereafter called the 21264/EV67) that implements the Alpha architecture. Appendix A provides a list of Alpha instructions.
The Architecture direct access to low-level hardware functions. PALcode supports optimizations for multiple operating systems, flexible memory-management implementations, and multiinstruction atomic sequences. The Alpha architecture performs byte shifting and masking with normal 64-bit, register-to-register instructions. The 21264/EV67 performs single-byte and single-word load and store instructions. 1.1.1 Addressing The basic addressable unit in the Alpha architecture is the 8-bit byte.
1.2 21264/EV67 Microprocessor Features
The 21264/EV67 microprocessor is a superscalar pipelined processor. It is packaged in a 587-pin PGA carrier and has removable application-specific heat sinks. A number of configuration options allow its use in system designs ranging from extremely simple uniprocessor systems with minimum component count to high-performance multiprocessor systems with very high cache and memory bandwidth.
• An onchip, duplicate tag array used to maintain level 2 cache coherency.
• A 64-bit data bus with onchip parity and error correction code (ECC) support.
• Support for an external second-level cache (Bcache). The size and some timing parameters of the Bcache are programmable.
• An internal clock generator providing a high-speed clock used by the 21264/EV67, and two clocks for use by the CPU module.
2 Internal Architecture This chapter provides both an overview of the 21264/EV67 microarchitecture and a system designer’s view of the 21264/EV67 implementation of the Alpha architecture. The combination of the 21264/EV67 microarchitecture and privileged architecture library code (PALcode) defines the chip’s implementation of the Alpha architecture. If a certain piece of hardware seems to be “architecturally incomplete,” the missing functionality is implemented in PALcode.
21264/EV67 Microarchitecture
• Floating-point execution unit (Fbox)
• Onchip caches (Icache and Dcache)
• Memory reference unit (Mbox)
• External cache and system interface unit (Cbox)
• Pipeline operation sequence
2.1.
Figure 2–1 21264/EV67 Block Diagram
[Block diagram. Labeled blocks include the Instruction Cache; the Ibox (fetch unit, VPC queue, ITB, predecode, branch predictor, decode and rename registers, retire unit); the integer issue queue (20 entries) and FP issue queue (15 entries); the Ebox (address ALUs L0 and L1, integer units U0 and U1, integer registers 0 with 80 registers); the Fbox (FP add, divide, and square root); and the Cbox (cache data, probe queue).]
Figure 2–2 Branch Predictor
[Diagram labels: Local Predictor, Global Predictor, Choice Predictor, Predicted Branch Address.]
Local Predictor
The local predictor uses a 2-level table that holds the history of individual branches. The 2-level table design approaches the prediction accuracy of a larger single-level table while requiring fewer total bits of storage. Figure 2–3 shows how the local predictor generates a prediction.
Figure 2–4 Global Predictor
[Diagram labels: Global Path History (12-bit index), Global Predictor (4K x 2), Global Branch Prediction.]
Choice Predictor
The choice predictor monitors the history of the local and global predictors and chooses the better of the two predictors for a particular branch. Figure 2–5 shows how the choice predictor generates its choice of the result of the local or global prediction.
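As a rough illustration of the global predictor in Figure 2–4, the sketch below models a 12-bit global path history that indexes a table of 4096 2-bit counters. Only the history width and the table size come from the figure. The saturating-counter update rule, the taken threshold, the initial counter state, and the names `GlobalPredictor`, `predict`, and `update` are assumptions made for illustration and are not taken from the hardware specification.

```python
class GlobalPredictor:
    """Illustrative model: 12-bit global path history indexing 4K 2-bit counters."""

    HISTORY_BITS = 12
    TABLE_SIZE = 1 << HISTORY_BITS  # 4K entries, as labeled in Figure 2-4

    def __init__(self):
        self.history = 0                       # most recent outcome in bit 0
        self.counters = [1] * self.TABLE_SIZE  # assumed initial state: weakly not-taken

    def predict(self):
        # Assumption: counter values 2 and 3 mean "predict taken".
        return self.counters[self.history] >= 2

    def update(self, taken):
        # Assumed saturating increment/decrement (the "+/-" label in the figure).
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        # Shift the outcome into the path history, keeping 12 bits.
        self.history = ((self.history << 1) | int(taken)) & (self.TABLE_SIZE - 1)
```

In this model, a long run of taken branches drives the history to all ones and saturates the counter at that index, so subsequent predictions are "taken".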
2.1.1.4 Instruction Fetch Logic
The instruction prefetcher (predecode) reads an octaword, containing up to four naturally aligned instructions per cycle, from the Icache. Branch prediction and line prediction bits accompany the four instructions. The branch prediction scheme operates most efficiently when only one branch instruction is contained among the four fetched instructions.
• Integer operate
• Integer conditional branch
• Unconditional branch – both displacement and memory format
• Integer and floating-point load and store
• PAL-reserved instructions: HW_MTPR, HW_MFPR, HW_LD, HW_ST, HW_RET
• Integer-to-floating-point (ITOFx) and floating-point-to-integer (FTOIx)
Each queue entry asserts four request signals, one for each of the Ebox subclusters.
The FQ arbiters pick between simultaneous requesters of a pipeline based on the age of the request; older requests are given priority over newer requests. Floating-point store instructions and FTOIx instructions in even-numbered queue entries arbitrate for one store port. Floating-point store instructions and FTOIx instructions in odd-numbered queue entries arbitrate for the second store port.
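The age-based arbitration described above can be sketched as follows. This is a simplified model, not the hardware algorithm: the function names are hypothetical, and representing age as an integer where a larger value means an older request is an assumption of the sketch.

```python
def pick_oldest(requests):
    """Return the queue entry of the oldest requester, or None if there is none.

    `requests` maps a queue-entry number to an age, where a larger age means
    an older request (this encoding is an assumption of the sketch).
    """
    if not requests:
        return None
    return max(requests, key=requests.get)


def arbitrate_store_ports(store_requests):
    """Even-numbered entries compete for one store port, odd-numbered for the other."""
    even = {e: a for e, a in store_requests.items() if e % 2 == 0}
    odd = {e: a for e, a in store_requests.items() if e % 2 == 1}
    return pick_oldest(even), pick_oldest(odd)
```

For example, with requests from entries 0, 2, 3, and 7, the oldest even-numbered entry and the oldest odd-numbered entry each win one store port.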
Figure 2–6 Integer Execution Unit: Clusters 0 and 1
[Diagram labels: U0, U1, L0, L1, Register, iop_wr, Load/Store Data, eff_VA.]
Most instructions have 1-cycle latency for consumers that execute within the same cluster. Also, there is another 1-cycle delay associated with producing a value in one cluster and consuming the value in the other cluster.
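The cross-cluster rule above amounts to adding one cycle whenever producer and consumer sit in different clusters. A minimal sketch, with a hypothetical function name and cluster numbering (0 and 1) chosen for illustration:

```python
def effective_latency(base_latency, producer_cluster, consumer_cluster):
    """Cycles before a consumer can use a result, per the cross-cluster rule."""
    cross_cluster_penalty = 0 if producer_cluster == consumer_cluster else 1
    return base_latency + cross_cluster_penalty
```

A 1-cycle instruction therefore appears to have 2-cycle latency to a consumer in the other cluster.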
The Ebox has 80 register-file entries that contain storage for the values of the 31 Alpha integer registers (the value of R31 is not stored), the values of 8 PALshadow registers, and 41 results written by instructions that have not yet been retired. Ignoring cross-cluster delay, the two copies of the Ebox register file contain identical values. Each copy of the Ebox register file contains four read ports and six write ports.
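The register-file budget quoted above can be checked with simple arithmetic. The constant names below are illustrative only.

```python
ARCHITECTED_INTEGER_REGS = 31   # R0-R30; the value of R31 is not stored
PALSHADOW_REGS = 8
IN_FLIGHT_RESULTS = 41          # results of instructions not yet retired

EBOX_REGISTER_FILE_ENTRIES = (
    ARCHITECTED_INTEGER_REGS + PALSHADOW_REGS + IN_FLIGHT_RESULTS
)
```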
The Fbox register file contains six read ports and four write ports. Four read ports are used to source operands to the add and multiply pipelines, and two read ports are used to source data for store instructions. Two write ports are used to write results generated by the add and multiply pipelines, and two write ports are used to write results from floating-point load instructions.
2.1.
• Virtual tag bits [47:15]
• 8-bit address space number (ASN) field
• 1-bit address space match (ASM) bit
• 1-bit PALcode bit to indicate physical addressing
• Valid bit
• Data and tag parity bits
• Four access-check bits for the following modes: kernel, executive, supervisor, and user (KESU)
• Additional predecoded information to assist with instruction processing and fetch control
2.1.5.
Pipeline Organization
• Miss address file (MAF)
• Dstream translation buffer (DTB)
2.1.6.1 Load Queue
The load queue (LQ) is a reorder buffer for load instructions. It contains 32 entries and maintains the state associated with load instructions that have been issued to the Mbox, but for which results have not been delivered to the processor and the instructions retired.
Figure 2–8 Pipeline Organization
[Diagram: pipeline stages 0 through 6. Labels: Branch Predictor; Instruction Cache (64KB) (2-Set), Four Instructions; Integer Register Rename Map; Floating-Point Register Rename Map; Integer Issue Queue (20); Floating-Point Issue Queue (15); Integer Register File; Floating-Point Register File; ALU, Shifter, Multiplier, and Address ALU units; Floating-Point Add, Divide, and Square Root; Floating-Point Multiply; 64KB Data Cache; System Bus (64 Bits); Bus Interface Unit.]
In the slot stage, the branch predictor compares the next Icache index that it generates to the index that was generated by the line predictor. If there is a mismatch, the branch predictor wins: the instructions fetched during that cycle are aborted, and the index predicted by the branch predictor is applied to the Icache during the next cycle. Line mispredictions result in one pipeline bubble.
Instruction Issue Rules Stage 4 — Register Read Instructions issued from the issue queues read their operands from the integer and floating-point register files and receive bypass data. Stage 5 — Execute The Ebox and Fbox pipelines begin execution. Stage 6 — Dcache Access Memory reference instructions access the Dcache and data translation buffers. Normally load instructions access the tag and data arrays while store instructions only access the tag arrays.
2.3.1 Instruction Group Definitions
Table 2–2 lists the instruction class, the pipeline assignments, and the instructions included in the class.
Table 2–2 Instruction Name, Pipeline, and Types (Continued)
Class Name  Pipeline            Instruction Type
ftoi        FST0, FST1, L0, L1  FTOIS, FTOIT
itof        L0, L1              ITOFS, ITOFF, ITOFT
mx_fpcr     FM                  Instructions that move data from the floating-point control register
2.3.
Table 2–3 Instruction Group Definitions and Pipeline Unit (Continued)
Instruction Class  Slotting  Instruction Class  Slotting
3210               3210      3210               3210
ELUU               LLUU      UELE               ULLU
EUEE               LULU      UELL               UULL
EUEL               LUUL      UELU               ULLU
EUEU               LULU      UEUE               ULUL
EULE               LULU      UEUL               ULUL
EULL               UULL      UEUU               ULUU
EULU               LULU      ULEE               ULUL
EUUE               LUUL      ULEL               ULUL
EUUL               LUUL      ULEU               ULLU
EUUU               LUUU      ULLE               ULLU
LEEE               LULU      ULLL               ULLL
LEEL               LUUL      ULLU               ULLU
LEEU               LULU      ULUE               ULUL
LELE               LULU      ULUL               ULUL
L
2.3.3 Instruction Latencies
After an instruction is placed in the IQ or FQ, its issue point is determined by the availability of its register operands, functional unit(s), and relationship to other instructions in the queue. There are register producer-consumer dependencies and dynamic functional unit availability dependencies that affect instruction issue. The mapper removes register producer-producer dependencies. The latency to produce a register result is generally fixed.
Instruction Retire Rules
Table 2–4 Instruction Class Latency in Cycles (Continued)
Class   Latency  Comments
fmul    4        Consumer other than fst or ftoi.
        6        Consumer fst or ftoi. Measured from when an fmul is issued from the FQ to when an fst or ftoi is issued from the IQ.
fcmov1  4        Only consumer is fcmov2.
fcmov2  4        Consumer other than fst.
        6        Consumer fst or ftoi. Measured from when an fcmov2 is issued from the FQ to when an fst or ftoi is issued from the IQ.
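The consumer-dependent latencies in the Table 2–4 rows above lend themselves to a small lookup. The sketch below covers only the fmul and fcmov2 rows shown here; the dictionary layout, the function name, and the consumer label `"fadd"` used in the usage note are assumptions for illustration.

```python
# Latencies in cycles, keyed by producer class, from the Table 2-4 rows above.
# Each entry is (latency for other consumers, latency when the consumer is fst or ftoi).
LATENCY = {
    "fmul": (4, 6),
    "fcmov2": (4, 6),
}


def result_latency(producer, consumer):
    """Return the producer-to-consumer latency in cycles for the modeled classes."""
    normal, store_or_ftoi = LATENCY[producer]
    return store_or_ftoi if consumer in ("fst", "ftoi") else normal
```

For instance, `result_latency("fmul", "fadd")` yields 4, while `result_latency("fmul", "fst")` yields 6.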
Retire of Operate Instructions into R31/F31 Table 2–5 Minimum Retire Latencies for Instruction Classes (Continued) Instruction Class Retire Stage Comments Floating-point DIV/SQRT 11 + latency Add latency of unit reuse for the instruction indicated in Table 2–4. For example, latency for a single-precision fdiv would be 11 plus 9 from Table 2–4. Latency is 11 if hardware detects that no exception is possible (see Section 2.4.1).
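The retire-stage rule for floating-point DIV/SQRT in the row above can be written as a one-line calculation. The function and parameter names are hypothetical; the constants 11 and 9 come from the text.

```python
def div_sqrt_retire_stage(unit_latency, exception_possible=True):
    """Minimum retire stage for a floating-point DIV/SQRT instruction.

    `unit_latency` is the unit-reuse latency from Table 2-4. If hardware
    detects that no exception is possible, the latency is not added.
    """
    base = 11
    return base + unit_latency if exception_possible else base
```

With the single-precision fdiv example from the table, `div_sqrt_retire_stage(9)` gives 11 plus 9, or 20.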
Load Instructions to R31 and F31
Table 2–6 Instructions Retired Without Execution
Instruction Type        Notes
INTA, INTL, INTM, INTS  All with R31 as destination.
FLTI, FLTL, FLTV        All with F31 as destination. MT_FPCR is not included because it has no destination; it is never removed from the pipeline.
LDQ_U                   All with R31 as destination.
MISC                    TRAPB and EXCB are always removed. Others are never removed.
FLTS                    All (SQRT, ITOF) with F31 as destination.
2.
Special Cases of Alpha Instruction Execution 2.6.3 Prefetch, Evict Next: LDQ and HW_LDQ Instructions The 21264/EV67 processes this instruction like a normal prefetch transaction (ReadBlkSpec command), with one exception—if the load misses the Dcache, the addressed cache block is allocated into the Dcache, but the Dcache set allocation pointer is left pointing to this block. The next miss to the same Dcache line will evict the block.
Figure 2–9 Pipeline Timing for Integer Load Instructions
[Timing diagram: cycles 1 through 8 for an integer load (ILD) passing through stages Q, R, E, D, and B on a hit, followed by Instruction 1 and Instruction 2.]
There are two cycles in which the IQ may speculatively issue instructions that use load data before Dcache hit information is known.
Figure 2–10 Pipeline Timing for Floating-Point Load Instructions
[Timing diagram: cycles 1 through 8 for a floating-point load (FLD) passing through stages Q, R, E, D, and B on a hit, followed by Instruction 1 and Instruction 2.]
The speculative window for floating-point load instructions is one cycle wide. FQ-issued instructions that are issued within the speculative window of a floating-point load instruction that has missed are aborted only if they depend on the load being successful.
Memory and I/O Address Space Instructions The first instruction, CMOV1, tests the value of Ra and records the result of this test in a 65th bit of its destination register, newRc1. It also copies the value of the old physical destination register, oldRc, to newRc1. The second instruction, CMOV2, then copies either the value in newRc1 or the value in Rb into a second physical destination register, newRc2, based on the CMOV predicate bit stored in newRc1.
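The two-step CMOV decomposition described above can be modeled in a few lines. This is a behavioral sketch only: the function names are hypothetical, Python integers stand in for physical registers, and CMOVEQ (move if Ra equals zero) is used as the example condition.

```python
MASK64 = (1 << 64) - 1


def cmov1(ra, old_rc, condition):
    """First half: copy oldRc and record the predicate in a 65th bit."""
    predicate = 1 if condition(ra) else 0
    return (predicate << 64) | (old_rc & MASK64)   # newRc1


def cmov2(new_rc1, rb):
    """Second half: select Rb or the preserved old value using the predicate."""
    predicate = new_rc1 >> 64
    return (rb & MASK64) if predicate else (new_rc1 & MASK64)   # newRc2


def cmoveq(ra, rb, old_rc):
    """CMOVEQ modeled as the CMOV1/CMOV2 pair: Rc = Rb if Ra == 0."""
    return cmov2(cmov1(ra, old_rc, lambda v: v == 0), rb)
```

When Ra is zero the result is Rb; otherwise the old destination value carried through newRc1 is preserved.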
If the requested physical location is found in the Dcache (a hit), the data is formatted and written into the appropriate integer or floating-point register. If the location is not in the Dcache (a miss), the physical address is placed in the miss address file (MAF) for processing by the Cbox. The MAF performs a merging function in which a new miss address is compared to miss addresses already held in the MAF.
2.8.3 Memory Address Space Store Instructions
The Mbox begins execution of a store instruction by translating its virtual address to a physical address using the DTB and by probing the Dcache. The Mbox puts information about the store instruction, including its physical address, its data, and the results of the Dcache probe, into the store queue (SQ).
MAF Memory Address Space Merging Rules
• Byte/word store instructions and different size store instructions are not allowed to merge.
• A stream of ascending non-overlapping, but not necessarily consecutive, longword store instructions is allowed to merge into naturally aligned 32-byte blocks.
• A stream of ascending non-overlapping, but not necessarily consecutive, quadword store instructions is allowed to merge into naturally aligned 64-byte blocks.
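The three merging rules listed above can be sketched as a grouping pass over a stream of stores. This is an illustrative model under stated assumptions, not the hardware algorithm: stores are `(address, size)` pairs with naturally aligned addresses, a store may merge only with the most recently opened group, and the names `BLOCK_BYTES` and `merge_stores` are hypothetical.

```python
BLOCK_BYTES = {4: 32, 8: 64}  # longword -> 32-byte block, quadword -> 64-byte block


def merge_stores(stores):
    """Group a stream of (address, size) stores into merged blocks.

    Returns a list of lists; each inner list holds the stores that merged.
    Assumption: a store may merge only with the most recent group.
    """
    groups = []
    for addr, size in stores:
        if groups and size in BLOCK_BYTES:
            last_addr, last_size = groups[-1][-1]
            block = BLOCK_BYTES[size]
            same_size = last_size == size
            ascending = addr >= last_addr + last_size   # ascending, non-overlapping
            same_block = addr // block == last_addr // block
            if same_size and ascending and same_block:
                groups[-1].append((addr, size))
                continue
        groups.append([(addr, size)])  # byte/word stores always start a new group
    return groups
```

In this model, three ascending longword stores at 0x00, 0x08, and 0x1C merge into one 32-byte block, while a following store at 0x20 starts a new block.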
Replay Traps The 21264/EV67 maintains the default memory data instruction ordering as shown in Table 2–10 (assume address X and address Y are different).
I/O Write Buffer and the WMB Instruction 2.11.1.1 Load-Load Order Trap The Mbox ensures that load instructions that read the same physical byte(s) ultimately issue in correct order by using the load-load order trap. The Mbox compares the address of each load instruction, as it is issued, to the address of all load instructions in the load queue. If the Mbox finds a newer load instruction in the load queue, it invokes a load-load order trap on the newer instruction.
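The load-load order check described above can be sketched as a comparison of an issuing load against loads already in the load queue. The data representation is an assumption: loads are `(sequence_number, physical_address)` pairs, a larger sequence number means a newer instruction in program order, and address overlap is simplified to exact equality. The function name and the choice to report the oldest of the newer matching loads are also illustrative.

```python
def load_load_order_trap(issuing_load, load_queue):
    """Return the sequence number of the newer load to trap on, or None.

    `load_queue` holds (sequence_number, physical_address) pairs for loads
    that have already issued.
    """
    seq, addr = issuing_load
    newer_matches = [s for s, a in load_queue if a == addr and s > seq]
    # Assumption: report the oldest of the newer matching loads.
    return min(newer_matches) if newer_matches else None
```

An older load that issues after a newer load to the same address triggers the trap on the newer one; loads to other addresses, or older loads to the same address, do not.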
• RdBlkSpec (valid), RdBlkModSpec (valid), RdBlkSpecI (valid)
• RdBlkVic, RdBlkModVic, RdBlkVicI
• CleanToDirty, SharedToDirty, STChangeToDirty, InvalToDirty
• FetchBlk, FetchBlkSpec (valid), Evict
• RdByte, RdLw, RdQw, WrByte, WrLW, WrQW
The counter is decremented with the C (commit) bit in the Probe and SysDc commands (see Section 4.7.7).
Because the MB instruction is executed speculatively, MB processing can begin and the original MB can be killed. In the internal acknowledge case, the MB may have already been sent to the system interface, and the system is still expected to respond to the MB.
2.12.1.2 WMB Instruction Processing
Write memory barrier (WMB) instructions are issued into the Mbox store-queue, where they wait until they are retired and all prior store instructions become writable.
Also consider the related sequence shown in Table 2–13. In this case, the data could be cached in the Bcache; P_j should fetch data_i if it is using PTE_i.
Performance Measurement Support—Performance Counters 2.13 Performance Measurement Support—Performance Counters The 21264/EV67 provides hardware support for two methods of obtaining program performance feedback information. The two methods do not require program modification. The first method offers similar capabilities to earlier microprocessor performance counters.
Floating-Point Control Register
Table 2–14 Floating-Point Control Register Fields (Continued)
Name  Extent  Type  Description
UNFD  [61]    RW    Underflow Disable. The 21264/EV67 hardware cannot generate IEEE compliant denormal results. UNFD is used in conjunction with UNDZ as follows:
                    UNFD  UNDZ  Result
                    0     X     Underflow trap.
                    1     0     Trap to supply a possible denormal result.
                    1     1     Underflow trap suppressed. Destination is written with a true zero (+0.0).
UNDZ  [60]    RW    Underflow to zero.
AMASK and IMPLVER Instruction Values
Table 2–14 Floating-Point Control Register Fields (Continued)
Name      Extent      Type  Description
DNZ       [48]        RW    Denormal operands to zero. If this bit is set, treat all denormal operands as a signed zero value with the same sign as the denormal operand.
Reserved  [47:0] (1)  —     —
(1) Alpha architecture FPCR bit 47 (DNOD) is not implemented by the 21264/EV67.
2.
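The UNFD/UNDZ combinations and the DNZ bit from Table 2–14 can be decoded from a 64-bit FPCR value as sketched below. The bit positions come from the table; the function names and the returned strings are hypothetical and serve only to illustrate the decoding.

```python
UNFD_BIT = 61
UNDZ_BIT = 60
DNZ_BIT = 48


def underflow_behavior(fpcr):
    """Map the UNFD/UNDZ bits of an FPCR value to the behavior listed in Table 2-14."""
    unfd = (fpcr >> UNFD_BIT) & 1
    undz = (fpcr >> UNDZ_BIT) & 1
    if not unfd:
        return "underflow trap"                      # UNFD = 0, UNDZ = X
    if not undz:
        return "trap to supply a possible denormal result"
    return "underflow trap suppressed; destination written with +0.0"


def denormal_operands_to_zero(fpcr):
    """True if DNZ is set, so denormal operands are treated as signed zero."""
    return bool((fpcr >> DNZ_BIT) & 1)
```

Note that UNDZ has no effect unless UNFD is also set, which matches the "0 X" row of the table.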
2.16 Design Examples
The 21264/EV67 can be designed into many different uniprocessor and multiprocessor system configurations. Figures 2–12 and 2–13 illustrate two possible configurations. These configurations employ additional system/memory controller chipsets. Figure 2–12 shows a typical uniprocessor system with a second-level cache. This system configuration could be used in standalone or networked workstations.
Figure 2–13 Typical Multiprocessor Configuration
[Diagram labels: two 21264 processors, each with an L2 Cache; 21272 Core Logic Chipset (Control Chip, Data Slice Chips); DRAM Arrays; Address and Data paths; two Host PCI Bridge Chips, each driving a 64-bit PCI Bus.]
3 Hardware Interface This chapter contains the 21264/EV67 microprocessor logic symbol and provides information about signal names, their function, and their location. This chapter also describes the mechanical specifications of the 21264/EV67.
21264/EV67 Microprocessor Logic Symbol
Figure 3–1 21264/EV67 Microprocessor Logic Symbol
[Logic symbol for the 21264. Signals shown:
System Interface: SysAddIn_L[14:0], SysAddInClk_L, SysAddOut_L[14:0], SysAddOutClk_L, SysVref, SysData_L[63:0], SysCheck_L[7:0], SysDataInClk_H[7:0], SysDataOutClk_L[7:0], SysDataInValid_L, SysDataOutValid_L, SysFillValid
Bcache Interface: BcAdd_H[23:4], BcData_H[127:0], BcCheck_H[15:0], BcDataInClk_H[7:0], BcDataOutClk_x[3:0], BcDataOE_L, BcDataWr_L, BcTag_H[42:20], BcTagInClk_H, BcTagOutClk_x, BcVref, BcTagDirty_H]
3.2 21264/EV67 Signal Names and Functions
Table 3–1 defines the 21264/EV67 signal types referred to in this section.
Table 3–2 21264/EV67 Signal Descriptions (Continued)
Signal                                    Type  Count  Description
BcDataOutClk_H[3:0], BcDataOutClk_L[3:0]  O_PP  8      Bcache data output clocks. These free-running clocks are differential copies of the Bcache clock and are derived from the 21264/EV67 GCLK. Their period is a multiple of the GCLK and is fixed for all operations. They can be configured so that their rising edge lags BcAdd_H[23:4] by 0 to 2 GCLK cycles.
Table 3–2 21264/EV67 Signal Descriptions (Continued)
Signal                  Type      Count  Description
FrameClk_H, FrameClk_L  I_DA_CLK  2      A skew-controlled differential 50% duty cycle copy of the system clock. It is used by the 21264/EV67 as a reference, or framing, clock.
IRQ_H[5:0]              I_DA      6      These six interrupt signal lines may be asserted by the system. The response of the 21264/EV67 is determined by the system software.
Table 3–2 21264/EV67 Signal Descriptions (Continued)
Signal      Type      Count  Description
SysVref     I_DC_REF  1      System interface reference voltage.
Tck_H       I_DA      1      IEEE 1149.1 test clock.
Tdi_H       I_DA      1      IEEE 1149.1 test data-in signal.
Tdo_H       O_OD_TP   1      IEEE 1149.1 test data-out signal.
TestStat_H  O_OD_TP   1      Test status pin. System reset drives the test status pin low. The TestStat_H pin is forced high at the start of the Icache BiST.
Table 3–3 21264/EV67 Signal Descriptions by Function (Continued)
Signal             Type      Count  Description
BcVref             I_DC_REF  1      Tag data input reference voltage.
SysAddIn_L[14:0]   I_DA      15     Time-multiplexed SysAddIn, system-to-21264/EV67.
SysAddInClk_L      I_DA      1      Single-ended forwarded clock from system for SysAddIn_L[14:0] and SysFillValid_L.
SysAddOut_L[14:0]  O_OD      15     Time-multiplexed SysAddOut, 21264/EV67-to-system.
Pin Assignments
Table 3–3 21264/EV67 Signal Descriptions by Function (Continued)
Signal      Type     Count  Description
Reset_L     I_DA     1      System reset. This signal protects the 21264/EV67 from damage during initial power-up. It must be asserted until DCOK_H is asserted. After that, it is deasserted and the 21264/EV67 begins its reset sequence.
SromClk_H   O_OD_TP  1      Serial ROM clock.
SromData_H  I_DA     1      Serial ROM data.
SromOE_L    O_OD_TP  1      Serial ROM enable.
Tck_H       I_DA     1      IEEE 1149.1 test clock.
Table 3–4 Pin List Sorted by Signal Name (Continued)
Signal Name   PGA Location  Signal Name   PGA Location  Signal Name   PGA Location
BcData_H_106  L45           BcData_H_107  N45           BcData_H_108  T44
BcData_H_109  U45           BcData_H_11   M2            BcData_H_110  W45
BcData_H_111  AA43          BcData_H_112  AC43          BcData_H_113  AD44
BcData_H_114  AE41          BcData_H_115  AG45          BcData_H_116  AK44
BcData_H_117  AL43          BcData_H_118  AM42          BcData_H_119  AR45
BcData_H_12   T2            BcData_H_120  AP40          BcData_H_121  BA45
BcData_H_122
Table 3–4 Pin List Sorted by Signal Name (Continued)
Signal Name      PGA Location  Signal Name      PGA Location  Signal Name      PGA Location
BcData_H_9       K2            BcData_H_90      BA3           BcData_H_91      BC3
BcData_H_92      BD6           BcData_H_93      BA9           BcData_H_94      BC9
BcData_H_95      AY12          BcData_H_96      A39           BcData_H_97      D36
BcData_H_98      A41           BcData_H_99      B42           BcDataInClk_H_0  E7
BcDataInClk_H_1  R3            BcDataInClk_H_2  AH2           BcDataInClk_H_3  BC5
BcDataInClk_H_4  F38           BcDataInClk_H_5  U39           BcDataInClk_H_6  AH44
BcDataInClk_H
Table 3–4 Pin List Sorted by Signal Name (Continued)
Signal Name      PGA Location  Signal Name      PGA Location  Signal Name      PGA Location
SysAddIn_L_5     BA27          SysAddIn_L_6     BD28          SysAddIn_L_7     BE27
SysAddIn_L_8     AY26          SysAddIn_L_9     BC25          SysAddInClk_L    BB26
SysAddOut_L_0    AW33          SysAddOut_L_1    BE39          SysAddOut_L_10   BE33
SysAddOut_L_11   AW29          SysAddOut_L_12   BC31          SysAddOut_L_13   AV28
SysAddOut_L_14   BB30          SysAddOut_L_2    BD36          SysAddOut_L_3    BC35
SysAddOut_L_4    BA33          SysAddOut_L_5    AY32          SysAdd
Pin Assignments Table 3–4 Pin List Sorted by Signal Name (Continued) Signal Name PGA Location Signal Name PGA Location Signal Name PGA Location SysDataOutClk_L_5 R41 SysDataOutClk_L_6 AH40 SysDataOutClk_L_7 AW39 SysDataOutValid_L BB22 SysFillValid_L BC23 SysVref BA25 Tck_H BE19 Tdi_H BA21 Tdo_H BB20 TestStat_H BA19 Tms_H BD18 Trst_L AY20 Table 3–5 Pin List Sorted by PGA Location PGA Location Signal Name PGA Location Signal Name PGA Location Signal Name A11 BcTag_H_22 A13 BcTa
Pin Assignments Table 3–5 Pin List Sorted by PGA Location (Continued) PGA Location Signal Name PGA Location Signal Name PGA Location Signal Name AR1 BcData_H_22 AR3 Spare AR39 SysData_L_58 AR43 BcDataOutClk_H_3 AR45 BcData_H_119 AR7 SysData_L_25 AT2 BcCheck_H_2 AT38 SysData_L_59 AT4 Spare AT42 BcDataOutClk_L_3 AT44 BcCheck_H_14 AT8 SysData_L_26 AU3 BcDataOutClk_H_1 AU41 BcData_H_57 AU43 BcCheck_H_6 AU5 BcData_H_88 AV10 SysData_L_28 AV12 SysData_L_30 AV16 FrameClk_H
Pin Assignments Table 3–5 Pin List Sorted by PGA Location (Continued) PGA Location Signal Name PGA Location Signal Name PGA Location Signal Name BC25 SysAddIn_L_9 BC29 SysAddIn_L_1 BC3 BcData_H_91 BC31 SysAddOut_L_12 BC35 SysAddOut_L_3 BC37 BcCheck_H_7 BC41 BcData_H_125 BC43 BcData_H_60 BC5 BcDataInClk_H_3 BC9 BcData_H_94 BD10 BcCheck_H_11 BD12 PllBypass_H BD16 Reset_L BD18 Tms_H BD2 NoConnect BD22 SysDataInValid_L BD24 SysAddIn_L_12 BD28 SysAddIn_L_6 BD30 SysAddIn_L
Pin Assignments Table 3–5 Pin List Sorted by PGA Location (Continued) PGA Location Signal Name PGA Location Signal Name PGA Location Signal Name G39 SysData_L_37 G41 BcData_H_38 G45 BcData_H_104 G5 BcData_H_70 G7 SysData_L_5 H10 SysData_L_4 H12 SysData_L_3 H16 BcTag_H_21 H18 BcTag_H_29 H22 BcTag_H_42 H24 BcTagOE_L H28 BcAdd_H_13 H30 BcAdd_H_21 H34 SysData_L_34 H36 SysDataOutClk_L_4 H4 BcData_H_72 H40 BcData_H_102 H42 BcData_H_103 H6 BcData_H_6 J3 BcData_H_8 J41 S
Table 3–6 lists the 21264/EV67 ground (VSS) and power (VDD) pins.
Mechanical Specifications 3.4 Mechanical Specifications This section shows the 21264/EV67 mechanical package dimensions without a heat sink. For heat sink information and dimensions, refer to Chapter 10. Figure 3–2 shows the package physical dimensions without a heat sink. Figure 3–2 Package Dimensions 1.27 mm (.050 in) Typ 4.32 mm (.170 in) Typ 2.54 mm (.100 in) Typ B BE BC BA AW AU AR AN AL AJ AG AE AC AA W U R N L J G E C A 1.377 mm (.055 in) Typ Standoff (4x) 587x 1.40 mm (.
3.5 21264/EV67 Packaging
Figure 3–3 shows the 21264/EV67 pinout from the top view with pins facing down.
Figure 3–4 shows the 21264/EV67 pinout from the bottom view with pins facing up.
4 Cache and External Interfaces
This chapter describes the 21264/EV67 cache and external interfaces, which include the second-level cache (Bcache) interface and the system interface. It also describes locks, interrupt signals, and ECC/parity generation.
Introduction to the External Interfaces • • The Bcache interface includes a 128-bit bidirectional data bus, a 20-bit unidirectional address bus, and several control signals. – The BcDataOutClk_x[3:0] clocks are free-running and are derived from the internal GCLK. The period of BcDataOutClk_x[3:0] is a programmable multiple of GCLK. – The Bcache turns the BcDataOutClk_x[3:0] clocks around and returns them to the 21264/EV67 as BcDataInClk_H[7:0]. Likewise, BcTagOutClk_x returns as BcTagInClk_H.
Introduction to the External Interfaces Figure 4–1 21264/EV67 System and Bcache Interfaces SysAddIn_L[14:0] SysAddInClk_L SysAddOut_L[14:0] SysAddOutClk_L SysVref SysData_L[63:0] SysCheck_L[7:0] SysDataInClk_H[7:0] SysDataOutClk_L[7:0] SysDataInValid_L SysDataOutValid_L SysFillValid_L BcAdd_H[23:4] 21264 [23:4] [23:6] Data [23:6] Tag Status System BcLoad_L BcData_H[127:0] BcCheck_H[15:0] BcDataInClk_H[7:0] BcDataOutClk_ x [3:0] BcDataOE_L BcDataWr_L BcTag_H[42:20] BcTagInClk_H BcTagOutClk_ x BcVref
Physical Address Considerations 4.1.1.1 Commands and Addresses The system sends probe and data movement commands to the 21264/EV67. The 21264/ EV67 can hold up to eight probe commands from the system. The system controls the number of outstanding probe commands and must ensure that the 21264/EV67 8-entry probe queue does not overflow. The Cbox contains an 8-entry miss buffer (MAF) and an 8-entry victim buffer (VAF). A miss occurs when the 21264/EV67 probes the Bcache but does not find the addressed block.
Physical Address Considerations Prefetches (LDL, LDF, LDG, LDT, LDBU, LDWU) to R31 use the LDx flow, and prefetch with modify intent (LDS) uses the STx flow. If the prefetch target is addressed to I/O space, the upper address bit is cleared, converting the address to memory space (PA[42:6] ). Notes follow the table. Table 4–1 Translation of Internal References to External Interface Reference Instruction DcHit DcW BcHit BcW Status and Action LDx Memory 1 X X X Dcache hit, done.
Physical Address Considerations Table 4–1 notes: 1. Set Dirty Flow: Based on the Cbox CSR SET_DIRTY_ENABLE[2:0], SetDirty requests can be either internally acknowledged (called a SetModify) or sent to the system environment for processing. When externally acknowledged, the shared status information for the cache block is also broadcast. The commands sent externally are SharedToDirty or CleanToDirty.
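The SetDirty decision described in note 1 can be sketched as a simple bit test. This is a minimal sketch, assuming the bit order that Table 4–16 gives for SET_DIRTY_ENABLE[2:0] (DS, CS, C for bits 2, 1, and 0) and assuming that each bit independently enables external acknowledgment for its block state. The function and state names are hypothetical and chosen for illustration only.

```python
# Hypothetical helper names; bit positions follow the (DS, CS, C) labelling
# of SET_DIRTY_ENABLE[2:0]: bit 2 = dirty/shared, bit 1 = clean/shared,
# bit 0 = clean.
STATE_BIT = {"clean": 0, "clean/shared": 1, "dirty/shared": 2}


def set_dirty_is_external(set_dirty_enable: int, block_state: str) -> bool:
    """Return True if a SetDirty request for this block state is sent to the system."""
    if not 0 <= set_dirty_enable <= 0b111:
        raise ValueError("SET_DIRTY_ENABLE is a 3-bit field")
    return bool((set_dirty_enable >> STATE_BIT[block_state]) & 1)


def external_command(block_state: str) -> str:
    """Command used when the request goes external: CleanToDirty for clean
    blocks, SharedToDirty for the two shared states."""
    return "CleanToDirty" if block_state == "clean" else "SharedToDirty"
```

With a value of 000 every request is acknowledged internally (a SetModify), which matches the uniprocessor row of Table 4–16.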
4.3 Bcache Structure
The 21264/EV67 Cbox provides control signals and an interface for a second-level cache (Bcache). The 21264/EV67 supports a Bcache from 1MB to 16MB, with 64-byte blocks. A 128-bit bidirectional data bus is used for transfers between the 21264/EV67 and the Bcache. The Bcache is fully synchronous, and the synchronous static RAMs (SSRAMs) must contain one, two, or three internal registers.
Victim Data Buffer • Issuing probes and SysDc fill commands to the 21264/EV67 out-of-order with respect to their order at the system serialization point • Filtering out all probe misses from the 21264/EV67 cache system If a probe misses in the 21264/EV67 cache system (Bcache miss and VAF miss), the 21264/EV67 stalls probe processing with the expectation that a SysDc fill will allocate this block. Because of this, in duplicate tag mode, the 21264/EV67 can never generate a probe miss response.
Cache Coherency Figure 4–3 Cache Subset Hierarchy System Main Memory Bcache Dcache Icache FM-05824.AI4 The following tasks must be performed to maintain cache coherency: • Istream data from memory spaces may be cached in the Icache and Bcache. Icache coherence is not maintained by hardware—it must be maintained by software using the CALL_PAL IMB instruction. • The 21264/EV67 maintains the Dcache as a subset of the Bcache.
Cache Coherency Table 4–2 21264/EV67-Supported Cache Block States (Sheet 2 of 2) State Name Description Clean/Shared This 21264/EV67 holds a read-only copy of the block, and at least one other agent in the system may hold a copy of the block. Upon eviction, the block is not written to memory. Dirty This 21264/EV67 holds a read-write copy of the block, and must write it to memory after it is evicted from the cache. No other agent in the system holds a copy of the block.
Cache Coherency 4.5.4 Using SysDc Commands Note the following: • The conventional response for RdBlk commands is SysDc ReadData or ReadDataShared. • The conventional response for a RdBlkMod command is SysDc ReadDataDirty. • The conventional response for ChangeToDirty commands is ChangeToDirtySuccess or ChangeToDirtyFail. However, the system environment is not limited to these responses. Table 4–5 shows all 21264/EV67 commands, system responses, and the 21264/EV67 reaction.
Cache Coherency Table 4–5 System Responses to 21264/EV67 Commands and 21264/EV67 Reactions (Continued) 21264/EV67 CMD SysDc 21264/EV67 Action RdBlkModx ReadData ReadDataShared ReadDataShared/Dirty The cache block is filled and marked with a nonwritable status. If the store instruction that generated the RdBlkModx command is still active (not killed), the 21264/EV67 will retry the instruction, generating the appropriate ChangeToDirty command.
Cache Coherency Table 4–5 System Responses to 21264/EV67 Commands and 21264/EV67 Reactions (Continued) 21264/EV67 CMD SysDc 21264/EV67 Action InvalToDirty ChangeToDirtyFail Illegal. InvalToDirty instructions must provide a cache block. Fetchx Rdiox ReadData ReadDataShared ReadDataShared/Dirty ReadDataDirty The 21264/EV67 delivers the data block, independent of its status, to waiting load instructions and does not cache the block in the 21264/EV67 cache system.
Lock Mechanism 1. When the Mbox requests a Dcache fill, the Cbox uses the CTAG array entry to determine whether the Dcache already contains the requested physical address in another virtually-indexed Dcache line. If it does, the Cbox invalidates that cache line after first writing the data back to the Bcache if it was in the modified state. The Cbox also checks whether the Dcache contains an address that differs from the requested address but maps to the same Bcache line.
4.6.1 In-Order Processing of LDx_L/STx_C Instructions
The 21264/EV67 uses the stWait logic in the IQ to ensure that LDx_L/STx_C pairs are issued in order. The stWait logic treats an LDx_L instruction like an STx instruction. STx_C instructions are always loaded into the IQ with their associated stWait bit set. Thus, an STx_C instruction is not issued until the older LDx_L is out of the IQ.
4.6.
System Port If the ChangeToDirty command succeeds, the STx_C enters the writable state, and the Mbox locks the Dcache line. The Mbox does not release the Dcache line until the STx_C data is transferred to the Dcache. This ensures that no other agent, by way of a probe, can take the block before the STx_C can update the locked block. 4.6.
System Port Figure 4–4 System Interface Signals SysAddIn_L[14:0] 21264 SysAddInClk_L SysAddOut_L[14:0] SysAddOutClk_L SysVref SysData_L[63:0] SysCheck_L[7:0] SysDataInClk_H[7:0] SysDataOutClk_L[7:0] SysDataInValid_L SysDataOutValid_L SysFillValid_L IRQ_H[5:0] FM-05652-EV67 4.7.1 System Port Pins Table 3–1 defines the 21264/EV67 signal types referred to in this section. Table 4–6 lists the system port pin groups along with their type, number, and functional description.
System Port 4.7.2 Programming the System Interface Clocks The system forwarded clocks are free running and derived from the 21264/EV67 GCLK. The period of the system forwarded clocks is controlled by three Cbox CSRs, based on the bit-rate ratio (similar to the Bcache bit-rate ratio) except that all transfers are dual-data. • SYS_CLK_LD_VECTOR[15:0] • SYS_BPHASE_LD_VECTOR[3:0] • SYS_FDBK_EN[7:0] Table 4–7 lists the programming values used to program the system interface clocks.
System Port Table 4–8 Program Values for Data-Sample/Drive CSRs (Continued) CBOX CSR Description SYS_DDM_RD_RISE_EN[0] Enables the sampling of incoming data on the rising edge of the incoming forwarded clock. (Always asserted) SYS_DDMF_ENABLE Enables the falling edge of the system forwarded clock. (Always asserted) SYS_DDMR_ENABLE Enables the rising edge of the system forwarded clock.
System Port Table 4–10 Bank Interleave on Cache Block Boundary Mode of Operation (Continued) SysAddOut_L[14:2] Cycle 2 PA[27:22], PA[12:6] Cycle 3 M2 Cycle 4 RV Mask[7:0] CH ID[2:0] PA[21:13], PA[5:3] SysAddOut_L[1] SysAddOut_L[0] PA[35] PA[37] PA[40] PA[42] PA[39] PA[41] 4.7.3.2 Page Hit Mode Table 4–11 shows the command format for page hit mode (21264/EV67-to-system).
System designers can minimize pin count for systems with a small memory by configuring both the bank interleave on cache block boundary mode and the page hit mode formats into a short bus format. In the short bus format, pins SysAddOut_L[1] and/or SysAddOut_L[0] are not used (selected by Cbox CSR SYS_BUS_SIZE[1:0]). Table 4–13 lists the values for SYSBUS_FORMAT and SYS_BUS_SIZE[1:0] and shows the maximum physical memory size.
System Port Table 4–14 21264/EV67-to-System Commands Descriptions (Continued) Command Command [4:0] Function ReadBlkMod 10001 Memory read with modify intent. ReadBlkI 10010 Memory read for Istream. FetchBlk 10011 Noncached memory read. ReadBlkSpec2 10100 Speculative memory read (optional). ReadBlkModSpec2 10101 Speculative memory read with modify intent (optional). ReadBlkSpecI2 10110 Memory read for Istream (optional). 2 10111 Speculative memory noncached ReadBlk (optional).
System Port Table 4–14 footnotes: 1. Systems can optionally enable MB instructions to the external system by asserting Cbox CSR SYSBUS_MB_ENABLE. This mode is described in Section 2.12.1. 2. To minimize load-to-use memory latency, systems can optionally enable speculative transactions to memory space by asserting the Cbox CSR SPEC_READ_ENABLE[0]. If the Cbox system command queue is empty, a bypass between the Bcache interface and the system interface is enabled (in combination with this mode).
Table 4–16 Programming SET_DIRTY_ENABLE[2:0]

SET_DIRTY_ENABLE[2:0]
(DS,CS,C)              Cbox Action
000                    Everything acknowledged internally (uniprocessor).
001                    Only clean blocks generate external acknowledge (CleanToDirty commands only).
010                    Only clean/shared blocks generate external acknowledge (SharedToDirty commands only).
011                    Clean and clean/shared blocks generate external acknowledge.
100                    Only dirty/shared blocks generate external acknowledge (SharedToDirty commands only).
System Port Table 4–18 describes the ProbeResponse command fields. Table 4–18 ProbeResponse Fields Descriptions ProbeResponse Field Description Command[4:0] The value 00001 identifies the command as a ProbeResponse. DM Indicates that data movement should occur (copy of probe valid bit). See Section 4.4. VS Write victim sent bit. VDB[2:0] ID number of the VDB entry containing the requested cache block. This field is valid when either the DM bit or the VS bit equals 1. MS MAF address sent.
System Port • There is no mechanism for the system to reject a 21264/EV67-to-system command. ProbeResponse, VDBFlushReq, NOP, NZNOP, and RdBlkxSpec (with a clear RV bit) commands do not require a response from the system. Systems must provide adequate resources for responses to all probes sent to the 21264/EV67.
System Port Table 4–20 describes the system-to-21264/EV67 probe commands fields descriptions. Table 4–20 System-to-21264/EV67 Probe Commands Fields Descriptions SysAddIn_L[14:0] Field Description Probe[4:0] Probe type and next tag state (see Tables 4–21 and 4–22). SysDc[4:0] Controls data movement in and out of the 21264/EV67. See Table 4–24 for a list of data movement types. RVB Clears the victim or I/O write buffer (IOWB) valid bit specified in ID[3:0].
Table 4–22 Next Cache Block State Selection by Probe[2:0] (Continued)

Probe[2:0]  Next Tag State
101         Invalid
110         Transition1 (see note 2): Clean ⇒ Clean/Shared, Dirty ⇒ Dirty/Shared
111         Reserved

1 Transition3 is useful in nonduplicate tag systems that want to give writable status to the reader and do not know if the block is clean or dirty.
2 Transition1 is useful in nonduplicate tag systems that do not update memory on ReadBlk hits to a dirty block in another processor.
System Port Table 4–24 describes the SysDc[4:0] field. Table 4–24 SysDc[4:0] Field Description SysDc[4:0] Command SysDc[4:0] Description NOP 00000 NOP, SysData is ignored by the 21264/EV67. ReadDataError 00001 Data is returned for read commands. The system drives the SysData bus, I/O, or memory NXM. ChangeToDirtySuccess 00100 No data. SysData is ignored by the 21264/EV67. This command is also used for the InvalToDirty response. ChangeToDirtyFail 00101 No data.
System Port The ChangeToDirtySuccess and ChangeToDirtyFail commands cannot be issued in the shadow of SysDc cache fill commands (ReadDataError, ReadData, ReadDataDirty, ReadDataShared, and ReadDataShared/Dirty). Each cache fill command allocates eight cycles on the SysData bus. Systems are required to ensure that any future SysDc commands do not cause conflicts with those eight SysData bus cycles.
System Port If both the sender and the receiver are sampling at the same rate, these three principles are sufficient to safely make point-to-point transfers using clock forwarding. However, it is often desirable for systems to align clock-forwarded transactions on a slower SYSCLK that is the basis of all non-processor system transactions. The 21264/EV67 supports three ratios for SYSCLK to INT_FWD_CLK: one-to-one (1-1), two-to-one (2-1), and four-to-one (4-1).
System Port The command precedes data by at least one SYSCLK period. Table 4–25 shows the number of SYSCLK cycles between SysAddOut and SysData for all system clock ratios (clock forwarded bit times) and system framing clock multiples. Table 4–25 SYSCLK Cycles Between SysAddOut and SysData GCLK/INT_FWD_CLK (Data Rate Ratio) System framing clock ratio 1.5X 2.0X 2.5X 3.0X 3.5X 4.0X 5.0X 6.0X 7.0X 8.
Table 4–26 shows four example configurations and their use of SYSDC_DELAY[4:0].
Table 4–26 Cbox CSR SYSDC_DELAY[4:0] Examples

          System Bit Rate  System Framing Clock Ratio (1)  SYSDC_DELAY
System 1  1.5X             4:1                             5 (3 SYSCLK cycles)
System 2  2.0X             2:1                             2 (3 SYSCLK cycles)
System 3  2.5X             2:1                             0 (2 SYSCLK cycles)
System 4  4X               2:1                             6 (2 SYSCLK cycles)

(1) The system framing clock ratio is the number of INT_FWD_CLK cycles per SYSCLK cycle.
System Port Table 4–27 lists information for the four timing examples. In Table 4–27, note the following: • SysDc write commands are not affected by the SYSDC_DELAY parameter. • The SYS_RCV_MUX_PRESET adds delay at the rate of one INT_FWD_CLK at a time. For example, adding the delay of one bit time to system 1 adds 1.5 GCLK cycles to the delay and drives the SysDc write command-to-data relationship from one to two SYSCLKs.
System Port 1. The SysDataInValid_L signal must be asserted for both cycles of a SysDc fill command, and two quadwords of data must be delivered to the 21264/EV67 in succeeding bit-clock cycles with the appropriate timing in reference to the SysDc fill command (SYSDC_DELAY + 10 CPU cycles). 2. Any number of bubble cycles can be introduced within the fill by deasserting SysDataInValid_L between octaword transfers. 3.
System Port Figure 4–6 SysFillValid_L Timing SysAddIn_L[14:0] SysDc Transport Delay on Address T3 Command Receiver SysFillValid_L SysData_L[63:0] D0 D1 D2 D3 D4 FM-05823B.FH8 4.7.8.6 Data Wrapping All data movement between the 21264/EV67 and the system is composed of 64 bytes in eight cycles on the data bus. All 64 bytes of memory data are valid. This applies to memory read transactions, memory write transactions, and system probe read transactions. The wrap order is interleaved.
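The interleaved wrap order can be sketched in a few lines. This is a minimal sketch, assuming that the interleaved order is produced by XORing PA[5:3] of the starting quadword with the transfer count (0 through 7); the rows of Table 4–30 that survive in this section are consistent with that rule for starting quadwords 000, 010, 100, and 110. The function name is hypothetical.

```python
from typing import List


def interleaved_wrap_order(start_qw: int) -> List[int]:
    """Return PA[5:3] of each of the eight quadwords in transfer order,
    assuming interleaved wrap is start XOR transfer-count."""
    if not 0 <= start_qw <= 7:
        raise ValueError("PA[5:3] is a 3-bit quadword index")
    return [start_qw ^ n for n in range(8)]


if __name__ == "__main__":
    for start in (0b000, 0b010, 0b100, 0b110):
        order = interleaved_wrap_order(start)
        print(format(start, "03b"), [format(qw, "03b") for qw in order])
```

Each quadword of the 64-byte block appears exactly once in the returned order, which matches the requirement that all 64 bytes are transferred in eight data cycles.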
System Port point is the QW pointed to by the 21264/EV67; however, some systems may find it more beneficial to begin the transfer elsewhere. The system must always indicate the starting point to the 21264/EV67. The wrap order for subsequent QWs is interleaved. Table 4–29 defines the method for systems to specify wrap and deliver data.
System Port Table 4–30 Wrap Interleave Order (Continued) PA Bits [5:3] of Transferred QW Sixth quadword 101 111 001 011 Seventh quadword 110 100 010 000 Eighth quadword 111 101 011 001 Table 4–31 defines the wrap order for double-pumped data transfers.
System Port Table 4–32 shows each 21264/EV67 command, with NXM addresses, and the appropriate system response. Table 4–32 21264/EV67 Commands with NXM Addresses and System Response 21264/EV67 Command NXM Address System/21264/EV67 Response ProbeResponse Probe responses for addresses to NXM space are of UNPREDICTABLE status. Although the final status of a ReadDataError is Invalid, the 21264/EV67 fills the block Valid/Clean and uses an atomic Evict command to invalidate the block.
System Port Table 4–32 21264/EV67 Commands with NXM Addresses and System Response (Continued) 21264/EV67 Command NXM Address System/21264/EV67 Response CleanToDirty ChangeToDirty commands to NXM space are impossible in the 21264/EV67 because all SharedToDirty NXM references to memory space are atomically filled with an Invalid cache status. STCChangeToDirty InvalToDirty InvalToDirtyVic InvalToDirty commands are not speculative, so InvalToDirty commands to NXM space indicate an operating system error.
System Port • Probes that invalidate locked blocks do not generate a ReadBlkMod command. The 21264/EV67 fails the STx_C instruction as defined in the Alpha Architecture Handbook, Version 4. • All read commands (RdBlk, RdBlkMod, Fetch, InvalToDirty) do not interact because the 21264/EV67 does not yet own the block.
Bcache Port 4.7.10.2 System Probes and SysDc Commands Ordering of cache transactions at the system serialization point must be reflected in the 21264/EV67 cache system. Table 4–34 shows the rules that a system must follow to control the order of cache status update within the 21264/EV67 cache structures (including the VAF) at the 21264/EV67 pins.
Bcache Port The Bcache supports the following multiples of the GCLK period: 1.5X (dual-data mode only), 2X, 2.5X, 3X, 3.5X, 4X, 5X, 6X, 7X, and 8X. However, the 21264/EV67 imposes a maximum Bcache clock period based on the SYSCLK ratio. Table 4–35 lists the range of maximum Bcache clock periods. Section 4.7.8.2 describes fast mode. Table 4–35 Range of Maximum Bcache Clock Ratios SYSCLK Ratio Bcache Clock Ratio with Fast Mode Enabled Bcache Clock Ratio with Fast Mode Disabled 1.5X 4.0X 7.0X 2.0X 4.
Bcache Port Table 4–36 Bcache Port Pins (Continued) Pin Name Type Count Reference Clock Description BcDataWr_L O_PP 1 Int_Index_BcClk Bcache data write enable BcLoad_L O_PP 1 Int_Index_BcClk Bcache burst enable BcTag_H[42:20] B_DA_PP 23 Int_Data_BcClk ⇒ output Bcache tag data BcTagInClk_H ⇒ input BcTagDirty_H B_DA_PP 1 Int_Data_BcClk ⇒ output Bcache tag dirty bit BcTagInClk_H ⇒ input BcTagInClk_H I_DA 1 NA Tag input data reference clock BcTagOE_L O_PP 1 Int_Index_BcClk Bcache
Bcache Port BcTagShared_H BcTagValid_H 3. The Bcache clock pins (BcDataOutClk_x[3:0] and BcTagOutClk_x) clock the index and data pins at the SSRAMs. These clocks can be delayed from Int_Data_BcClk from 0 to 2 GCLK phases (half cycles) using Cbox CSR BC_CPU_CLK_DELAY[1:0]. Table 4–37 provides the BC_CPU_CLK_DELAY[1:0] values, which is the delay from BC_ADDRESS to BC_WRITE_DATA (and BC_CLOCK_OUT) in GCLK cycles.
Bcache Port 3. BC_FDBK_EN[7:0] To program these three CSRs, the programmer must know the bit-rate of the Bcache data, and whether only the rising edge or both edges of the clock are used to latch data. For example, a 200-MHz late-write SSRAM has a data period of 5 ns. For a 2-ns GCLK, the READCLK_RATIO must be set to 2.5X. This part is called a 2.5X SD (single-data part). Table 4–39 shows how the three CSRs are programmed for single-data devices.
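The ratio arithmetic in the example above (5-ns data period, 2-ns GCLK, 2.5X) can be sketched as follows. This is an illustrative calculation only: the function name is hypothetical, periods are passed in picoseconds to keep the arithmetic exact, and the list of legal ratios is taken from the supported GCLK multiples given for the Bcache in Section 4.8.

```python
from fractions import Fraction

# Supported Bcache multiples of the GCLK period: 1.5X (dual-data mode only),
# 2X, 2.5X, 3X, 3.5X, 4X, 5X, 6X, 7X, and 8X.
SUPPORTED_RATIOS = [Fraction(n, 2) for n in (3, 4, 5, 6, 7, 8, 10, 12, 14, 16)]


def bcache_clock_ratio(data_period_ps: int, gclk_period_ps: int) -> Fraction:
    """Return the Bcache data period as a multiple of the GCLK period."""
    ratio = Fraction(data_period_ps, gclk_period_ps)
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"{float(ratio)}X is not a supported Bcache ratio")
    return ratio


if __name__ == "__main__":
    # 200-MHz SSRAM (5000 ps data period) with a 2000 ps GCLK.
    print(float(bcache_clock_ratio(5000, 2000)), "X")
```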
Table 4–40 Program Values to Set the Cache Clock Period (Dual-Data Rate) (Continued)

Bcache Transfer  BC_CLK_LD_VECTOR (1)  BC_BPHASE_LD_VECTOR (1)  BC_FDBK_EN (1)
4.0X-DD          0F0F                  0                        01
5.0X-DD          7C1F                  0                        40
6.0X-DD          F03F                  0                        10
7.0X-DD          C07F                  0                        04
8.0X-DD          00FF                  0                        01

(1) These are hexadecimal values.

In addition to programming the clock CSRs, the data-sample/drive Cbox CSRs, at the pads, must be set appropriately. Table 4–41 lists these CSRs and provides their programmed value.
Bcache Port have been programmed for the Bcache clock period, and with satisfactory delay parameters for the SSRAM setup/hold Bcache address latch requirements, a Bcache read command proceeds through the 21264/EV67 Cbox as follows: 1.
Bcache Port priate programming of the Bcache clock period and delay parameters to satisfy SSRAM setup/hold requirements of the Bcache address latch, a Bcache write transaction proceeds through the Cbox as follows: 1. The Cbox transmits the index and write control signals during an Int_Adr_BcClk edge. 2. The data is placed on Bcache data, tag, and tag status pins on the appropriate Int_Data_BcClk edge from 0 to 7 Bcache bit-times later, based on the Cbox CSR BC_LATE_WRITE_NUM[2:0].
Bcache Port 4–50 Term Description Ratio The number of GCLK cycles per peak Bcache bandwidth transfer. For example, a ratio of 2.5 means the peak Bcache bandwidth is 16 bytes for every 2.5 GCLK cycles. rd_wr The minimum spacing required between the read and write indices at the data/tag pins, expressed as GCLK cycles. wr_rd The minimum spacing required between the write and read indices at the data/tag pins, expressed as GCLK cycles.
The Relationship Between Write-to-Read — BC_WR_RD_BUBBLES and wr_rd
The following formulas calculate the relationship between the Cbox CSR BC_WR_RD_BUBBLES and wr_rd:
    wr_rd = (BC_WR_RD_BUBBLES - 1) * bcfrm
or
    BC_WR_RD_BUBBLES = ((wr_rd + bcfrm - 1) / bcfrm) + 1
There is never a need to use a value of 0 or 1 for BC_WR_RD_BUBBLES.
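The two formulas can be checked with a short calculation. This is a minimal sketch, assuming that wr_rd and bcfrm are positive whole numbers of GCLK cycles and that the division in the second formula truncates to an integer, so the term (wr_rd + bcfrm - 1) / bcfrm rounds wr_rd / bcfrm up. The function names are hypothetical.

```python
def wr_rd_from_bubbles(bc_wr_rd_bubbles: int, bcfrm: int) -> int:
    """wr_rd = (BC_WR_RD_BUBBLES - 1) * bcfrm"""
    return (bc_wr_rd_bubbles - 1) * bcfrm


def bubbles_from_wr_rd(wr_rd: int, bcfrm: int) -> int:
    """BC_WR_RD_BUBBLES = ((wr_rd + bcfrm - 1) / bcfrm) + 1,
    with the division assumed to truncate."""
    if wr_rd < 1 or bcfrm < 1:
        raise ValueError("wr_rd and bcfrm must be positive")
    return (wr_rd + bcfrm - 1) // bcfrm + 1


if __name__ == "__main__":
    bubbles = bubbles_from_wr_rd(wr_rd=5, bcfrm=2)
    # The programmed value yields a spacing at least as large as requested.
    print(bubbles, wr_rd_from_bubbles(bubbles, bcfrm=2))
```

Under these assumptions the smallest result for any positive wr_rd is 2, which agrees with the statement that values of 0 and 1 are never needed.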
Bcache Port When the Cbox CSR BC_BANK_ENABLE[0] is not set, the unused BcAdd_H[23:4] pins are tied to zero. For example, when configured as a 4MB cache, the 21264/EV67 never changes BcAdd_H[23:22] from logic zero, and when BC_BANK_ENABLE[0] is asserted, the 21264/EV67 drives the complement of the MSB index on the next higher BcAdd_H pin. 4.8.4.
Table 4–46 lists the combination of control pin assertion for RAM_TYPE C.
Table 4–46 Control Pin Assertion for RAM_TYPE C

TYPE_C      NOP  RA0  RA1  RA2  RA3  NOP  NOP  WA0  WA1  WA2  WA3  NOP
BcLoad_L    H    H    H    H    H    H    H    H    H    H    H    H
BcDataOE_L  H    H    L    L    L    L    L    H    H    H    H    H
BcDataWr_L  H    H    H    H    H    H    H    L    L    L    L    H
BcTagOE_L   H    L    L    H    H    H    H    H    H    H    H    H
BcTagWr_L   H    H    H    H    H    H    H    L    H    H    H    H

Table 4–47 lists the combination of control pin assertion for RAM_TYPE D.
Interrupts 4.8.5 Bcache Banking Bcache banking is possible by decoding the index MSB (as determined by Cbox CSR BC_SIZE[3:0]) and asserting Cbox CSR BC_BANK_ENABLE[0]. To facilitate banking, the 21264/EV67 provides the complement of the MSB bit in the next higher unused index bit. For example, when configured as an 8MB cache with banking enabled, the 21264/EV67 drives the inversion of PA[22] on BcAdd_H[23] for use as a chip enable in a banked configuration.
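The banking behavior in the 8MB example can be sketched as follows. This is a minimal sketch under stated assumptions: BcAdd_H pin numbers are taken to equal physical address bit numbers (as in the PA[22] and BcAdd_H[23] example), the index MSB for a cache of N megabytes is taken to be PA[19 + log2(N)], and 16MB is excluded because no higher BcAdd_H pin would be left unused. The function name is hypothetical.

```python
def banked_index_pins(pa: int, bc_size_mb: int) -> dict:
    """Return the index MSB pin and its complement on the next higher
    BcAdd_H pin when BC_BANK_ENABLE[0] is asserted (illustrative model)."""
    if bc_size_mb not in (1, 2, 4, 8):
        raise ValueError("this sketch covers 1, 2, 4, and 8 MB caches only")
    msb = 19 + (bc_size_mb.bit_length() - 1)  # 8 MB -> PA[22]
    msb_value = (pa >> msb) & 1
    return {
        f"BcAdd_H[{msb}]": msb_value,
        f"BcAdd_H[{msb + 1}]": msb_value ^ 1,  # complement, usable as chip enable
    }


if __name__ == "__main__":
    print(banked_index_pins(1 << 22, 8))
    print(banked_index_pins(0, 8))
```

Because the two pins always carry opposite values, each can serve as an active-level chip enable for one of the two banks.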
5 Internal Processor Registers This chapter describes 21264/EV67 internal processor registers (IPRs). They are separated into the following circuit logic groups: Ebox, Ibox, Mbox, and Cbox. The gray areas in register figures indicate reserved fields. Bit ranges that are coupled with the field name specify those bits in that named field that are included in the IPR. For example, in Figure 5–2, the field named COUNTER[31:4] contains bits 31 through 4 of the COUNTER field from Section 5.1.1.
Table 5–1 Internal Processor Registers (Continued) MT/MF Issued from Ebox Access Pipe Latency for MFPR (Cycles) Register Name Mnemonic Index (Binary) ScoreBoard Bit Instruction VA format IVA_FORM 0000 0111 5 RO 0L 3 Current mode CM 0000 1001 4 RW 0L 3 Interrupt enable IER 0000 1010 4 RW 0L 3 Interrupt enable and current mode IER_CM 0000 10xx 4 RW 0L 3 Software interrupt request SIRR 0000 1100 4 RW 0L 3 Interrupt summary ISUM 0000 1101 — RO — — Hardware interru
Ebox IPRs Table 5–1 Internal Processor Registers (Continued) MT/MF Issued from Ebox Access Pipe Latency for MFPR (Cycles) Mnemonic Index (Binary) ScoreBoard Bit Cbox data C_DATA 0010 1011 6 RW 0L 3 Cbox shift control C_SHFT 0010 1100 6 WO 0L Ò Register Name Cbox IPRs 1 When n equals 1, that process context field is selected (FPE, PPCE, ASTRR, ASTER, ASN). 5.1 Ebox IPRs This section describes the internal processor registers that control Ebox functions. 5.1.
Table 5–2 describes the CC_CTL register fields.
Table 5–2 Cycle Counter Control Register Fields Description

Name           Extent   Type  Description
Reserved       [63:33]  —     —
CC_ENA         [32]     WO    Counter Enable. When set, this bit allows the cycle counter to increment.
COUNTER[31:4]  [31:4]   WO    CC[31:4] may be written by way of this field. Write transactions to CC_CTL result in CC[3:0] being cleared.
Reserved       [3:0]    —     —
5.1.
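A value for a CC_CTL write can be assembled from the fields in Table 5–2. The sketch below is illustrative: the function name and constants are hypothetical, and it models only the field packing, not the register write itself.

```python
CC_ENA_BIT = 32              # CC_ENA occupies bit [32]
COUNTER_MASK = 0xFFFF_FFF0   # COUNTER[31:4] occupies bits [31:4]


def cc_ctl_value(enable: bool, counter: int) -> int:
    """Pack CC_ENA and COUNTER[31:4] into a 64-bit CC_CTL write value.

    The low four bits of `counter` are dropped, mirroring the fact that a
    write to CC_CTL clears CC[3:0]. Reserved bits are left as zero.
    """
    return (int(enable) << CC_ENA_BIT) | (counter & COUNTER_MASK)


if __name__ == "__main__":
    print(hex(cc_ctl_value(True, 0x1234_5678)))
```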
Ebox IPRs Table 5–3 describes the virtual address control register fields. Table 5–3 Virtual Address Control Register Fields Description Name Extent Type Description VPTB[63:30] [63:30] WO Virtual Page Table Base. See the VA_FORM register section for details. Reserved [29:3] — — VA_FORM_32 [2] WO,0 This bit is used to control address formatting when reading the VA_FORM register. See the section on the VA_FORM register for details.
Ibox IPRs Figure 5–6 Virtual Address Format Register (VA_48 = 1, VA_FORM_32 = 0) 63 43 42 38 37 3 2 0 VPTB[63:43] SEXT(VA[47]) VA[47:13] LK99-0012A Figure 5–7 shows VA_FORM when VA_CTL(VA_48) equals 0 and VA_CTL(VA_FORM_32) equals 1. Figure 5–7 Virtual Address Format Register (VA_48 = 0, VA_FORM_32 = 1) 63 30 29 22 21 3 2 0 VPTB[63:30] VA[31:13] LK99-0013A 5.2 Ibox IPRs This section describes the internal processor registers that control Ibox functions. 5.2.
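The two VA_FORM layouts in Figures 5–6 and 5–7 can be sketched as bit-field packing. This is a minimal sketch, assuming that the unlabeled fields in the figures (bits [2:0] in both layouts and bits [29:22] in the VA_FORM_32 layout) read as zero. The function names are hypothetical.

```python
def field(value: int, hi: int, lo: int) -> int:
    """Extract bits [hi:lo] of value, right-justified."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def va_form_48(vptb: int, va: int) -> int:
    """Layout of Figure 5-6 (VA_48 = 1, VA_FORM_32 = 0):
    VPTB[63:43] | SEXT(VA[47]) in [42:38] | VA[47:13] in [37:3] | zeros."""
    sext = 0b11111 if field(va, 47, 47) else 0
    return (field(vptb, 63, 43) << 43) | (sext << 38) | (field(va, 47, 13) << 3)


def va_form_32(vptb: int, va: int) -> int:
    """Layout of Figure 5-7 (VA_48 = 0, VA_FORM_32 = 1):
    VPTB[63:30] | zeros in [29:22] | VA[31:13] in [21:3] | zeros."""
    return (field(vptb, 63, 30) << 30) | (field(va, 31, 13) << 3)


if __name__ == "__main__":
    print(hex(va_form_32(0xFFFF_FFFF_C000_0000, 0x2000)))
```

In both layouts the virtual page number lands at bit 3, so the result is an 8-byte-aligned address, which fits a table of quadword-sized PTEs.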
Ibox IPRs Figure 5–9 ITB PTE Array Write Register 63 44 43 13 12 11 10 9 8 7 6 5 4 3 0 PFN[43:13] URE SRE ERE KRE GH[1:0] ASM LK99-0016A 5.2.3 ITB Invalidate All Process (ASM=0) Register – ITB_IAP The ITB invalidate all process register (ITB_IAP) is a pseudo register that, when written to, invalidates all ITB entries whose ASM bit is clear. An explicit write to IC_FLUSH_ASM is required to flush the Icache of blocks with ASM equal to zero. 5.2.
Ibox IPRs 5.2.6 ProfileMe PC Register – PMPC The ProfileMe PC register (PMPC) is a read-only register that contains the PC of the last profiled instruction. Additional information is available in the I_STAT and PCTR_CTL register descriptions. Usage of PMPC in performance monitoring is described in Section 6.10. Figure 5–11 shows the ProfileMe PC register. Figure 5–11 ProfileMe PC Register 63 2 1 0 PC[63:2] PAL LK99-0018A Table 5–4 describes the ProfileMe PC register fields.
Ibox IPRs 5.2.8 Instruction Virtual Address Format Register — IVA_FORM The instruction virtual address format register (IVA_FORM) is a read-only register. It contains the virtual PTE address derived from the faulting virtual address stored in the EXC_ADDR register, and from the virtual page table base, VA_48 and VA_FORM_32 bits, stored in the I_CTL register. Figure 5–13 shows IVA_FORM when I_CTL(VA_48) equals 0 and I_CTL(VA_FORM_32) equals 0.
Ibox IPRs Figure 5–16 Interrupt Enable and Current Processor Mode Register 63 39 38 33 32 31 30 29 28 14 13 12 5 4 3 2 0 EIEN[5:0] SLEN CREN PCEN[1:0] SIEN[15:1] ASTEN CM[1:0] LK99-0022A Table 5–5 describes the interrupt enable and current processor mode register fields.
Figure 5–17 Software Interrupt Request Register (bit fields: [63:29] reserved, [28:14] SIR[15:1], [13:0] reserved)
Table 5–6 describes the software interrupt request register fields.
Table 5–6 Software Interrupt Request Register Fields Description

Name       Extent   Type  Description
Reserved   [63:29]  —     —
SIR[15:1]  [28:14]  RW    Software Interrupt Requests
Reserved   [13:0]   —     —
5.2.
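The SIR[15:1] placement in Table 5–6 can be sketched as follows. This is an illustrative helper, assuming that software interrupt level n corresponds to SIR[n], so that level n occupies register bit 13 + n. The function name is hypothetical.

```python
from typing import Iterable


def sirr_value(levels: Iterable[int]) -> int:
    """Build a SIRR value that requests the given software interrupt levels.

    SIR[15:1] occupies bits [28:14], so SIR[n] is register bit 13 + n.
    """
    value = 0
    for level in levels:
        if not 1 <= level <= 15:
            raise ValueError("software interrupt levels are 1 through 15")
        value |= 1 << (13 + level)
    return value


if __name__ == "__main__":
    print(hex(sirr_value([1, 15])))
```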
Ibox IPRs Table 5–7 describes the interrupt summary register fields. Table 5–7 Interrupt Summary Register Fields Description Name Extent Type Description Reserved [63:39] — — EI[5:0] [38:33] RO External Interrupts SL [32] RO Serial Line Interrupt CR [31] RO Corrected Read Error Interrupts PC[1:0] [30:29] RO Performance Counter Interrupts PC0 when PC[0] is set. PC1 when PC[1] is set.
Ibox IPRs Table 5–8 describes the hardware interrupt clear register fields.
Ibox IPRs Figure 5–20 Exception Summary Register 63 48 47 46 45 44 43 42 41 40 14 13 12 8 7 6 5 4 3 2 1 0 SEXT(SET_IOV) SET_IOV SET_INE SET_UNF SET_OVF SET_DZE SET_INV PC_OVFL BAD_IVA REG[4:0] INT IOV INE UNF FOV DZE INV SWC LK99-0026A Table 5–9 describes the exception summary register fields. Table 5–9 Exception Summary Register Fields Description Name Extent Type Description SEXT(SET_IOV) [63:48] RO, 0 Sign-extended value of bit 47, SET_IOV. SET_IOV [47] RO PALcode should set FPCR[IOV].
Ibox IPRs Table 5–9 Exception Summary Register Fields Description (Continued) Name Extent Type Description REG[4:0] [12:8] RO Destination register of load or operate instruction that triggered the trap OR source register of store that triggered the trap. These bits may contain the Rc field of an operate instruction or the Ra field of a load or store instruction. The value is UNPREDICTABLE if the trap was triggered by an ITB miss, interrupt, OPCDEC, or other non load/st/operate.
Figure 5–22 Ibox Control Register

  [63:48]  SEXT(VPTB[47])
  [47:30]  VPTB[47:30]
  [29:24]  CHIP_ID[5:0]
  [23]     BIST_FAIL
  [22]     TB_MB_EN
  [21]     MCHK_EN
  [20]     ST_WAIT_64K
  [19]     PCT1_EN
  [18]     PCT0_EN
  [17]     SINGLE_ISSUE_H
  [16]     VA_FORM_32
  [15]     VA_48
  [14]     SL_RCV
  [13]     SL_XMIT
  [12]     HWE
  [11:10]  BP_MODE[1:0]
  [9:8]    SBE[1:0]
  [7:6]    SDE[1:0]
  [5:3]    SPE[2:0]
  [2:1]    IC_EN[1:0]
  [0]      SPCE

Table 5–11 describes the Ibox control register fields.
Ibox IPRs Table 5–11 Ibox Control Register Fields Description (Continued) Name Extent Type Description MCHK_EN [21] RW,0 Machine check enable — set to enable machine checks. ST_WAIT_64K [20] RW,0 The stWait table is used to reduce load/store order traps. When set, the stWait table is cleared after 64K cycles. When clear, the stWait table is cleared after 16K cycles. See Section 2.11. PCT1_EN [19] RW,0 Enable performance counter #1.
Ibox IPRs Table 5–11 Ibox Control Register Fields Description (Continued) Name Extent Type Description SBE[1:0] [9:8] RW,0 Stream Buffer Enable. The value in this bit field specifies the number of Istream buffer prefetches (besides the demand-fill) that are launched after an Icache miss. If the value is zero, only demand requests are launched. SDE[1:0] [7:6] RW,0 PALshadow Register Enable. Enables access to the PALshadow registers.
Ibox IPRs Figure 5–23 Ibox Status Register 63 41 40 39 38 37 34 33 32 30 29 28 0 MIS TRP LS0 TRAP TYPE[3:0] ICM OVR[2:0] PAR LK99-0031A Table 5–12 describes the Ibox status register fields. Table 5–12 Ibox Status Register Fields Description Name Extent Type Description Reserved [63:41] RO Reserved for Compaq. MIS [40] RO ProfileMe Mispredict Trap. If the I_STAT[TRP] bit is set, this bit indicates that the profiled instruction caused a mispredict trap.
Ibox IPRs Table 5–12 Ibox Status Register Fields Description (Continued) Name Extent Type Description TRAP TYPE[3:0] [37:34] RO ProfileMe Trap Types.
Ibox IPRs 5.2.17 Icache Flush Register – IC_FLUSH The Icache flush register (IC_FLUSH) is a pseudo register. Writing to this register invalidates all Icache blocks. The cache is flushed when the next HW_RET/STALL instruction is retired. See Section D.20 for more information. 5.2.18 Icache Flush ASM Register – IC_FLUSH_ASM The Icache flush ASM register (IC_FLUSH_ASM) is a pseudo register. Writing to this register invalidates all Icache blocks with their ASM bit clear. 5.2.
Figure 5–24 Process Context Register

  [63:47]  Reserved
  [46:39]  ASN[7:0]
  [38:13]  Reserved
  [12:9]   ASTRR[3:0]
  [8:5]    ASTER[3:0]
  [4:3]    Reserved
  [2]      FPE
  [1]      PPCE
  [0]      Reserved

Table 5–14 describes the process context register fields.

Table 5–14 Process Context Register Fields Description

  Name        Extent   Type  Description
  Reserved    [63:47]  —     —
  ASN[7:0]    [46:39]  RW    Address space number.
  Reserved    [38:13]  —     —
  ASTRR[3:0]  [12:9]   RW    AST request register—used to request AST interrupts in each of the four processor modes.
Ibox IPRs Table 5–14 Process Context Register Fields Description (Continued) Name Extent Type Description FPE [2] RW,1 Floating-point enable—if clear, floating-point instructions generate FEN exceptions. This bit is set by hardware on reset. PPCE [1] RW Process performance counting enable. Enables performance counting for an individual process with counters PCTR0 or PCTR1, which are enabled by setting PCT0_EN or PCT1_EN, respectively.
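Because the process context register packs several independent fields into one value, updating a single field is a read-modify-write operation. The sketch below shows that pattern on a plain integer, using the field extents from Table 5–14 and Figure 5–24. The helper names are hypothetical, and the example says nothing about how the hardware itself selects which PCTX fields a write affects.

```python
# Field positions (LSB, width) from Table 5-14 and Figure 5-24.
# get_field and set_field are illustrative helpers only.
PCTX_FIELDS = {
    "ASN": (39, 8),
    "ASTRR": (9, 4),
    "ASTER": (5, 4),
    "FPE": (2, 1),
    "PPCE": (1, 1),
}


def get_field(reg, name):
    """Extract one named field from a PCTX value."""
    lsb, width = PCTX_FIELDS[name]
    return (reg >> lsb) & ((1 << width) - 1)


def set_field(reg, name, new):
    """Return a copy of reg with one named field replaced."""
    lsb, width = PCTX_FIELDS[name]
    if not 0 <= new < (1 << width):
        raise ValueError(f"{name} does not fit in {width} bits")
    mask = ((1 << width) - 1) << lsb
    return (reg & ~mask) | (new << lsb)
```

Starting from a value with only FPE set (the state the table describes after reset), `set_field(reg, "ASN", 0x2A)` changes the address space number while leaving FPE untouched.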
Ibox IPRs Table 5–15 describes the performance counter control register fields. Table 5–15 Performance Counter Control Register Fields Description Name Extent Type Description SEXT(PCTR0_CTL[47]) [63:48] RO When read, this field is sign extended from PCTR_CTL[47]. Writes to this field are ignored. PCTR0[19:0] Performance counter 0. PCTR0 is enabled by I_CTL[PCT0_EN] and either I_CTL[SPCE] or PCTX[PPCE].
Mbox IPRs Table 5–15 Performance Counter Control Register Fields Description (Continued) Name Extent Type Description VAL [1] RO Profiled instruction valid. When set, indicates a nontrapping profiled instruction retired valid. When clear, indicates that a nontrapping profiled instruction was killed after the cycle in which it was mapped. Valid retire/abort status for a trapping profiled instruction is determined by the trap type (see I_STAT[TRAP_TYPE]).
Mbox IPRs 5.3.2 DTB PTE Array Write Registers 0 and 1 – DTB_PTE0, DTB_PTE1 The DTB PTE array write registers 0 and 1 (DTB_PTE0 and DTB_PTE1) are registers through which the DTB PTE arrays are written. The entries to be written are chosen by a round-robin allocation scheme. Write transactions to the DTB_PTE registers, when retired, result in both the DTB_TAG and DTB_PTE arrays being written. Figure 5–27 shows the DTB PTE array write registers 0 and 1.
Table 5–17 describes the DTB_ALTMODE register fields.

Table 5–17 DTB Alternate Processor Mode Register Fields Description

  Name           Extent  Type  Description
  Reserved       [63:2]  —     —
  ALT_MODE[1:0]  [1:0]   WO    Alt_Mode:

                               ALT_MODE[1:0]  Mode
                               00             Kernel
                               01             Executive
                               10             Supervisor
                               11             User

5.3.4 Dstream TB Invalidate All Process (ASM=0) Register – DTB_IAP

The Dstream translation buffer invalidate all process (ASM=0) register (DTB_IAP) is a write-only pseudo register.
Mbox IPRs 5.3.7 Dstream TB Address Space Number Registers 0 and 1 – DTB_ASN0,1 The Dstream translation buffer address space number registers (DTB_ASN0 and DTB_ASN1) are write-only registers that should be written with the address space number (ASN) of the current process. Figure 5–30 shows the Dstream translation buffer address space number registers 0 and 1. Figure 5–30 Dstream Translation Buffer Address Space Number Registers 0 and 1 63 56 55 0 ASN[7:0] LK99-0038A 5.3.
Mbox IPRs Table 5–18 Memory Management Status Register Fields Description (Continued) Name Extent Type Description FOR [2] RO This bit is set when a fault-on-read error occurs during a read transaction and PTE[FOR] was set. ACV [1] RO This bit is set when an access violation occurs during a transaction. Access violations include a bad virtual address. WR [0] RO This bit is set when an error occurs during a write transaction.
Mbox IPRs Table 5–19 describes the Mbox control register fields. Table 5–19 Mbox Control Register Fields Description Name Extent Type Description Reserved [63:6] — — SMC[1:0] [5:4] WO,0 Speculative miss control (see Section 4.6.4). Bits Meaning When Set 00 01 Allow full-time speculation. Force full-time conservative mode. Make retries wait until retire, force all new stores that do not hit dirty to retry, and cause prefetches with modify intent (see Section 2.6.
Mbox IPRs Figure 5–33 Dcache Control Register 63 8 7 6 5 4 3 2 1 0 DCDAT_ERR_EN DCTAG_PAR_EN F_BAD_DECC F_BAD_TPAR F_HIT SET_EN[1:0] LK99-0041A Table 5–20 describes the Dcache control register fields. Table 5–20 Dcache Control Register Fields Description Name Extent Type Description Reserved [63:8] — — DCDAT_ERR_EN [7] WO,0 Dcache data ECC and parity error enable. DCTAG_PAR_EN [6] WO,0 Dcache tag parity enable. F_BAD_DECC [5] WO,0 Force Bad Data ECC.
Figure 5–34 Dcache Status Register

  [63:5]  Reserved
  [4]     SEO
  [3]     ECC_ERR_LD
  [2]     ECC_ERR_ST
  [1]     TPERR_P1
  [0]     TPERR_P0

Table 5–21 describes the Dcache status register fields.

Table 5–21 Dcache Status Register Fields Description

  Name        Extent  Type  Description
  Reserved    [63:5]  —     —
  SEO         [4]     W1C   Second error occurred. When set, this bit indicates that a second Dcache store ECC error occurred within 6 cycles of the previous Dcache store ECC error.
  ECC_ERR_LD  [3]     W1C   ECC error on load.
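The W1C (write-one-to-clear) type used for the Dcache status bits can be modeled in a few lines. This is a behavioral sketch of the general W1C convention, assuming that writing a 1 clears the corresponding bit and writing a 0 leaves it unchanged; the function name `w1c_write` is hypothetical, and the bit positions are those of Figure 5–34.

```python
# Bit positions from Figure 5-34; w1c_write is an illustrative model only.
DC_STAT_BITS = {
    "SEO": 4,
    "ECC_ERR_LD": 3,
    "ECC_ERR_ST": 2,
    "TPERR_P1": 1,
    "TPERR_P0": 0,
}
DC_STAT_MASK = 0x1F  # bits [4:0]; bits [63:5] are reserved


def w1c_write(current, written):
    """Model a write to a W1C register: each 1 written clears that bit."""
    return current & ~written & DC_STAT_MASK


def set_bits(*names):
    """Build a mask from DC_STAT bit names."""
    mask = 0
    for name in names:
        mask |= 1 << DC_STAT_BITS[name]
    return mask
```

Under this model, error handling code can read the status, log it, and write the same value back so that exactly the bits it observed are cleared.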
Cbox CSRs and IPRs 5.4.1 Cbox Data Register – C_DATA Figure 5–35 shows the Cbox data register. Figure 5–35 Cbox Data Register 63 6 5 0 C_DATA[5:0] LK99-0043A Table 5–22 describes the Cbox data register fields. Table 5–22 Cbox Data Register Fields Description Name Extent Type Description Reserved [63:6] — — C_DATA[5:0] [5:0] RW Cbox data register. A HW_MTPR instruction to this register causes six bits of data to be placed into a serial shift register.
Cbox CSRs and IPRs • Only a brief description of each CSR is given. The functional description of these CSRs is contained in Chapter 4. • The order of multibit vectors is [MSB:LSB], so the LSB is first bit in the Cbox chain. Table 5–24 describes the Cbox WRITE_ONCE chain order from LSB to MSB. Table 5–24 Cbox WRITE_ONCE Chain Order Cbox WRITE_ONCE Chain Description 32_BYTE_IO[0] Enable 32_BYTE I/O mode. SKEWED_FILL_MODE[0] Asserted when Bcache is at 1.5X ratio.
Cbox CSRs and IPRs Table 5–24 Cbox WRITE_ONCE Chain Order (Continued) Cbox WRITE_ONCE Chain Description DUP_TAG_ENABLE Duplicate CSR. SKEWED_FILL_MODE Duplicate CSR. BC_RDVICTIM Duplicate CSR. SKEWED_FILL_MODE Duplicate CSR. BC_RDVICTIM Duplicate CSR. BC_CLEAN_VICTIM Duplicate CSR. DUP_TAG_MODE Duplicate CSR. SKEWED_FILL_MODE Duplicate CSR. ENABLE_PROBE_CHECK Enable error checking during probe processing. SPEC_READ_ENABLE[0] Enable speculative references to the system port.
Cbox CSRs and IPRs Table 5–24 Cbox WRITE_ONCE Chain Order (Continued) Cbox WRITE_ONCE Chain Description BC_TAG_DDM_RISE_EN[0] Enables the update of the 21264/EV67 Bcache tag outputs based on the rising edge of the forwarded clock. BC_CLKFWD_ENABLE[0] Enable clock forwarding on the Bcache interface. BC_RCV_MUX_CNT_PRESET[0:1] Initial value for the Bcache clock forwarding unload pointer FIFO. BC_LATE_WRITE_UPPER[0] Duplicate CSR.
Cbox CSRs and IPRs Table 5–24 Cbox WRITE_ONCE Chain Order (Continued) Cbox WRITE_ONCE Chain Description SYS_DDM_FALL_EN Duplicate CSR. SYS_DDM_RISE_EN Duplicate CSR. SYS_CLKFWD_ENABLE Duplicate CSR. SYS_RCV_MUX_CNT_PRESET[0:1] Duplicate CSR. SYS_CLK_DELAY[0:1] Duplicate CSR. SYS_DDMR_ENABLE Duplicate CSR. SYS_DDMF_ENABLE Duplicate CSR. BC_DDM_FALL_EN Duplicate CSR. BC_DDM_RISE_EN Duplicate CSR. BC_CLKFWD_ENABLE Duplicate CSR. BC_RCV_MUX_CNT_PRESET[0:1] Duplicate CSR.
Cbox CSRs and IPRs Table 5–24 Cbox WRITE_ONCE Chain Order (Continued) Cbox WRITE_ONCE Chain Description CFR_FRMCLK_DELAY[0:1] Number of FrameClk_x cycles to delay internal ClkFwdRst. BC_LATE_WRITE_NUM[0:2] Duplicate CSR. BC_CPU_LATE_WRITE_NUM[1:0] Duplicate CSR. JITTER_CMD[0] Add one GCLK cycle to the SYSDC write path. FAST_MODE_DISABLE[0] Duplicate CSR. SYSDC_DELAY[3:0] Number of GCLK cycles to delay SysDc fill commands before action by the Cbox.
Cbox CSRs and IPRs Table 5–25 describes the Cbox WRITE_MANY chain order from LSB to MSB. Table 5–25 Cbox WRITE_MANY Chain Order Cbox WRITE_MANY Chain Description For Information: BC_ENABLE[0] Enable the Bcache Table 4–42 INIT_MODE[0] Enable initialize mode Section 7.
;       SET_DIRTY_ENABLE = 6
;       BC_BANK_ENABLE = 1
;       BC_WRT_STS = 0
;
; The value for the write_many chain is based on Table 5–25.
; The value is sampled from MSB, 6 bits at a time, as it is written
; to EV6__DATA. Therefore, before the value can be shifted in, it
; must be inverted on a by 6 basis. The code then writes out 6 bits
; at a time, shifting right by 6 after each write.
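One possible reading of the comment above is sketched here in Python. It assumes that "inverted on a by 6 basis" means reversing the order of the 6-bit groups (the bits inside each group keep their order), so that a loop that writes the low 6 bits and then shifts right by 6 delivers the original most significant group first. The function names and the `nbits` parameter are hypothetical; the real chain length comes from Table 5–25.

```python
def reverse_by_six(value, nbits):
    """Reverse the order of the 6-bit groups in an nbits-wide value."""
    if nbits % 6:
        raise ValueError("chain length must be a multiple of 6")
    out = 0
    for _ in range(nbits // 6):
        out = (out << 6) | (value & 0x3F)  # move the lowest group to the top
        value >>= 6
    return out


def write_sequence(value, nbits):
    """List the 6-bit values written, in order, for a chain value."""
    shifted = reverse_by_six(value, nbits)
    writes = []
    for _ in range(nbits // 6):
        writes.append(shifted & 0x3F)  # write the low 6 bits
        shifted >>= 6                  # then shift right by 6
    return writes
```

With an 18-bit value whose groups are 1, 2, 3 from most to least significant, `write_sequence` yields the writes in the order 1, 2, 3, which matches sampling from the MSB end.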
Cbox CSRs and IPRs 5.4.5 Cbox Read Register (IPR) Description The Cbox read register is read 6 bits at a time. Table 5–26 shows the ordering from LSB to MSB. Table 5–26 Cbox Read IPR Fields Description Name Description C_SYNDROME_1[7:0] If CMD is ChxToDirty, then C_SYNDROME_1 is X; otherwise, is syndrome for upper QW in OW of victim that was scrubbed. C_SYNDROME_0[7:0] If CMD is ChxToDirty, then C_SYNDROME_0 is X; otherwise, is syndrome for lower QW in OW of victim that was scrubbed.
6 Privileged Architecture Library Code This chapter describes the 21264/EV67 privileged architecture library code (PALcode). The chapter is organized as follows: • PALcode description • PALmode environment • Required PALcode function codes • Opcodes reserved for PALcode • Internal processor register access mechanisms • PALshadow registers • PALcode emulation of FPCR • PALcode entry points • Translation buffer fill flows • Performance counter support 6.
PALmode Environment • There are some necessary support functions that are too complex to implement directly in a processor chip’s hardware, but that cannot be handled by a normal operating system software routine. Routines to fill the translation buffer (TB), acknowledge interrupts, and dispatch exceptions are some examples.
Required PALcode Function Codes When executing in PALmode, there are certain restrictions for using the privileged instructions because PALmode gives the programmer complete access to many of the internal details of the 21264/EV67. Refer to Section 6.4 for information on these special PALmode instructions. Caution: It is possible to cause unintended side effects by writing what appears to be perfectly acceptable PALcode. As such, PALcode is not something that many users will want to change.
Opcodes Reserved for PALcode Figure 6–1 HW_LD Instruction Format 31 26 25 21 20 RA OPCODE 16 15 13 12 11 RB 0 DISP TYPE LEN FM-05654.AI4 Table 6–3 describes the HW_LD instruction fields. Table 6–3 HW_LD Instruction Fields Descriptions Extent Mnemonic Value Description [31:26] OPCODE 1B16 The opcode value. [25:21] RA — Destination register number. [20:16] RB — Base register for memory address.
Opcodes Reserved for PALcode Table 6–4 describes the HW_ST instruction fields. Table 6–4 HW_ST Instruction Fields Descriptions Extent Mnemonic Value Description [31:26] OPCODE 1F16 The opcode value. [25:21] RA — Write data register number. [20:16] RB — Base register for memory address. [15:13] TYPE 0002 Physical — The effective address for the HW_ST instruction is physical. 0012 Physical/Cond — The effective address for the HW_ST instruction is physical.
Figure 6–3 HW_RET Instruction Format

  [31:26]  OPCODE
  [25:21]  RA
  [20:16]  RB
  [15:14]  HINT
  [13]     STALL
  [12:0]   DISP

Table 6–5 describes the HW_RET instruction fields.

Table 6–5 HW_RET Instruction Fields Descriptions

  Extent   Mnemonic  Value  Description
  [31:26]  OPCODE    1E₁₆   The opcode value.
  [25:21]  RA        —      Register number. It should be R31.
  [20:16]  RB        —      Target PC of the HW_RET instruction. Bit [0] of the register’s contents determines the new value of PALmode.
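The field extents in Figure 6–3 are enough to pick apart a 32-bit HW_RET instruction word. The sketch below is a plain decoder with hypothetical function names. The `split_target` helper reflects the statement that bit [0] of the RB contents selects PALmode; returning the remaining bits with bit [0] cleared is an assumption of this example, not a description of the hardware.

```python
HW_RET_OPCODE = 0x1E  # opcode value from Table 6-5


def decode_hw_ret(word):
    """Split a 32-bit HW_RET instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits [31:26]
        "ra": (word >> 21) & 0x1F,      # bits [25:21], should be R31
        "rb": (word >> 16) & 0x1F,      # bits [20:16], holds the target PC
        "hint": (word >> 14) & 0x3,     # bits [15:14]
        "stall": (word >> 13) & 0x1,    # bit  [13]
        "disp": word & 0x1FFF,          # bits [12:0]
    }


def split_target(rb_value):
    """Separate the PALmode flag (bit 0) from the rest of the target."""
    return rb_value & ~1, bool(rb_value & 1)
```

A word built as `(0x1E << 26) | (31 << 21) | (23 << 16) | (1 << 13)` decodes to an HW_RET through register 23 with STALL set.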
Internal Processor Register Access Mechanisms Table 6–6 describes the HW_MFPR and HW_MTPR instructions fields. Table 6–6 HW_MFPR and HW_MTPR Instructions Fields Descriptions Extent Mnemonic Value Description [31:26] OPCODE 1916 The opcode value for the HW_MFPR instruction. 1D16 The opcode value for the HW_MTPR instruction. [25:21] RA — Destination register for the HW_MFPR instruction. It should be R31 for the HW_MTPR instruction. [20:16] RB — Source register for the HW_MTPR instruction.
Internal Processor Register Access Mechanisms 6.5.1 IPR Scoreboard Bits In previous Alpha implementations, IPR registers were not scoreboarded in hardware. Software was required to schedule HW_MTPR and HW_MFPR instructions for each machine’s pipeline organization in order to ensure correct behavior. This software scheduling task is more difficult in the 21264/EV67 because the Ibox performs dynamic scheduling.
6.5.3 Hardware Structure of Implicitly Written IPRs

Implicitly written IPRs are physically built using only a single level of register; however, each IPR has two hardware states associated with it:

1. Default State—The contents of the register may be written when an instruction generates an exception. If an exception occurs, the hardware writes a new value into the IPR and goes to state 2.
2.
Internal Processor Register Access Mechanisms Table 6–7 Paired Instruction Fetch Order (Continued) Second Instruction Explicit Writer First Instruction Reader reads second register. Writer cannot write second register until it is retired. Write-one-to-clear bits, or performance counter special case. For example, performance counter increments are typically not scoreboarded against read transactions. Reader reads second Scoreboard bits stall second register.
PALshadow Registers 6.5.6 Correct Ordering of Explicit Readers Followed by Implicit Writers Certain IPRs that are updated as a result of faulting memory operations require PALcode assistance to maintain ordering against newer instructions. Consider the following code sequence: HW_MFPR IPR_MM_STAT LDQ rx,(ry) It is typically the case that these instructions would issue in-order: • The MFPR is data-ready and both instructions use a lower subcluster.
3. Correct actions must occur when the FPCR is written by way of an MT_FPCR instruction.

6.7.1 Status Flags

The FPCR status bits in the 21264/EV67 are set with PALcode assistance. Floating-point exceptions for which the associated FPCR status bit is clear, or for which the associated trap is enabled, result in a hardware trap to the ARITH PALcode routine.
PALcode Entry Points Each CALL_PAL instruction includes a function field that is used to calculate the PC of its associated PALcode entry point.
Table 6–8 PALcode Exception Entry Locations (Continued)

  Entry Name    Type         Offset₁₆  Description
  MCHK          Interrupt    500       Machine check.
  ITB_MISS      Fault        580       Istream TB miss.
  ARITH         Synch. Trap  600       Arithmetic exception or update to FPCR.
  INTERRUPT     Interrupt    680       Interrupts: hardware, software, and AST.
  MT_FPCR       Synch. Trap  700       Invoked when a MT_FPCR instruction is issued.
  RESET/WAKEUP  Interrupt    780       Chip reset or wake-up from sleep mode.

6.
        hw_mtpr p4,                             ; (0,4,2,6) (0L) write pte0
        hw_mtpr p4,                             ; (3,7,1,5) (1L) write pte1

        ASSUME ne 2

.if ne pte_eco
        bne     p7, trap__dtbm_single_mb        ; branch for mb
        hw_ret  (p23)                           ; return
trap__dtbm_single_mb:
        mb
        hw_ret  (p23)                           ; return
.iff
        hw_ret  (p23)                           ; return
                                                ; (assumes tb_mb_en on multi-processors)
.
Translation Buffer (TB) Fill Flows • The conditional branch is placed in the code so that all of the MTPR instructions are issued and retired or none of them are issued and retired. This allows the TB fill hardware to update the TB whenever it sees the retiring of PTE1 and to ignore writes to TAG0/TAG1/PTE0/PTE1 in the interim between the issuing of those writes and a retire of PTE1.
        srl     r4, #OSF_PTE__PFN__S, r6        ; (xU) shift PFN to <0>
        sll     r6, #EV6__ITB_PTE__PFN__S, r6   ; (xU) shift PFN into place
        and     r4, #<1@OSF_PTE__FOE__S>, r7    ; (xL) get FOE bit
        blbc    r4, trap__invalid_ipte          ; (xU) invalid => branch
        bne     r7, trap__foe                   ; (xU) FOE => branch
        srl     r4, #7, r7                      ; check for mb bit
        bis     r5, r6, r6                      ; (xL) PTE in ITB format
        hw_mtpr r23, EV6__ITB_TAG               ; (6,0L) write tag
        hw_mtpr r6, EV6__ITB_PTE                ; (0&4,0L) write PTE
        ASSUME
ProfileMe mode, supports a new way of statistically sampling individual instructions during program execution. This mode counts events triggered by a targeted inflight instruction.

Counter support uses the hardware registers listed in Table 6–9.

Table 6–9 IPRs Used for Performance Counter Support

  Register Name                                Mnemonic  Relevant Fields  Described in Section
  ProfileMe PC                                 PMPC      All fields       5.2.6
  Interrupt enable and current processor mode  IER_CM    PCEN[1:0]        5.2.
Performance Counter Support The legal range for PCTR0 when writing the IPR is 0:(2**20-16). The legal range for PCTR1 when writing the IPR is 0:(2**20-4). 6.10.2.2 Operation 1. Setup The following IPRs need to be set up by PALcode instructions. IPR Name Relevant Fields Meaning IER_CM PCEN[1:0] Enable Interrupts. PCTX PPCE Enable Process Performance Counting or use I_CTL[SPCE]. PCTR_CTL SL0 Selects Aggregate or ProfileMe mode; set to 0 for Aggregate mode.
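The legal write ranges quoted at the start of this section lend themselves to a small worked example. The sketch below assumes the usual sampling pattern for an overflow-interrupt counter: to be interrupted after `period` events, preload the 20-bit counter with 2**20 minus `period`. That pattern and the function name `preload_for_period` are assumptions of this example; only the two upper limits come from the text.

```python
COUNTER_BITS = 20

# Largest legal value when writing each counter, as stated in the text.
MAX_LEGAL_WRITE = {
    "PCTR0": 2**20 - 16,
    "PCTR1": 2**20 - 4,
}


def preload_for_period(counter, period):
    """Value to write so the counter overflows after `period` events.

    Assumes overflow occurs when the 20-bit count reaches 2**20.
    Raises ValueError if the result is outside the legal write range.
    """
    value = 2**COUNTER_BITS - period
    if not 0 <= value <= MAX_LEGAL_WRITE[counter]:
        raise ValueError(f"period {period} is not representable for {counter}")
    return value
```

Under these assumptions the shortest usable period is 16 events for PCTR0 and 4 events for PCTR1, which is simply the legal range restated in terms of the period.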
Performance Counter Support 6.10.2.3 Aggregate Counting Mode Description 6.10.2.3.1 Cycle counting Counts cycles. PCTR0 is incremented by the number of cycles counted, that is, 1. 6.10.2.3.2 Retired instructions cycles PCTR0 is incremented by up to 8 retired instructions per cycle when enabled via I_CTL[PCT0_EN] and either I_CTL[SPCE] or PCTX[PPCE]. On overflow, an interrupt is triggered as ISUM[PC0] if enabled via IER_CM[PCEN0].
The CMOV instruction is decomposed into two valid fetched instructions that, in the absence of stalls, are fetched in consecutive cycles. See Table 6–12 for more information.

Table 6–12 CMOV Decomposed

  Instruction         New Instructions
  CMOV Ra, Rb --> Rc  CMOV1 Ra, oldRc --> newRc1
                      CMOV2 newRc1, Rb --> newRc2

6.10.3.2 Operation

1. Setup

The following IPRs need to be set up by using PALcode instructions.

  IPR Name  Relevant Fields  Meaning
  IER_CM    PCEN[1:0]        Enable Interrupts.
Performance Counter Support For instructions that cause a trap, the last cycle in the window is the 2nd cycle after the trap. Mispredicted branches are included in this category. For nontrapping instructions that retire, the last cycle in the window is the 2nd cycle after the instruction retires. For instructions that abort, the last cycle in the window is the 2nd cycle after the trap that caused the abort.
Performance Counter Support 6.10.3.3 ProfileMe Counting Mode Description 6.10.3.3.1 Cycle counting In ProfileMe mode, either counter counts cycles during the window of the profiled instruction. 6.10.3.3.2 Inum retire delay cycles This input is used to measure a lower bound on the inum retire delay of the profiled instruction. The maximum final value of PCTR1 is the length of the ProfileMe window minus 2.
Performance Counter Support 6.10.3.4 Counter Modes for ProfileMe Mode Table 6–14 shows the counter modes that are used with ProfileMe mode.
7 Initialization and Configuration

This chapter provides information on 21264/EV67-specific microprocessor system initialization and configuration. It is organized as follows:

• Power-up reset flow
• Fault reset flow
• Energy star certification and sleep mode flow
• Warm reset flow
• Array initialization
• Initialization mode processing
• External interface initialization
• Internal processor register (IPR) reset state
• IEEE 1149.
Power-Up Reset Flow and the Reset_L and DCOK_H Pins 1. The clock forwarding and system clock ratio configuration information is loaded onto the 21264/EV67. See Section 7.1.2. 2. The internal PLL is ramped up to operating frequency. 3. The internal arrays built-in self-test (BiST) is run, followed by Icache initialization using an external serial ROM (SROM) interface. The 21264/EV67 systems, unlike the Alpha 21064 and 21164 microprocessor systems, are required to have an SROM.
Figure 7–1 Power-Up Timing Sequence
[Timing diagram. Signals shown: IRQ_H (valid), DCOK_H, Reset_L, reset machine state (WAIT_SETTLE, WAIT_NOMINAL, RAMP1, RAMP2, WAIT_ClkFwdRst0, WAIT_BiST, WAIT_ClkFwdRst1, RUN), SromOE_L, ClkFwdRst_H, internal ClkFwdRst, TestStat_H (End of BiST; BiST Fails or BiST Passes), and external Clks. Labeled constraints: A0, A1, B, C, and a through g.]

7.1.
Power-Up Reset Flow and the Reset_L and DCOK_H Pins Table 7–2 Signal Pin Reset State (Continued) Signal Reset State Signal Reset State SysAddOut_L[14:0] Initially, during power-up reset, state SysDataOutValid_L is not defined. If not during powerup, preserves previous state. Then, after the clock forward reset period (as the external clocks start), signal driven to NZNOP until the reset state machine enters RUN, when it is driven to NOP.
Power-Up Reset Flow and the Reset_L and DCOK_H Pins Table 7–3 summarizes the pins and the suggested/required initialization state. Most of this information is supplied by placing (switch-selectable or hardwired) weak pull-ups or pull-downs on the IRQ_H pins. The IRQ_H pins are sampled on the rising edge of DCOK_H, during which time the 21264/EV67 is in reset and is not generating any system activity. During normal operation, the IRQ_H pins supply interrupt requests to the 21264/EV67.
Table 7–3 Pin Signal Names and Initialization State (Continued)

  Signal Name  Sample Time       Function
  DCOK_H       Continuous input  When deasserted, initializes the internal 21264/EV67 reset state machine and keeps the PLL internal oscillator running at a nominal speed. Assertion, which implies power to the 21264/EV67 is good, causes configuration information to be sampled.
Power-Up Reset Flow and the Reset_L and DCOK_H Pins As BiST completes, the TestStat_H pin is held low for 16 GCLK cycles. Then, if BiST succeeds, the pin remains low. Otherwise, it is asserted. After successfully completing BiST, the 21264/EV67 then performs the SROM load sequence (described in Chapter 11). After the SROM load sequence is finished, the 21264/EV67 deasserts SromOE_L. 7.1.
Fault Reset Flow 7.2 Fault Reset Flow The fault reset sequence of operation is triggered by the assertion of the ClkFwdRst_H signal line. Figure 7–2 shows the fault reset sequence of operation. The reset state machine is initially in RUN state. ClkFwdRst_H is asserted by the system, which causes the state machine to transition to the WAIT_FAULT_RESET state. The 21264/EV67 internally resets a minimum amount of internal state. Note the effects of that reset on the IPRs in Table 7–5.
Figure 7–2 Fault Reset Sequence of Operation
[Timing diagram. Signals shown: internal clks (aligned), reset machine state (RUN, WAIT_FAULT_RESET, WAIT_ClkFwdRst0, WAIT_ClkFwdRst1, RUN), SromOE_L, ClkFwdRst_H, internal ClkFwdRst, and external Clks. Labeled constraints: A and a through g.]

7.3 Energy Star Certification and Sleep Mode Flow

The 21264/EV67 is Energy Star compliant. Energy Star is a program administered by the Environmental Protection Agency to reduce energy consumption.
Energy Star Certification and Sleep Mode Flow After the PLL has finished ramping down, the reset state machine enters the WAIT_INTERRUPT state. Note the effects of the entry into that state on the IPRs listed in Table 7–6.
Figure 7–3 Sleep Mode Sequence of Operation
[Timing diagram. Signals shown: internal clks, reset machine state (RUN, DOWN1, DOWN2, DOWN3, WAIT_INTR, RAMP1, RAMP2, WAIT_ClkFwdRst0, WAIT_BiSI, WAIT_ClkFwdRst1, RUN), SLEEP IPR, Wake-up interrupt, SromOE_L, ClkFwdRst_H, TestStat_H, internal ClkFwdRst, and external Clks. Labeled constraints: A and a through f.]

Table 7–7 describes each signal and constraint for the sleep mode sequence.
Array Initialization The 21264/EV67 waits until Reset_L is deasserted before transitioning from the WAIT_RESET state. The 21264/EV67 ramps up the PLL until the state machine enters the WAIT_ClkFwdRst0 state. Note that the system must assert ClkFwdRst_H before the state machine enters the WAIT_ClkFwdRst0 state. Then, similarly to the other flows, SromOE_L is asserted and the system waits for the deassertion of ClkFwdRst_H.
Initialization Mode Processing Table 7–9 WRITE_MANY Chain CSR Values for Bcache Initialization WRITE_MANY Chain CSRs Required Value at Initialization Mode EVICT_ENABLE 0 BC_WRT_STS[3:0] 0 BC_BANK_ENABLE 0 Except for INIT_MODE, all the CSR registers have been described in earlier sections. When asserted, INIT_MODE has the following behavior: • Cache block updates to the Dcache set the block to the Clean state. • Updates to the Bcache use the BC_WRT_STS[3:0] bits.
SweepMemory:            ; Write good parity/ecc to memory by
                        ; writing all memory locations. This is
                        ; done by WH64 of memory addresses
turn_on_bcache:         ; bc_enable_a          0
                        ; bc_size_a            Actual Bcache size
                        ; zeroblk_enable_a     3
                        ; set_dirty_enable_a   6
                        ; init_mode_a          0
                        ; enable_evict_a       0
                        ; bc_wrt_sts_a         0
                        ; bc_bank_enable_a     0
                        ; This loop generates legal ECC data, and
                        ; invalidate tags which are written to the
                        ; Bcache for all but the final 64KB of address.
Internal Processor Register Power-Up Reset State Table 7–10 Internal Processor Registers at Power-Up Reset State (Continued) Mnemonic Register Name Reset State Comments ITB_IAP ITB invalidate-all (ASM=0) X — ITB_IA ITB invalidate all X Must be written to in PALcode. ITB_IS ITB invalidate single X — PMPC ProfileMePC X — EXC_ADDR Exception address X — IVA_FORM Instruction VA format X — IER_CM Interrupt enable current mode X Must be written to in PALcode.
IEEE 1149.1 Test Port Reset Table 7–10 Internal Processor Registers at Power-Up Reset State (Continued) Mnemonic Register Name Reset State Comments DTB_IS0 DTB invalidate single (array 0) X — DTB_IS1 DTB invalidate single (array 1) X — DTB_ASN0 DTB address space number 0 Cleared — DTB_ASN1 DTB address space number 1 Cleared — MM_STAT Memory management status X — M_CTL Mbox control Cleared — DC_CTL Dcache control DC_STAT Dcache status X Must be cleared in PALcode.
Reset State Machine Figure 7–5 21264/EV67 Reset State Machine State Diagram PLL Ramp Up Reset_L deasserted RAMP1 [2,4] Counter finished WAIT_ NOMINAL [16,32] DCOK_H asserted RAMP2 [1,2] Counter finished WAIT_ SETTLE [16,32] WAIT_ClkFwd Rst0 Counter finished ClkFwdRst_H Enabled Interrupt COLD Reset_L asserted WAIT_ INTERRUPT Out of Sleep Mode deasserted Reset_L deasserted Reset_L asserted Out of FAULT_ RESET* WAIT_ BiST FAULT_ RESET WAIT_ BiSI BiST finished WAIT_ RESET BiSI finished
Reset State Machine Table 7–11 21264/EV67 Reset State Machine State Descriptions (Continued) State Name Description RAMP2 Triggered by the duration counter reaching 4108 cycles, the Xdiv and Zdiv divisors are changed to 1 and 2, respectively, and the frequency is increased. The duration counter is reloaded to count 8205-cycles. WAIT_ClkFwdRst0 Triggered by the duration counter reaching 8205 cycles (or by the deassertion of Reset_L while in the WAIT_RESET state).
Phase-Lock Loop (PLL) Functional Description Table 7–11 21264/EV67 Reset State Machine State Descriptions (Continued) State Name Description DOWN2 Triggered by duration counter reaching 8205 cycles, the PLL ramps GCLK frequency down by the first divider ratio (Xdiv and Zdiv equal 2 and 4, respectively). This has the effect of halving the GCLK frequency. The duration counter is set to 4108 cycles.
Phase-Lock Loop (PLL) Functional Description Table 7–12 shows the allowable ClkIn_x frequencies for a given operating frequency of the 21264/EV67 and the Ydiv divider. For example, to set the 21264/EV67 GCLK frequency to 500 MHz with a ClkIn_x frequency of 166.7 MHz, the system must select a Ydiv divider of 3 by placing the value 00112 on pins IRQ_H[3:0].
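The worked example above is simple arithmetic: the GCLK frequency is the ClkIn_x frequency multiplied by the Ydiv divider. The sketch below recovers Ydiv from a pair of frequencies and rejects ratios that are not close to an integer. The function name `ydiv_for` and the 1% tolerance are assumptions of this example, and Table 7–12 remains the authority on which dividers and pin encodings are actually allowed.

```python
def ydiv_for(gclk_mhz, clkin_mhz, tolerance=0.01):
    """Return the integer Ydiv such that gclk is about clkin * Ydiv.

    The tolerance absorbs rounded frequencies such as 166.7 MHz.
    Raises ValueError when the ratio is not close to an integer.
    """
    ratio = gclk_mhz / clkin_mhz
    ydiv = round(ratio)
    if ydiv < 1 or abs(ratio - ydiv) > tolerance * ydiv:
        raise ValueError(f"{gclk_mhz}/{clkin_mhz} is not an integer ratio")
    return ydiv
```

For the figures quoted in the text, `ydiv_for(500, 166.7)` gives 3, which agrees with the pin value 0011₂ in the example.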
8 Error Detection and Error Handling This chapter gives an overview of the 21264/EV67 error detection and error handling mechanisms, and is organized as follows: • Data error correction code • Icache data or tag parity error • Dcache tag parity error • Dcache data correctable ECC error • Dcache store second error • Dcache duplicate tag parity error • Bcache tag parity error • Bcache data correctable ECC error • Memory/system port data correctable ECC error • Bcache data correctable ECC e
Data Error Correction Code 8.1 Data Error Correction Code The 21264/EV67 supports a quadword error correction code (ECC) for the system data bus. ECC is generated by the 21264/EV67 for all memory write transactions (WrVictimBlk) emitted from the 21264/EV67 and for all probe data. ECC is also checked on every memory read transaction for single-bit correction and double-bit error detection.
Dcache Data Single-Bit Correctable ECC Error 3. The virtual address associated with the error is available in the VA register. 4. The PALcode flushes the error block by temporarily disabling DC_CTL[DCTAG_PAR_EN] and evicting the block using two HW_LD instructions. The onchip duplicate tag provides the correct victim address and cache coherence state. If a retried load instruction detects the Dcache tag parity error, the memory reference may have already been retired, so the EXC_ADDR is not available.
Dcache Store Second Error – C_ADDR contains bits [19:6] of the Dcache address of the block that contains the error (bits [42:20] of the physical address are not updated). – DC_STAT[ECC_ERR_LD] is set. – The load queue retries the load and rewrites the register. – A corrected read data (CRD) error interrupt is posted, when enabled. Note: Errors in speculative load instructions cause a CRD error interrupt to be posted but the data is not scrubbed by hardware.
• C_STAT[DC_PERR] is set.
• C_ADDR contains bits [42:6] of the Dcache duplicate tag address of the block that contains the error.
• When enabled, a machine check (MCHK) is posted. The MCHK is taken when not in PALmode.

8.7 Bcache Tag Parity Error

The Bcache tag parity is checked on all Bcache tag references, including references invoked by system probes. If an error is detected, the following actions are taken:

• Bcache tag parity errors are not recoverable.
8.8.2 Dcache Fill from Bcache
If the quadword in error is not used to satisfy a load instruction, a hardware recovery flow is not invoked. The quadword in error, and its associated check bits, are written into the Dcache. However, status is logged as shown in the bulleted list below, and a corrected read data (CRD) error interrupt is posted, when enabled. PALcode may elect to correct the error by scrubbing the block.
The Bcache access error is written out to memory and is subsequently detected and corrected by the next consumer of the data.
• No correction is made.
• No status is logged (C_STAT = 0).
• A CRD error interrupt is posted, when enabled.

8.8.3.2 Bcache Victim Read During an ECB Instruction
A victim from the Bcache that occurs while an ECB instruction is being executed is written directly to the system port without correction.
If the quadword in error is used to satisfy a load instruction, then the flow is very similar to that used for a Dcache ECC error:
• The load instruction’s destination register is written with incorrect data; however, the load queue will retain the state associated with the load instruction.
8.11 Double-Bit Fill Errors
Double-bit errors for fills are detected, but not corrected, in the 21264/EV67.
Table 8–3 Error Case Summary (Continued)
Error | Exception | Status | Hardware Action | PALcode Action
Dcache single-bit ECC error on speculative load | CRD | DC_STAT[ECC_ERR_LD]; C_STAT contains zero | None | Log as CRD
Dcache single-bit ECC error on small store | CRD | DC_STAT[ECC_ERR_ST] | Corrected and scrubbed | Log as CRD
Dcache single-bit ECC error on victim read | None | None | Corrected and scrubbed | None
Dcache second error on store | MCHK¹ | DC_STAT[SEO] | No correction on either store | Log as
Table 8–3 Error Case Summary (Continued)
Error | Exception | Status | Hardware Action | PALcode Action
Bcache double-bit error on Dcache fill | MCHK¹ | C_STAT[DSTREAM_BC_DBL]; C_ADDR[error address]⁴ | None | Log as MCHK
Memory double-bit error on Icache fill | MCHK¹ | C_STAT[ISTREAM_MEM_DBL]; C_ADDR[error address]⁴ | None | Log as MCHK
Memory double-bit error on Dcache fill | MCHK¹ | C_STAT[DSTREAM_MEM_DBL]; C_ADDR[error address]⁴ | None | Log as MCHK
¹ Machine check taken in native mode.
9 Electrical Data

This chapter describes the electrical characteristics of the 21264/EV67 and its interface pins. The chapter contains both ac and dc electrical characteristics and power supply considerations, and is organized as follows:
• Electrical characteristics
• DC characteristics
• Power supply sequencing
• AC characteristics

9.1 Electrical Characteristics
Table 9–1 lists the maximum electrical ratings for the 21264/EV67.
9.2 DC Characteristics
This section contains the dc characteristics for the 21264/EV67. The 21264/EV67 pins can be divided into 10 distinct electrical signal types. The mapping between these signal types and the package pins is shown in Chapter 3. Table 9–2 shows the signal types.
Note: Current out of a 21264/EV67 pin is represented by a – symbol, while a + symbol indicates current flowing into a 21264/EV67 pin.

Table 9–3 VDD (I_DC_POWER)
Parameter Symbol   Description                         Test Conditions         Minimum   Maximum
VDD                Processor core supply voltage       —                       1.9 V     2.15 V
Power (sleep)      Processor power required (sleep)    @ VDD = 2.1 V Note 3    —         19 W¹
PLL_VDD            PLL supply voltage                  —                       3.135 V   3.
Table 9–7 Pin Type: Open-Drain Output Driver (O_OD)
Parameter Symbol   Description                      Test Conditions   Minimum   Maximum
VOL                Low-level output voltage         IOL = 70 mA       —         400 mV
|IOZ|              High impedance output current    0 < V < VDD       —         150 µA
COD                Open-drain pin capacitance       Freq = 10 MHz     —         5.
Table 9–11 Push-Pull Output Driver (O_PP)
Parameter Symbol   Description                      Test Conditions   Minimum        Maximum
VOL                Low-level output voltage         IOL = 40 mA       —              500 mV
VOH                High-level output voltage        IOL = –40 mA      VDD – 500 mV   —
|IOZ|              High-impedance output current    0 < V < VDD       —              150 µA
COD                Open-drain pin capacitance       Freq = 10 MHz     —              6.
the tester environment and does not need to be disabled. EV6Clk_L and EV6Clk_H are outputs that are both generated and consumed by the 21264/EV67; thus, VDD tracks for both the producer and consumer.
On the push-pull interfaces:
• Disabling all output drivers leaves the output signal at the DC bias point of the termination network.
• Disabling the bidirectional drivers leaves the other consumers of the bus as the bus master.
• The input voltage swing is Vref ± 0.40 volts.
• All output skew data is based on simulation into a 50-ohm transmission line that is terminated with 50 ohms to VDD/2 for Bcache timing, and with 50 ohms to VDD for all other timing.
Timings are measured at the pins as follows:
– For open-drain outputs, timing is measured to (Vol + Vterm)/2, where Vterm is the offchip termination voltage for system signals.
– For non-open-drain outputs, timing is measured to (Vol + Voh)/2.
Table 9–13 AC Specifications (Continued)
Signal Name | Type | Reference Signal | TSU¹ | TDH² | TSkew | Duty Cycle | TSlew
BcTagShared_H | B_DA_PP | BcTagInClk_H | 400 ps | 400 ps | NA | NA | 1.0 V/ns
BcTagValid_H | B_DA_PP | BcTagInClk_H | 400 ps | 400 ps | NA | NA | 1.
6 The TSkew value applies only when the BC_CLK_DELAY[0:1] entry in the Cbox WRITE_ONCE chain (Table 5–24) is set to zero phases of delay for Bcache clock.
7 The TSkew specified for BcAdd_H signals is only with respect to the associated clock.
8 The duty cycle for 2.5X single data mode is 2 GCLK phases high and 3 GCLK phases low.
9 The duty cycle for 3.5X single data mode is 3 GCLK phases high and 4 GCLK phases low.
10 Thermal Management

This chapter describes the 21264/EV67 thermal management and thermal design considerations, and is organized as follows:
• Operating temperature
• Heat sink specifications
• Thermal design considerations

10.1 Operating Temperature
The 21264/EV67 is specified to operate when the temperature at the center of the heat sink (Tc) is as shown in Table 10–1. Temperature Tc should be measured at the center of the heat sink, between the two package studs.
Table 10–2 lists the values for the center of heat-sink-to-ambient (θca) for the 21264/EV67 587-pin PGA. Tables 10–3 through 10–8 show the allowable Ta (without exceeding Tc) at various airflows.

Table 10–2 θca at Various Airflows for 21264/EV67
Airflow (linear ft/min) | 100 | 200 | 400 | 800 | 1000
θca with heat sink type 1 (°C/W) | 2.0 | 1.2 | 0.65 | 0.40 | 0.37
θca with heat sink type 2 (°C/W) | 1.4 | 0.78 | 0.45 | 0.33 | 0.31
θca with heat sink type 3¹ (°C/W) | 1 — 0.
Table 10–7 Maximum Ta for 21264/EV67 @ 750 MHz and @ 2.0 V with Various Airflows
Airflow (linear ft/min) | 100 | 200 | 400 | 800 | 1000
Maximum Ta with heat sink type 1 (°C) | — | — | 22.1 | 42.6 | 45.1
Maximum Ta with heat sink type 2 (°C) | — | — | 38.5 | 48.4 | 50.0
Maximum Ta with heat sink type 3¹ (°C) | — 44.3 —
¹ Heat sink type 3 has an 80 mm × 80 mm × 15 mm fan attached.

Table 10–8 Maximum Ta for 21264/EV67 @ 833 MHz and @ 2.
Figure 10–1 Type 1 Heat Sink
[Figure: drawing of heat sink type 1. Labeled dimensions: 80.5 mm (3.17 in), 80.5 mm (3.17 in), 25.4 mm (1.0 in), 32.5 mm (1.280 in).]
Figure 10–2 shows the heat sink type 2, along with its approximate dimensions.

Figure 10–2 Type 2 Heat Sink
[Figure: drawing of heat sink type 2. Labeled dimensions: 81.0 mm (3.19 in), 81.0 mm (3.19 in), 25.4 mm (1.0 in), 44.5 mm (1.75 in).]
Figure 10–3 shows heat sink type 3, along with its approximate dimensions. The cooling fins of heat sink type 3 are cross-cut. Also, an 80 mm × 80 mm × 15 mm fan is attached to heat sink type 3.

Figure 10–3 Type 3 Heat Sink
[Figure: drawing of heat sink type 3 with attached fan. Labeled dimensions: 80.0 mm (3.15 in), 71.5 mm (2.815 in), 25.4 mm (1.0 in), 40.0 mm (1.575 in), 27.3 mm (1.075 in), fan 15 mm (0.59 in), (1.62 in), 70.65 mm (2.815 in).]
10.3 Thermal Design Considerations
Follow these guidelines for printed circuit board (PCB) component placement:
• Orient the 21264/EV67 on the PCB with the heat sink fins aligned with the airflow direction.
• Avoid preheating ambient air. Place the 21264/EV67 on the PCB so that inlet air is not preheated by any other PCB components.
• Do not place other high power devices in the vicinity of the 21264/EV67. Do not restrict the airflow across the 21264/EV67 heat sink.
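The airflow tables in Section 10.1 appear consistent with the usual thermal-resistance relation Ta(max) = Tc − P × θca. The sketch below is an illustration rather than part of the specification: it takes the θca values for heat sink types 1 and 2 from Table 10–2 and the maximum Ta values from Table 10–7 (750 MHz, 2.0 V), solves for the implied Tc and power P from two entries, and checks the remaining entries against that fit. The function names are hypothetical and the fitted values are derived, not specified; Table 10–1 remains the authority for Tc.

```python
# (theta_ca in °C/W from Table 10-2, maximum Ta in °C from Table 10-7)
POINTS = [
    (0.65, 22.1), (0.40, 42.6), (0.37, 45.1),   # heat sink type 1 at 400, 800, 1000 ft/min
    (0.45, 38.5), (0.33, 48.4), (0.31, 50.0),   # heat sink type 2 at 400, 800, 1000 ft/min
]


def fit_two_points(p1, p2):
    """Solve Ta = Tc - P * theta for (Tc, P) using two table entries."""
    (theta1, ta1), (theta2, ta2) = p1, p2
    power = (ta2 - ta1) / (theta1 - theta2)
    tc = ta1 + power * theta1
    return tc, power


def max_ambient(tc, power, theta_ca):
    """Maximum ambient temperature for a given case limit, power, and theta_ca."""
    return tc - power * theta_ca


tc, power = fit_two_points(POINTS[0], POINTS[1])
residuals = [abs(max_ambient(tc, power, th) - ta) for th, ta in POINTS]
print(f"implied Tc = {tc:.1f} °C, implied P = {power:.1f} W, "
      f"worst residual = {max(residuals):.2f} °C")
```

With these table entries the fit lands at roughly 75 °C and 82 W, and every other entry agrees to within the one-decimal rounding of the tables, which suggests the tables were generated from this relation. The same `max_ambient` helper can be used to estimate Ta for an airflow that the tables do not list, provided θca is known for it.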
11 Testability and Diagnostics This chapter describes the 21264/EV67 user-oriented testability and diagnostic features. These features include automatic power-up self-test, Icache initialization from external serial ROMs, and the serial diagnostic terminal port. The boundary-scan register, which is another testability and diagnostic feature, is listed in Appendix B. The boundary-scan register is compatible with IEEE Standard 1149.1.
Table 11–1 Dedicated Test Port Pins (Continued)
Pin Name | Type | Function
SromClk_H | Output | SROM clock/Diagnostic terminal data output
SromOE_L | Output | SROM enable/Diagnostic terminal enable
TestStat_H | Output | BiST status/timeout output

11.2 SROM/Serial Diagnostic Terminal Port
This port supports two functions. During power-up, it supports automatic initialization of the Cbox configuration registers and the Icache from the system serial ROMs.
On the receive side, while in native mode, any transition on the Ibox I_CTL[SL_RCV], driven from the SromData_H pin, results in a trap to the PALcode interrupt handler. When in PALmode, all interrupts are blocked. The interrupt routine then begins sampling I_CTL[SL_RCV] under a software timing loop to input as much data as needed, using the chosen serial line protocol.

11.3 IEEE 1149.1 Port
The IEEE 1149.1 Test Access Port consists of the Tdi_H, Tdo_H, Tms_H, Tck_H, and Trst_L pins.
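The TAP controller behind these pins is the 16-state machine defined by IEEE Standard 1149.1, advanced by the TMS value sampled on each Tck_H rising edge. The sketch below models only those standard state transitions; the table name `TAP_NEXT` and the helpers `step` and `run` are hypothetical, and nothing here models 21264/EV67-specific instructions or registers.

```python
# state -> (next state when TMS = 0, next state when TMS = 1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle":    ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan":   ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR":       ("Shift-DR", "Exit1-DR"),
    "Shift-DR":         ("Shift-DR", "Exit1-DR"),
    "Exit1-DR":         ("Pause-DR", "Update-DR"),
    "Pause-DR":         ("Pause-DR", "Exit2-DR"),
    "Exit2-DR":         ("Shift-DR", "Update-DR"),
    "Update-DR":        ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan":   ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR":       ("Shift-IR", "Exit1-IR"),
    "Shift-IR":         ("Shift-IR", "Exit1-IR"),
    "Exit1-IR":         ("Pause-IR", "Update-IR"),
    "Pause-IR":         ("Pause-IR", "Exit2-IR"),
    "Exit2-IR":         ("Shift-IR", "Update-IR"),
    "Update-IR":        ("Run-Test/Idle", "Select-DR-Scan"),
}


def step(state: str, tms: int) -> str:
    """Advance the TAP controller by one TCK cycle."""
    return TAP_NEXT[state][tms]


def run(state: str, tms_bits) -> str:
    """Apply a sequence of TMS values and return the final state."""
    for tms in tms_bits:
        state = step(state, tms)
    return state


# From reset, TMS = 0,1,0,0 reaches Shift-DR; TMS = 0,1,1,0,0 reaches Shift-IR.
print(run("Test-Logic-Reset", [0, 1, 0, 0]))
print(run("Test-Logic-Reset", [0, 1, 1, 0, 0]))
```

A useful property of this machine, which the model reproduces, is that holding TMS high for five clocks returns the controller to Test-Logic-Reset from any state, so software can resynchronize without asserting Trst_L.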
Figure 11–1 TAP Controller State Machine
[Figure: state diagram of the TAP controller with the states Test Logic Reset, Run-Test/Idle, Select-DR-Scan, Capture-DR, Shift-DR, Exit1-DR, Pause-DR, Exit2-DR, Update-DR, Select-IR-Scan, Capture-IR, Shift-IR, Exit1-IR, Pause-IR, Exit2-IR, and Update-IR. Values shown on the transitions are for TMS.]

11.4 TestStat_H Pin
The TestStat_H pin serves two purposes.
Figure 11–2 TestStat_H Pin Timing During Power-Up Built-In Self-Test (BiST)
[Figure: timing diagram showing ClkFwdReset_L, Tbox_Reset_A_L, the TBox reset engine states (Idle, DoBist, DoResult, DoSROM, Idle), and TestStatus_H (BiSTResult, then BiSTResult OR T).]

Figure 11–3 TestStat_H Pin Timing During Built-In Self-Initialization (BiSI)
[Figure: timing diagram showing Tbox_Rst_A_L, the TBox reset engine states (Idle, DoMfgSelfinit, Idle), TestStatus_H (TimeOut), and ClkFwdRst_L.]

11.
In the SROM represented in Figure 11–4, the combined length of the fields Cbox Config Data(0,n) and MBZ(m,0) must equal 367 bits. (If Cbox Config Data(0,n) is (0,366), the MBZ field would have zero length.) For the 21264/EV67, Cbox Config Data is 304 bits, so the value of n is 303. Therefore, the length of the MBZ field for Pass 3 is 367 − 304 = 63 bits, and MBZ(m,0) = (62,0). Tables 11–3 and 5–24 describe the details of the Icache and Cbox bit fields, respectively.
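The MBZ sizing rule above is simple arithmetic, sketched below for anyone generating SROM images with a script. The function name `mbz_field` and its return convention are hypothetical; only the 367-bit total and the 304-bit Cbox Config Data length come from the text.

```python
SROM_CBOX_PLUS_MBZ_BITS = 367  # Cbox Config Data plus MBZ, as stated in the text


def mbz_field(cbox_config_bits: int):
    """Return (length, (msb, lsb)) of the MBZ field, or (0, None) if it is empty."""
    if not 0 < cbox_config_bits <= SROM_CBOX_PLUS_MBZ_BITS:
        raise ValueError("Cbox Config Data length must be between 1 and 367 bits")
    length = SROM_CBOX_PLUS_MBZ_BITS - cbox_config_bits
    if length == 0:
        return 0, None
    return length, (length - 1, 0)


print(mbz_field(304))  # 21264/EV67 case from the text: 63 bits, field (62,0)
```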
The instruction cache lines are loaded in reverse order. If fetch_count(9,0) is zero, no instruction cache lines are loaded. Because the valid bits are already cleared by the BiST operation, the first instruction fetch misses in the instruction cache and the chip seeks instructions from offchip memory.
A Alpha Instruction Set

This appendix provides a summary of the Alpha instruction set and describes the 21264/EV67 IEEE floating-point conformance. It is organized as follows:
• Alpha instruction summary
• Reserved opcodes
• IEEE floating-point instructions
• VAX floating-point instructions
• Independent floating-point instructions
• Opcode summary
• Required PALcode function codes
• IEEE floating-point conformance

A.
Table A–1 Instruction Format and Opcode Notation (Continued)
Instruction Format | Format Symbol | Opcode Notation | Meaning
Memory/branch | Mbr | oo.h | oo is the 6-bit opcode field. h is the high-order 2 bits of the displacement field.
Operate | Opr | oo.ff | oo is the 6-bit opcode field. ff is the 7-bit function code field.
PALcode | Pcd | oo | oo is the 6-bit opcode field; the particular PALcode instruction is specified in the 26-bit function code field.
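To make the oo.ff notation concrete, the sketch below extracts the opcode and function code from a 32-bit Operate-format instruction word and prints them in that notation. The bit positions used (opcode in bits 31:26, Ra in 25:21, Rb in 20:16, literal flag in bit 12, function in 11:5, Rc in 4:0) are taken from the Alpha architecture's Operate format rather than from this table, and the function names are hypothetical. The example word ^x47FF041F is the one that appears as an alignment filler in the Appendix D listings.

```python
def decode_operate(instr: int) -> dict:
    """Split a 32-bit Operate-format instruction word into its fields."""
    if not 0 <= instr < (1 << 32):
        raise ValueError("instruction must be a 32-bit value")
    return {
        "opcode":     (instr >> 26) & 0x3F,   # 6-bit opcode field (oo)
        "ra":         (instr >> 21) & 0x1F,
        "rb":         (instr >> 16) & 0x1F,   # meaningful only when is_literal is False
        "is_literal": bool((instr >> 12) & 1),
        "function":   (instr >> 5) & 0x7F,    # 7-bit function code field (ff)
        "rc":         instr & 0x1F,
    }


def opcode_notation(instr: int) -> str:
    """Format an Operate instruction word in the oo.ff notation of Table A-1."""
    fields = decode_operate(instr)
    return f"{fields['opcode']:02X}.{fields['function']:02X}"


fields = decode_operate(0x47FF041F)
print(opcode_notation(0x47FF041F), fields["ra"], fields["rb"], fields["rc"])
```

For ^x47FF041F this gives opcode notation 11.20 with all three register fields equal to 31, that is, BIS R31,R31,R31, the conventional integer no-op.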
Table A–2 Architecture Instructions (Continued)
Mnemonic | Format | Opcode | Description
BSR | Mbr | 34 | Branch to subroutine
CALL_PAL | Pcd | 00 | Trap to PALcode
CMOVEQ | Opr | 11.24 | CMOVE if = zero
CMOVGE | Opr | 11.46 | CMOVE if ≥ zero
CMOVGT | Opr | 11.66 | CMOVE if > zero
CMOVLBC | Opr | 11.16 | CMOVE if low bit clear
CMOVLBS | Opr | 11.14 | CMOVE if low bit set
CMOVLE | Opr | 11.64 | CMOVE if ≤ zero
CMOVLT | Opr | 11.44 | CMOVE if < zero
CMOVNE | Opr | 11.26 | CMOVE if ≠ zero
CMPBGE | Opr | 10.
Table A–2 Architecture Instructions (Continued)
Mnemonic | Format | Opcode | Description
CVTGQ | F-P | 15.0AF | Convert G_floating to quadword
CVTLQ | F-P | 17.010 | Convert longword to quadword
CVTQF | F-P | 15.0BC | Convert quadword to F_floating
CVTQG | F-P | 15.0BE | Convert quadword to G_floating
CVTQL | F-P | 17.030 | Convert quadword to longword
CVTQS | F-P | 16.0BC | Convert quadword to S_floating
CVTQT | F-P | 16.0BE | Convert quadword to T_floating
CVTST | F-P | 16.
Table A–2 Architecture Instructions (Continued)
Mnemonic | Format | Opcode | Description
FCMOVGT | F-P | 17.02F | FCMOVE if > zero
FCMOVLE | F-P | 17.02E | FCMOVE if ≤ zero
FCMOVLT | F-P | 17.02C | FCMOVE if < zero
FCMOVNE | F-P | 17.02B | FCMOVE if ≠ zero
FETCH | Mfc | 18.8000 | Prefetch data
FETCH_M | Mfc | 18.A000 | Prefetch data, modify intent
FTOIS | F-P | 1C.78 | Floating to integer move, S_floating
FTOIT | F-P | 1C.70 | Floating to integer move, T_floating
IMPLVER | Opr | 11.
Table A–2 Architecture Instructions (Continued)
Mnemonic | Format | Opcode | Description
LDS | Mem | 22 | Load S_floating
LDT | Mem | 23 | Load T_floating
LDWU | Mem | 0C | Load zero-extended word
MAXSB8 | Opr | 1C.3E | Vector signed byte maximum
MAXSW4 | Opr | 1C.3F | Vector signed word maximum
MAXUB8 | Opr | 1C.3C | Vector unsigned byte maximum
MAXUW4 | Opr | 1C.3D | Vector unsigned word maximum
MB | Mfc | 18.4000 | Memory barrier
MF_FPCR | F-P | 17.025 | Move from FPCR
MINSB8 | Opr | 1C.
Table A–2 Architecture Instructions (Continued)
Mnemonic | Format | Opcode | Description
PKWB | Opr | 1C.36 | Pack words to bytes
RC | Mfc | 18.E000 | Read and clear
RET | Mbr | 1A.2 |
RPCC | Mfc | 18.C000 | Read process cycle counter
RS | Mfc | 18.F000 | Read and set
S4ADDL | Opr | 10.02 | Scaled add longword by 4
S4ADDQ | Opr | 10.22 | Scaled add quadword by 4
S4SUBL | Opr | 10.0B | Scaled subtract longword by 4
S4SUBQ | Opr | 10.2B | Scaled subtract quadword by 4
S8ADDL | Opr | 10.
Table A–2 Architecture Instructions (Continued)
Mnemonic | Format | Opcode | Description
STW | Mem | 0D | Store word
SUBF | F-P | 15.081 | Subtract F_floating
SUBG | F-P | 15.0A1 | Subtract G_floating
SUBL | Opr | 10.09 | Subtract longword
SUBL/V | Opr | 10.49 | Subtract longword with integer overflow enable
SUBQ | Opr | 10.29 | Subtract quadword
SUBQ/V | Opr | 10.69 | Subtract quadword with integer overflow enable
SUBS | F-P | 16.081 | Subtract S_floating
SUBT | F-P | 16.
A.2.2 Opcodes Reserved for PALcode
Table A–4 lists the 21264/EV67-specific instructions. See Chapter 2 for more information.

Table A–4 Opcodes Reserved for PALcode
21264/EV67 Mnemonic | Opcode | Architecture Mnemonic | Function
HW_LD | 1B | PAL1B | Performs Dstream load instructions.
HW_ST | 1F | PAL1F | Performs Dstream store instructions.
HW_REI | 1E | PAL1E | Returns instruction flow to the program counter (PC) pointed to by EXC_ADDR internal processor register (IPR).
Table A–5 IEEE Floating-Point Instruction Function Codes (Continued)
SQRTS | 08B | 00B | 04B | 0CB | 18B | 10B | 14B | 1CB
SQRTT | 0AB | 02B | 06B | 0EB | 1AB | 12B | 16B | 1EB
SUBS | 081 | 001 | 041 | 0C1 | 181 | 101 | 141 | 1C1
SUBT | 0A1 | 021 | 061 | 0E1 | 1A1 | 121 | 161 | 1E1

Mnemonic | /SU | /SUC | /SUM | /SUD | /SUI | /SUIC | /SUIM | /SUID
ADDS | 580 | 500 | 540 | 5C0 | 780 | 700 | 740 | 7C0
ADDT | 5A0 | 520 | 560 | 5E0 | 7A0 | 720 | 760 | 7E0
CMPTEQ | 5A5
CMPTLT | 5A6
CMPTLE | 5A7
CMPTUN | 5A4
CVTQS | — | — | — | — | 7BC | 73C | 77
Programming Note: In order to use CMPTxx with software completion trap handling, it is necessary to specify the /SU IEEE trap mode, even though an underflow trap is not possible. In order to use CVTQS or CVTQT with software completion trap handling, it is necessary to specify the /SUI IEEE trap mode, even though an underflow trap is not possible.

A.
Table A–7 Independent Floating-Point Instruction Function Codes
Mnemonic | None | /V | /SV
CPYS | 020 | — | —
CPYSE | 022 | — | —
CPYSN | 021 | — | —
CVTLQ | 010 | — | —
CVTQL | 030 | 130 | 530
FCMOVEQ | 02A | — | —
FCMOVGE | 02D | — | —
FCMOVGT | 02F | — | —
FCMOVLE | 02E | — | —
FCMOVLT | 02C | — | —
MF_FPCR | 025 | — | —
MT_FPCR | 024 | — | —

A.6 Opcode Summary
Table A–8 lists all Alpha opcodes from 00 (CALL_PAL) through 3F (BGT).
Table A–8 Opcode Summary (Continued)
Offset | 00 | 08 | 10 | 18 | 20 | 28 | 30 | 38
3/B | Res | LDQ_U (mem) | INTM* (op) | \PAL\ | LDT (mem) | LDQ_L (mem) | FBLE (br) | BLE (br)
4/C | Res | LDWU | ITFP* | FPTI* | STF (mem) | STL (mem) | BSR (br) | BLBS (br)
5/D | Res | STW | FLTV* (op) | \PAL\ | STG (mem) | STQ (mem) | FBNE (br) | BNE (br)
6/E | Res | STB | FLTI* (op) | \PAL\ | STS (mem) | STL_C (mem) | FBGE (br) | BGE (br)
7/F | Res | STQ_U (mem) | FLTL* (op) | \PAL\ | STT (mem) | STQ_C (mem) | FBGT (br) | BGT (br)

Table
A.8 IEEE Floating-Point Conformance
The 21264/EV67 supports the IEEE floating-point operations defined in the Alpha System Reference Manual, Revision 7, and therefore also those in the Alpha Architecture Handbook, Version 4. Support for a complete implementation of the IEEE Standard for Binary Floating-Point Arithmetic (ANSI/IEEE Standard 754-1985) is provided by a combination of hardware and software.
The 21264/EV67 does not produce a denormal result for the underflow exception. Instead, a true zero (+0) is written to the destination register. In the 21264/EV67, the FPCR underflow to zero (UNDZ) bit must be set if the underflow disable (UNFD) bit is set. If desired, trapping on underflow can be enabled by the instruction and the FPCR, and software may compute the denormal value as defined in the IEEE standard.
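The difference between the hardware result and the IEEE-defined denormal that software may compute can be seen with a small model. The sketch below assumes the host Python uses IEEE 754 double precision with gradual underflow (true on mainstream platforms); the function names are hypothetical, and the model covers only the underflow case described above, not traps, rounding modes, or FPCR state.

```python
import math
import sys


def ieee_product(x: float, y: float) -> float:
    """Product as the IEEE standard defines it, including denormal results."""
    return x * y


def hardware_style_product(x: float, y: float) -> float:
    """Model of the behavior described above: an underflowing result becomes +0."""
    exact = x * y
    if exact != 0.0 and abs(exact) < sys.float_info.min:  # below the smallest normal
        return 0.0                                         # true zero (+0), regardless of sign
    return exact


x, y = 2.0 ** -1000, -(2.0 ** -50)
software = ieee_product(x, y)            # a negative denormal, exactly -(2 ** -1050)
hardware = hardware_style_product(x, y)  # +0.0
print(software, hardware, math.copysign(1.0, hardware))
```

In this example the software path keeps the tiny nonzero value while the hardware-style path returns positive zero, which is why code that depends on gradual underflow needs the software completion path.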
IEEE Floating-Point Conformance Table A–11 Exceptional Input and Output Conditions (Continued) 21264/EV67 Hardware Supplied Result Exception Inf operand ±Inf (none) QNaN operand QNaN (none) SNaN operand QNaN Invalid Op 0 * Inf CQNaN Invalid Op QNaN operand QNaN (none) SNaN operand QNaN Invalid Op 0/0 or Inf/Inf CQNaN Invalid Op A/0 (A not 0) ±Inf Div Zero A/Inf ±0 (none) Inf/A ±Inf (none) +Inf operand +Inf (none) QNaN operand QNaN (none) SNaN operand QNaN Invalid Op
IEEE Floating-Point Conformance Table A–11 Exceptional Input and Output Conditions (Continued) Alpha Instructions 21264/EV67 Hardware Supplied Result Exception SNaN operand 0 Invalid Op Inexact result Result Inexact Integer overflow Truncated result Invalid Op Result Inexact Inf operand ±Inf (none) QNaN operand QNaN (none) SNaN operand QNaN Invalid Op CVTfi OUTPUT CVTif OUTPUT Inexact result CVTff INPUT CVTff OUTPUT (same as ADDx) FBEQ FBNE FBLT FBLE FBGT FBGE LDS LDT STS STT CPYS
B 21264/EV67 Boundary-Scan Register This appendix contains the BSDL description of the 21264/EV67 boundary-scan register. B.1 Boundary-Scan Register The Boundary-Scan Register (BSR) on the 21264/EV67 is 367 bits long. It is accessed by the three public (SAMPLE, EXTEST, CLAMP) instructions. The register operation for the public instructions is compliant with the IEEE 1149.1 standard.
Boundary-Scan Register SysDataInClk_H :in BcDataOutClk_L :out BcDataOutClk_H :out ClkIn_H :linkage ClkIn_L :linkage PLL_VDD :linkage EV6Clk_H :linkage EV6Clk_L :linkage Spare_4 :linkage Spare_5 :linkage BcTag_H :inout BcVref :linkage BcTagInClk_H :in BcTagClkIn_H BcTagParity_H :inout BcTagShared_H :inout BcTagDirty_H :inout BcTagValid_H :inout BcTagOutClk_L :out BcTagOutClk_H :out BcTagOE_L :out BcTagWr_L :out BcDataWr_L :out BcLoad_L :out BcDataOE_L :out BcAdd_H :out SysAddOut_L :out SysAddIn_L :in SysAdd
Boundary-Scan Register " " "SysCheck_L "SysDataInClk_H AB38, AC39, AD38, AF40, AH38, AJ39, AL41, AK38, "& AN39, AP38, AR39, AT38, AY38, AV36, AW35, AV34),"& : (L7 , AA5 , AK8 , BA13, L39 , AA41, AM40, AY34),"& : (D8 , P4 , AF6 , AY6 , E37 , R43 , AG41, AV40),"& : "SysDataOutClk_L "SysDataInValid_L "SysDataOutValid_L "BcAdd_H " " "BcDataOE_L "BcLoad_L "BcDataWr_L "BcData_H " " " " " " " " " " " " " " " "BcCheck_H " "BcDataInClk_H "Spare_7 "BcDataOutClk_L "BcDataOutClk_H "BcTag_H " " "BcTagValid_H "BcTagDi
Boundary-Scan Register "NoConnect_0 "NoConnect_1 "ClkFwdRst_H "EV6Clk_H "EV6Clk_L "Spare_4 "Spare_5 "PLL_VDD "Spare_0 "MiscVref "Spare_2 "DCOK_H "VSS: " " " " " " " " " " " " “ "VDD " " " " " " " " " " " : : : : : : : : : : : : BB14, BD2 , BE11, AM6 , AL7 , AT4 , AR3 , AV8 , BC21, AV22, BE9 , AY18, (C1 , BA41, BC33, BE25, E11 , BE3 , AU45, E41 , BA29, BE21, G15 , BC7 , AN5 , E9 , : (B2 , BB40, D34 , AV20, B8 , AF2 , AH42, B38 , AY30, B32 , AY24, BB18, "& "& "& "& "& "& "& "& "& "& "& "& W3 , R45 , AE39,
Boundary-Scan Register "BcLoad_L "BcDataWr_L "BcData_H " " " " " " " " " " " " " " " "BcCheck_H " "BcDataInClk_H "Spare_7 "BcDataOutClk_L "BcDataOutClk_H "BcTag_H " " "BcTagValid_H "BcTagDirty_H "BcTagShared_H "BcTagParity_H "BcTagOE_L "BcTagWr_L "BcTagInClk_H "BcVref "BcTagOutClk_L "BcTagOutClk_H "IRQ_H "Reset_L "SromData_H "SromCLK_H "SromOE_L "Tms_H "Tck_H "Trst_L "Tdi_H "Tdo_H "TestStat_H "ClkIn_H "ClkIn_L "FrameClk_H "FrameClk_L "PllBypass_H "NoConnect_0 "NoConnect_1 "ClkFwdRst_H "EV6Clk_H "EV6Clk_L "
Boundary-Scan Register "VSS " " " " " " " " " " " " “ "VDD " " " " " " " " " " " : (44 497 538 578 94 567 409 109 491 576 141 525 372 93 : (22 519 83 419 25 314 336 40 469 37 466 508 , , , , , , , , , , , , , , , , , , , , , , , , , , 259 233 310 106 531 229 354 328 450 15 100 461 4 228 251 273 428 557 162 69 566 239 174 222 171 34 , , , , , , , , , , , , , , , , , , , , , , , , , , 388 178 21 358 57 482 91 544 199 584 149 488 573 324 380 157 132 516 554 221 362 384 287 303 270 425 , , , , , , , , ,
Boundary-Scan Register " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " 364 363 362 361 360 359 358 357 356 355 354 353 352 351 350 349 348 347 346 345 344 343 342 341 340 339 338 337 336 335 334 333 332 331 330 329 328 327 326 325 324 323 322 321 320 319 318 317 316 315 314 313 312 311 310 309 308 307 306 305 304 303 302 301 300 ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( (
Boundary-Scan Register " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " B–8 299 298 297 296 295 294 293 292 291 290 289 288 287 286 285 284 283 282 281 280 279 278 277 276 275 274 273 272 271 270 269 268 267 266 265 264 263 262 261 260 259 258 257 256 255 254 253 252 251 250 249 248 247 246 245 244 243 242 241 240 239 238 237 236 235 ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( (
Boundary-Scan Register " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " 234 233 232 231 230 229 228 227 226 225 224 223 222 221 220 219 218 217 216 215 214 213 212 211 210 209 208 207 206 205 204 203 202 201 200 199 198 197 196 195 194 193 192 191 190 189 188 187 186 185 184 183 182 181 180 179 178 177 176 175 174 173 172 171 170 ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( (
Boundary-Scan Register " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " 169 168 167 166 165 164 163 162 161 160 159 158 157 156 155 154 153 152 151 150 149 148 147 146 145 144 143 142 141 140 139 138 137 136 135 134 133 132 131 130 129 128 127 126 125 124 123 122 121 120 119 118 117 116 115 114 113 112 111 110 109 108 107 106 105 ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( (
Boundary-Scan Register " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " 104 103 102 101 100 99 98 97 96 95 94 93 92 91 90 89 88 87 86 85 84 83 82 81 80 79 78 77 76 75 74 73 72 71 70 69 68 67 66 65 64 63 62 61 60 59 58 57 56 55 54 53 52 51 50 49 48 47 46 45 44 43 42 41 40 ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( BC_2, BC_2, BC_2, BC_2, B
Boundary-Scan Register " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " " 39 38 37 36 35 34 33 32 31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( ( BC_2, BC_2, BC_2, BC_2, BC_2, BC_2, BC_2, BC_2, BC_2, BC_2, BC_2, BC_2, BC_2, BC_2, BC_2, BC_2, BC_2, BC_2, BC_2, BC_2, BC_2, BC_3, BC_3, BC_3, BC_3, BC_3, BC_3, BC_3, BC_3, BC_3, BC_3, BC_3, BC_3, BC_3, BC_3, BC_3,
C Serial Icache Load Predecode Values See the Alpha Motherboards Software Developer’s Kit (SDK) for information.
D PALcode Restrictions and Guidelines D.1 Restriction 1 : Reset Sequence Required by Retire Logic and Mapper For convenience of implementation, the Ibox retire logic done status bits are not initialized during reset. Instead, as shown in the example below, the first batch of valid instructions sweeps through inum-space and initializes these bits.
        addt    f31,f31,f2      /* initialize F.P. Reg. 2*/
        mult    f31,f31,f3      /* initialize F.P. Reg. 3*/

        addq    r31,r31,r4      /* initialize Int. Reg. 4*/
        addq    r31,r31,r5      /* initialize Int. Reg. 5*/
        addt    f31,f31,f4      /* initialize F.P. Reg. 4*/
        mult    f31,f31,f5      /* initialize F.P. Reg. 5*/

        addq    r31,r31,r6      /* initialize Int. Reg. 6*/
        addq    r31,r31,r7      /* initialize Int. Reg. 7*/
        addt    f31,f31,f6      /* initialize F.P. Reg. 6*/
        mult    f31,f31,f7      /* initialize F.P. Reg. 7*/
        addq    r31,r31,r27     /* initialize Int. Reg. 27*/
        addt    f31,f31,f26     /* initialize F.P. Reg. 26*/
        mult    f31,f31,f27     /* initialize F.P. Reg. 27*/

        addq    r31,r31,r28     /* initialize Int. Reg. 28*/
        addq    r31,r31,r29     /* initialize Int. Reg. 29*/
        addt    f31,f31,f28     /* initialize F.P. Reg. 28*/
        mult    f31,f31,f29     /* initialize F.P. Reg. 29*/
Restriction 1 : Reset Sequence Required by Retire Logic and Mapper ** ** ** ** ** ** ** ** ** ** ** ** ** */ or the PALcode, but it must be done in the manner and order below. It assumes that the retirator has been initialized, that the non-shadow registers are mapped, and that mapper source enables are on. Source enables are on. For fault-reset and wake from sleep, we need to ensure we are in the icache so we don’t fetch junk that touches the shadow sources before we write the destinations.
Restriction 1 : Reset Sequence Required by Retire Logic and Mapper addq br tch7: br r31,r31,r21 r31, nxt8 r31, tch8 /* initialize Shadow Reg. 5*/ /* continue executing in next block*/ /* fetch in next block*/ nxt8: addq addq br tch8: br nxt9: r31,r31,r22 r31,r31,r23 r31, nxt9 r31, nxt0 /* /* /* /* /* ** ** ** ** ** ** ** */ /* ** ** ** ** */ initialize Shadow Reg. 6*/ initialize Shadow Reg.
Restriction 1 : Reset Sequence Required by Retire Logic and Mapper br bccend:mtpr addq addq r31,bccshf /* continue shifting*/ r31,EV6__EXC_ADDR + 16/* dummy IPR write - sets SCBD bit 4 */ r31,r31,r0 /* nop*/ r31,r31,r1 /* nop*/ mtpr r31,EV6__EXC_ADDR + 16 /* also a dummy IPR write /* stalls until above write /* retires*/ /* predicts fall through in PALmode*/ /* fools ibox predictor into infinite loop*/ /* nop*/ beq br addq r31, bccnxt r31, .-4 r31,r31,r1 bccnxt:addq r31,4,r0 /* load PCTX.....
Restriction 1 : Reset Sequence Required by Retire Logic and Mapper mtpr bis bis mulq r31,EV6__PCTR_CTL /* 2nd buffer fetch block for above map-stall /* and 2nd clear PCTR_CTL (SCRBRD=4)*/ r31,1,r0 /* set up value for demon write*/ r31,1,r0 /* set up value for demon write*/ r31,r31,r0 /* nop*/ lda r0,0x780(r31) mb whint r0 mb bis r31,1,r0 ldq_p r1,0x780(r31) ldq_p r0,0x788(r31) mb mb /* this is new initialization stuff to prevent*/ /* ld/st below from going off-chip */ /* set up value for demon write*/
        br      r31,palbase_init
palbase_init:
        br      r0, br60                        /* r0 <- current location */
br60:   lda     r1, (EntryPoint-br60)(r0)       /* r1 <- location of codebase */
        mtpr    r1, EV6__PAL_BASE               /* set up pal_base register */

        bis     r31, 2, r0
        mtpr    r0, EV6__VA_CTL

        bis     r31, 8, r0
        mtpr    r0, EV6__M_CTL

        br      r0, jmp0
jmp0:   addq    r0, (jmp1-jmp0+1), r0
        hw_rets/jmp(r0)
jmp1:   lda     r1, 1(r31)                      /* r1 <- cc_ctl enable bit */
        sll     r1, 32, r1
        mtpr    r1, EV6__CC_CTL                 /* Enable/clear th
D.4 Guideline 6 : Avoid Consecutive Read-Modify-Write-Read-Modify-Write
Avoid consecutive read-modify-write-read-modify-write sequences to IPRs in the same scoreboard group. The latency between the first write and the second read is determined by the retire latency of the IPR.
Bad_interrupt_flow_entry:
        ADDQ    R31,R31,R0
        STF     Fa,(Rb)         ; This STF might not undergo a dirty source register
                                ; check and might give wrong results
        ADDQ    R31,R31,R0
        ADDQ    R31,R31,R0
        ................................
D.8 Restriction 11: Ibox IPR Update Synchronization
When updating any Ibox IPR, a return to native (virtual) mode should use the HW_RET instruction with the associated STALL bit set to ensure that the updated IPR value affects all instructions following the return path. The new IPR value takes effect only after the associated HW_MTPR instruction is retired.
D.12 Guideline 16 : JSR-BAD VA
A JSR memory format instruction that generates a bad VA (IACV) trap requires PALcode assistance to determine the correct exception address. If the EXC_SUM[BAD_IVA] is set, bits [63,1] of the exception address are valid in the VA IPR and not the EXC_ADDR as usual. The PALmode bit, however, is always located in EXC_ADDR[0] and must be combined, if necessary, by PALcode to determine the full exception address.

D.
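The address combination described in Guideline 16 amounts to a mask-and-merge of two register values. The sketch below models it in Python purely as an illustration of the bit manipulation; real PALcode would perform the equivalent with Alpha instructions after reading the IPRs. The function name and arguments are hypothetical, and the arguments stand for values already read from EXC_ADDR, VA, and EXC_SUM[BAD_IVA].

```python
MASK64 = (1 << 64) - 1


def full_exception_address(exc_addr: int, va: int, bad_iva: bool) -> int:
    """Combine VA[63:1] with the PALmode bit in EXC_ADDR[0] when BAD_IVA is set."""
    if bad_iva:
        address_bits = va & ~1 & MASK64   # bits [63:1] come from the VA IPR
        palmode_bit = exc_addr & 1        # bit [0] always comes from EXC_ADDR
        return address_bits | palmode_bit
    return exc_addr & MASK64              # usual case: EXC_ADDR holds the whole address


print(hex(full_exception_address(0x1001, 0xFFFFFC0000123456, True)))
```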
        BIS     R31, R31, R31
        HW_MTPR R9, ASN0, SCBD<4>
        HW_MTPR R9, ASN1, SCBD<7>

This sequence guarantees, through the register dependency on R0, that neither HW_MTPR is issued before scoreboard bits [7:4] are cleared. In addition, there must be a HW_RET/STALL after a HW_MTPR ASN0/HW_MTPR ASN1 pair. Finally, these two writes must be executed atomically, that is, either both must be retired or neither may be retired.

D.
        xxx
        HW_ST/C -> R0
        Bxx     R0, try_again
        STQ                     ; Force next ST/C to fail if no preceding LDxL
        HW_RET

D.20 Restriction 24: HW_RET/STALL After HW_MTPR IC_FLUSH, IC_FLUSH_ASM, CLEAR_MAP
There must be a HW_RET/STALL after a HW_MTPR IC_FLUSH, IC_FLUSH_ASM, or CLEAR_MAP. The Icache flush associated with these instructions will not occur until the HW_RET/STALL occurs and all outstanding Istream fetches have been completed.
D.23 Restriction 27: Reset of ‘Force-Fail Lock Flag’ State in PALcode
A virtual mode load or store is required in PALcode before the execution of any load-locked or store-conditional instructions. The virtual-mode load or store may not be a HW_LD, HW_ST, LDx_L, ECB, or WH64.

D.
Restriction 30 : HW_MTPR and HW_MFPR to the Cbox CSR ALIGN_FETCH_BLOCK sys__cbox: mb hw_mfpr p6, EV6__I_CTL lda p4, ^xFCFF(r31) and p6, p4, p4 ; ; ; ; quiet the dstream (4,0L) get i_ctl mask for clearing SBE bits clear SBE bits sbe_off_offset = hw_mtpr p4, EV6__I_CTL br p6, sys__cbox_sbe_off sys__cbox_sbe_off: addq p6, #, p6 bsr r31, . ALIGN_FETCH_BLOCK <^x47FF041F>; align hw_mtpr r31, EV6__IC_FLUSH bne r31, .
Restriction 31: I_CTL[VA_48] Update
sys__cbox_over6:
        beq     p6, sys__cbox_over8
        bis     r31, r31, r31
        br      r31, sys__cbox_over7
sys__cbox_touch6:
        br      r31, sys__cbox_touch7
sys__cbox_over7:
        bis     p7, r31, p20
        sll     p7, #6, p7
        br      r31, sys__cbox_over2
sys__cbox_touch7:
        br      r31, sys__cbox_touch8
sys__cbox_over8:
        beq     r31, sys__cbox_cbox_done
        PVC_VIOLATE <1006>
        br      r31, .
D.29 Restriction 33: HW_LD Physical/Lock Use
The HW_LD physical/lock instruction must be one of the first three instructions in a quad-instruction aligned fetch block. A pipeline error can occur if the HW_LD physical/lock is fetched as the fourth instruction of the fetch block.
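A PALcode build script could check this placement rule from instruction addresses alone. The following Python sketch assumes that a quad-instruction aligned fetch block is 16 bytes (four 4-byte Alpha instructions); the function names are hypothetical and are not part of any Alpha tool.

```python
INSTRUCTION_BYTES = 4           # Alpha instructions are 32 bits wide
FETCH_BLOCK_INSTRUCTIONS = 4    # quad-instruction aligned fetch block


def fetch_block_slot(pc: int) -> int:
    """Return the instruction's position (0..3) within its fetch block."""
    return (pc // INSTRUCTION_BYTES) % FETCH_BLOCK_INSTRUCTIONS


def hw_ld_lock_placement_ok(pc: int) -> bool:
    """True if an HW_LD physical/lock at address pc is in slot 0, 1, or 2."""
    return fetch_block_slot(pc) < 3
```

For example, an HW_LD physical/lock at address 0x1008 occupies slot 2 and passes the check, while one at 0x100C occupies slot 3, the fourth instruction of the block, and fails it.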
D.35 Guideline 39: Writing Multiple DTB Entries in the Same PAL Flow
If a PALcode flow intends to write multiple DTB entries (as would occur in a double miss), it must take care to keep subsequent HW_MTPR DTB_TAGx writes from corrupting the staging register TAG values prior to retirement of the HW_MTPR DTB_PTEx, which triggers the final DTB update.
Restriction 40: Scrubbing a Single-Bit Error
        hw_mtpr r31, EV6__DTB_IA                ; (7,1L) flush dtb
        lda     r20, ^x3301(r31)                ; set WE, RE
        bis     r31, r31, r31
        bis     r31, r31, r31

        hw_mtpr r31,                            ; wait for retire
        srl     r4, #13, r6                     ; shift byte offset
        sll     r6, #EV6__DTB_PTE0__PFN__S, r6  ; shift into position
        bis     r6, r20, r6                     ; produce pte

        hw_mtpr r4, EV6__DTB_TAG0               ; (2&6,0L)
        hw_mtpr r4,                             ; (1&5,1L)
        hw_mtpr r6,                             ; (0&4,0L)
        hw_mtpr r6,                             ; (3&7,1L)

        mb
        bis     r31, r31, r31
        bis     r31, r31, r31
        bis     r31, r31, r31
D.37 Restriction 41: MTPR ITB_TAG, MTPR ITB_PTE Must Be in the Same Fetch Block
Write the ITB_TAG and ITB_PTE registers in the same fetch block. This avoids a mispredict path write of invalid data to the ITB_TAG register.

D.38 Restriction 42: Updating VA_CTL, CC_CTL, or CC IPRs
When writing to the VA_CTL, CC_CTL, or CC IPRs, write the same value twice in distinct fetch blocks.
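Both placement rules above reduce to comparing fetch-block numbers of two instruction addresses. The following Python sketch shows one way a PALcode checker might express them. It assumes a 16-byte fetch block (four 4-byte instructions, as in the quad-instruction aligned fetch block of Restriction 33); all function names are hypothetical.

```python
FETCH_BLOCK_BYTES = 16  # four 4-byte Alpha instructions (assumed block size)


def fetch_block_index(pc: int) -> int:
    """Return the number of the aligned fetch block that contains pc."""
    return pc // FETCH_BLOCK_BYTES


def same_fetch_block(pc_a: int, pc_b: int) -> bool:
    """True if both instruction addresses fall in the same fetch block."""
    return fetch_block_index(pc_a) == fetch_block_index(pc_b)


def itb_write_pair_ok(tag_pc: int, pte_pc: int) -> bool:
    """Restriction 41: ITB_TAG and ITB_PTE writes share one fetch block."""
    return same_fetch_block(tag_pc, pte_pc)


def double_write_ok(first_pc: int, second_pc: int) -> bool:
    """Restriction 42: the two identical writes sit in distinct blocks."""
    return not same_fetch_block(first_pc, second_pc)
```

The model checks placement only. It does not verify that the two writes of Restriction 42 carry the same value.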
D.42 Restriction 46: Avoiding Live locks in Speculative Load CRD Handlers
Speculative load CRD handlers that release from the interrupt without scrubbing a cache block could suffer from the following live-lock condition:
1. An initial error on a speculative load forces a CRD interrupt.
2. The CRD releases without scrubbing the block.
Restriction 47: Cache Eviction for Single-Bit Cache Errors
If "CBOX_ERR[C_ADDR]" has not changed when the CRD_HANDLER is reentered, or "CBOX_ERR[C_STAT] == 0x0", all cache locations should be evicted to avoid the live lock described above.

; Sample code for evicting cache. This method loads a 64K block,
; then exits the CRD_HANDLER to check if the sberr has been evicted.
; If not it loads the next 64K block.
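The eviction condition stated in this restriction can also be written as a small decision function. The Python sketch below is an illustrative model only and is not the manual's sample code: the function name is hypothetical, and it assumes the handler has saved the C_ADDR value seen on its previous entry (None on the first entry).

```python
from typing import Optional


def must_evict_all(prev_c_addr: Optional[int], c_addr: int, c_stat: int) -> bool:
    """Decide whether the CRD handler should evict all cache locations.

    prev_c_addr -- CBOX_ERR[C_ADDR] saved on the previous handler entry,
                   or None if this is the first entry (assumption)
    c_addr      -- CBOX_ERR[C_ADDR] read on this entry
    c_stat      -- CBOX_ERR[C_STAT] read on this entry
    """
    if c_stat == 0x0:
        return True                 # CBOX_ERR[C_STAT] == 0x0
    # C_ADDR unchanged since the handler was last entered.
    return prev_c_addr is not None and c_addr == prev_c_addr
```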
D.44 Restriction 48: MB Bracketing of Dcache Writes to Force Bad Data ECC and Force Bad Tag Parity
Writes to DC_CTL[F_BAD_DECC] and DC_CTL[DCDAT_ERR_EN] must be bracketed by MB instructions to quiesce the memory system. The Istream must also be quiesced before and during the sequence, as described in Section D.26.
E 21264/EV67-to-Bcache Pin Interconnections
This appendix provides the pin interface between the 21264/EV67 and Bcache SSRAMs.

E.1 Forwarding Clock Pin Groupings
Table E–1 lists the correspondence between the clock signals for the 21264/EV67 and Bcache (late-write non-bursting and dual-data rate) SSRAMs.
Table E–1 Bcache Forwarding Clock Pin Groupings (Continued)
Pad and Pin      Input Clock      Output Clocks
BcTagShared_H    BcTagInClk_H     BcTagOutClk_x
BcTagDirty_H     BcTagInClk_H     BcTagOutClk_x
BcTagValid_H     BcTagInClk_H     BcTagOutClk_x

E.2 Late-Write Non-Bursting SSRAMs
Table E–2 provides the data pin connections between late-write non-bursting SSRAMs and the 21264/EV67 or the system board. Table E–3 provides the same information for the tag pins.
Dual-Data Rate SSRAMs

Table E–3 Late-Write Non-Bursting SSRAMs Tag Pin Usage (Continued)
21264/EV67 Signal Name or Board Connection               Late-Write SSRAM Tag Pin Name
Set from board to 1/2 the 21264/EV67 core voltage        CK_L
Set from board to 1/2 the 21264/EV67 core voltage        VREF1_H VREF2_H
Set from board (implementation dependent)                ZQ_H
BcTagValid_H                                             DQx
BcTagDirty_H                                             DQx
BcTagShared_H                                            DQx
Unconnected                                              TMS_H
Unconnected                                              TDI_H
Unconnected                                              TCK_H
Unconnected                                              TDC_H
Table E–4 Dual-Data Rate SSRAM Data Pin Usage (Continued)
21264/EV67 Signal Name or Board Connection    Dual-Data Rate SSRAM Data Pin Name
From board, pulled up to VDD                  TMS_H
From board, pulled up to VDD                  TDI_H
Unconnected or pulled down to VSS             TRST_L
BcDataOE_L                                    OE_L (G_L)
From board, pulled down to VSS                SD/DD_L (B3)

Table E–5 Dual-Data Rate SSRAM Tag Pin Usage
21264/EV67 Signal Name or Board Connection    Dual-Data Rate SSRAM Tag Pin Name
BcAdd_H[23:6]                                 SA_H[17:0]
BcTag_H[3
Glossary
This glossary provides definitions for specific terms and acronyms associated with the Alpha 21264/EV67 microprocessor and chips in general.

abort
The unit stops the operation it is performing, without saving status, to perform some other operation.

address space number (ASN)
An optionally implemented register used to reduce the need for invalidation of cached address translations for process-specific addresses when a context switch occurs.

asynchronous system trap (AST)
A software-simulated interrupt to a user-defined routine. ASTs enable a user process to be notified asynchronously, with respect to that process, of the occurrence of a specific event. If a user process has defined an AST routine for an event, the system interrupts the process and executes the AST routine when that event occurs. When the AST routine exits, the system resumes execution of the process at the point where it was interrupted.

boot
Short for bootstrap. Loading an operating system into memory is called booting.

BSR
Boundary-scan register.

buffer
An internal memory area used for temporary storage of data records during input or output operations.

bugcheck
A software condition, usually the response to software’s detection of an “internal inconsistency,” which results in the execution of the system bugcheck code.

bus
A group of signals that consists of many transmission lines or wires.

cache hit
The status returned when a logic unit probes a cache memory and finds a valid cache entry at the probed address.

cache interference
The result of an operation that adversely affects the mechanisms and procedures used to keep frequently used items in a cache. Such interference may cause frequently used items to be removed from a cache or incur significant overhead operations to ensure correct results. Either action hampers performance.

cache line
See cache block.

clock offset (or clkoffset)
The delay intentionally added to the forwarded clock to meet the setup and hold requirements at the Receive Flop.

CMOS
Complementary metal-oxide semiconductor. A silicon device formed by a process that combines PMOS and NMOS semiconductor material.

conditional branch instructions
Instructions that test a register for positive/negative or for zero/nonzero. They can also test integer registers for even/odd.

direct-mapping cache
A cache organization in which only one address comparison is needed to locate any data in the cache, because any block of main memory data can be placed in only one possible position in the cache.

direct memory access (DMA)
Access to memory by an I/O device that does not require processor intervention.

dirty
One status item for a cache block. The cache block is valid and has been written so that it may differ from the copy in system main memory.

external cache
See second-level cache.

FEPROM
Flash-erasable programmable read-only memory. FEPROMs can be bank- or bulk-erased. Contrast with EEPROM.

FET
Field-effect transistor.

FEU
The unit within the 21264/EV67 microprocessor that performs floating-point calculations.

firmware
Machine instructions stored in nonvolatile memory.

of the clock forward logic. Additionally, the framing clock can have a period that is less than, equal to, or greater than the time it takes to send a full four cycle command/address.

GCLK
Global clock within the 21264/EV67.

granularity
A characteristic of storage systems that defines the amount of data that can be read and/or written with a single instruction, or read and/or written independently.

hardware interrupt request (HIR)
An interrupt generated by a peripheral device.

interface reset
A synchronously received reset signal that is used to preset and start the clock forwarding circuitry. During this reset, all forwarded clocks are stopped and the presettable count values are applied to the counters; then, some number of cycles later, the clocks are enabled and are free running.

internal processor register (IPR)
Special registers that are used to configure options or report status.

IOWB
I/O write buffer.

IPGA
Interstitial pin grid array.

IQ
Integer issue queue.

machine check
An operating system action triggered by certain system hardware-detected errors that can be fatal to system operation. Once triggered, machine check handler software analyzes the error.

MAF
Miss address file.

main memory
The large memory, external to the microprocessor, used for holding most instruction code and data. Usually built from cost-effective DRAM memory chips. May be used in connection with the microprocessor’s internal caches and an external cache.

MSI
Medium-scale integration.

multiprocessing
A processing method that replicates the sequential computer and interconnects the collection so that each processor can execute the same or a different program at the same time.

must be one (MBO)
A field that must be supplied as one.

must be zero (MBZ)
A field that is reserved and must be supplied as zero. If examined, it must be assumed to be UNDEFINED.

NaN
Not-a-Number. An IEEE floating-point bit pattern that represents something other than a number.

output mux counter
Counter used to select the output mux that drives address and data. It is reset with the Interface Reset and incremented by a copy of the locally generated forwarded clock.

PAL
Privileged architecture library. See also PALcode. See also Programmable array logic (hardware). A device that can be programmed by a process that blows individual fuses to create a circuit.

PALcode
Alpha privileged architecture library code, written to support Alpha microprocessors.

PQFP
Plastic quad flat pack.

primary cache
The cache that is the fastest and closest to the processor. The first-level caches, located on the CPU chip, composed of the Dcache and Icache.

program counter
That portion of the CPU that contains the virtual address of the next instruction to be executed. Most current CPUs implement the program counter (PC) as a register. This register may be visible to the programmer through the instruction set.

PROM
Programmable read-only memory.

read stream buffers
Arrangement whereby each memory module independently prefetches DRAM data prior to an actual read request for that data. Reduces average memory latency while improving total memory bandwidth.

receive counter
Counter used to enable the receive flops. It is clocked by the incoming forwarded clock and reset by the Interface Reset.

receive mux counter
The receive mux counter is preset to a selectable starting point and incremented by the locally generated forward clock.

SDRAM
Synchronous dynamic random-access memory.

second-level cache
A cache memory provided outside of the microprocessor chip, usually located on the same module. Also called board-level, external, or module-level cache.

set-associative
A form of cache organization in which the location of a data block in main memory constrains, but does not completely determine, its location in the cache.

STRAM
Self-timed random-access memory.

superpipelined
Describes a pipelined machine that has a larger number of pipe stages and more complex scheduling and control. See also pipeline.

superscalar
Describes a machine architecture that allows multiple independent instructions to be issued in parallel during a given clock cycle.

system clock
The primary skew-controlled clock used throughout the interface components to clock transfer between ASICs, main memory, and I/O bridges.

UNPREDICTABLE
Results or occurrences that do not disrupt the basic operation of the processor; the processor continues to execute instructions in its normal manner. Privileged or unprivileged software can trigger UNPREDICTABLE results or occurrences. (This meaning only applies when the word is written in all upper case.)

UVPROM
Ultraviolet (erasable) programmable read-only memory.

VAF
See victim address file.

valid
Allocated.

WAR
Write-after-read.

word
Two contiguous bytes (16 bits) starting on an arbitrary byte boundary. The bits are numbered from right to left, 0 through 15.

write-back
A cache management technique in which write operation data is written into cache but is not written into main memory in the same operation. This may result in temporary differences between cache data and main memory data. Some logic unit must maintain coherency between cache and main memory.