Intel® 64 and IA-32 Architectures Software Developer’s Manual
Volume 3A: System Programming Guide, Part 1

NOTE: The Intel® 64 and IA-32 Architectures Software Developer's Manual consists of five volumes: Basic Architecture, Order Number 253665; Instruction Set Reference A-M, Order Number 253666; Instruction Set Reference N-Z, Order Number 253667; System Programming Guide, Part 1, Order Number 253668; System Programming Guide, Part 2, Order Number 253669.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT.
CONTENTS

CHAPTER 1 ABOUT THIS MANUAL
1.1 PROCESSORS COVERED IN THIS MANUAL
1.2 OVERVIEW OF THE SYSTEM PROGRAMMING GUIDE
1.3 NOTATIONAL CONVENTIONS
1.3.1 Bit and Byte Order

2.7.5 Controlling the Processor
2.7.6 Reading Performance-Monitoring and Time-Stamp Counters
2.7.6.1 Reading Counters in 64-Bit Mode
2.7.7 Reading and Writing Model-Specific Registers

4.9.3 Caching Paging-Related Information about Memory Typing
4.10 CACHING TRANSLATION INFORMATION

5.8.7.1 SYSENTER and SYSEXIT Instructions in IA-32e Mode
5.8.8 Fast System Calls in 64-bit Mode
5.9 PRIVILEGED INSTRUCTIONS

6.14 EXCEPTION AND INTERRUPT HANDLING IN 64-BIT MODE
6.14.1 64-Bit Mode IDT
6.14.2 64-Bit Mode Stack Frame
6.14.3 IRET in IA-32e Mode

CHAPTER 8 MULTIPLE-PROCESSOR MANAGEMENT
8.1 LOCKED ATOMIC OPERATIONS

8.7.9 Memory Ordering
8.7.10 Serializing Instructions
8.7.11 MICROCODE UPDATE Resources
8.7.12 Self Modifying Code

9.5 MEMORY TYPE RANGE REGISTERS (MTRRS)

CHAPTER 10 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC)
10.1 LOCAL AND I/O APIC OVERVIEW

10.7.2.4 Deriving Logical x2APIC ID from the Local x2APIC ID
10.7.2.5 Broadcast/Self Delivery Mode
10.7.2.6 Lowest Priority Delivery Mode

11.11 MEMORY TYPE RANGE REGISTERS (MTRRS)
11.11.1 MTRR Feature Identification
11.11.2 Setting Memory Ranges with MTRRs
11.11.2.1 IA32_MTRR_DEF_TYPE MSR

13.1.6.1 Numeric Error flag and IGNNE#
13.2 EMULATION OF SSE/SSE2/SSE3/SSSE3/SSE4 EXTENSIONS
13.3 SAVING AND RESTORING THE SSE/SSE2/SSE3/SSSE3/SSE4 STATE
13.4 SAVING THE SSE/SSE2/SSE3/SSSE3/SSE4 STATE ON TASK OR CONTEXT SWITCHES

15.3 MACHINE-CHECK MSRS
15.3.1 Machine-Check Global Control MSRs
15.3.1.1 IA32_MCG_CAP MSR
15.3.1.2 IA32_MCG_STATUS MSR

CHAPTER 16 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER
16.1 OVERVIEW OF DEBUG SUPPORT FACILITIES

16.9 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (PENTIUM M PROCESSORS)
16.10 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (P6 FAMILY PROCESSORS)
16.10.1 DEBUGCTLMSR Register

CHAPTER 18 MIXING 16-BIT AND 32-BIT CODE
18.1 DEFINING 16-BIT AND 32-BIT PROGRAM MODULES
18.2 MIXING 16-BIT AND 32-BIT OPERATIONS WITHIN A CODE SEGMENT
18.3 SHARING DATA AMONG MIXED-SIZE CODE SEGMENTS
18.4 TRANSFERRING CONTROL AMONG MIXED-SIZE CODE SEGMENTS

19.18.6.3 Numeric Underflow Exception (#U)
19.18.6.4 Exception Precedence
19.18.6.5 CS and EIP For FPU Exceptions
19.18.6.6 FPU Error Signals

19.25 EXCEPTIONS AND/OR EXCEPTION CONDITIONS
19.25.1 Machine-Check Architecture
19.25.2 Priority of Exceptions
19.26 INTERRUPTS

20.5 VIRTUAL-MACHINE CONTROL STRUCTURE
20.6 DISCOVERING SUPPORT FOR VMX
20.7 ENABLING AND ENTERING VMX OPERATION
20.8 RESTRICTIONS ON VMX OPERATION

CHAPTER 22 VMX NON-ROOT OPERATION
22.1 INSTRUCTIONS THAT CAUSE VM EXITS
22.1.1 Relative Priority of Faults and VM Exits
22.1.2 Instructions That Cause VM Exits Unconditionally
22.1.3 Instructions That Cause VM Exits Conditionally

23.3.1.3 Checks on Guest Descriptor-Table Registers
23.3.1.4 Checks on Guest RIP and RFLAGS
23.3.1.5 Checks on Guest Non-Register State

24.5.6 Clearing Address-Range Monitoring
24.6 LOADING MSRS
24.7 VMX ABORTS
24.8 MACHINE CHECK DURING VM EXIT

26.11 SMBASE RELOCATION
26.11.1 Relocating SMRAM to an Address Above 1 MByte
26.12 I/O INSTRUCTION RESTART

27.7.1 Handling VM Exits Due to Exceptions
27.7.1.1 Reflecting Exceptions to Guest Software
27.7.1.2 Resuming Guest Software after Handling an Exception
27.8 MULTI-PROCESSOR CONSIDERATIONS

CHAPTER 29 HANDLING BOUNDARY CONDITIONS IN A VIRTUAL MACHINE MONITOR
29.1 OVERVIEW
29.2 INTERRUPT HANDLING IN VMX OPERATION

30.5 PERFORMANCE MONITORING (PROCESSORS BASED ON INTEL® ATOM™ MICROARCHITECTURE)

30.10.3 Incrementing the Time-Stamp Counter
30.10.4 Non-Halted Reference Clockticks
30.10.5 Cycle Counting and Opportunistic Processor Operation
30.11 PERFORMANCE MONITORING, BRANCH PROFILING AND SYSTEM EVENTS

B.3 MSRS IN THE INTEL® ATOM™ PROCESSOR FAMILY
B.4 MSRS IN THE INTEL® MICROARCHITECTURE (NEHALEM)
B.5 MSRS IN THE PENTIUM® 4 AND INTEL® XEON® PROCESSORS
B.5.1 MSRs Unique to Intel Xeon Processor MP with L3 Cache

E.4.3 Processor Model Specific Error Code Field
E.4.3.1 MCA Error Type A: L3 Error
E.4.3.2 Processor Model Specific Error Code Field Type B: Bus and Interconnect Error

H.4.2 Natural-Width Read-Only Data Fields
H.4.3 Natural-Width Guest-State Fields
H.4.4 Natural-Width Host-State Fields
APPENDIX I VMX BASIC EXIT REASONS
TABLES
Table F-2. Short Message (21 Cycles)
Table F-3. Non-Focused Lowest Priority Message (34 Cycles)
CHAPTER 1 ABOUT THIS MANUAL

The Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A: System Programming Guide, Part 1 (order number 253668) and the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B: System Programming Guide, Part 2 (order number 253669) are part of a set that describes the architecture and programming environment of Intel 64 and IA-32 Architecture processors.
• Dual-Core Intel® Xeon® processor LV
• Intel® Core™2 Duo processor
• Intel® Core™2 Quad processor Q6000 series
• Intel® Xeon® processor 3000, 3200 series
• Intel® Xeon® processor 5000 series
• Intel® Xeon® processor 5100, 5300 series
• Intel® Core™2 Extreme processor X7000 and X6800 series
• Intel® Core™2 Extreme QX6000 series
• Intel® Xeon® processor 7100 series
• Intel® Pentium® Dual-Core processor
• Intel® Xeon® processor 7200, 7300 series
• Intel® Core™2 Extreme QX9000
The Intel® Core™ i7 processor and the Intel® Core™ i5 processor are based on the Intel® microarchitecture (Nehalem) and support Intel 64 architecture. Processors based on the Next Generation Intel Processor, codenamed Westmere, support Intel 64 architecture. P6 family, Pentium® M, Intel® Core™ Solo, Intel® Core™ Duo processors, dual-core Intel® Xeon® processor LV, and early generations of Pentium 4 and Intel Xeon processors support IA-32 architecture.
Chapter 6 — Interrupt and Exception Handling. Describes the basic interrupt mechanisms defined in the Intel 64 and IA-32 architectures, shows how interrupts and exceptions relate to protection, and describes how the architecture handles each exception type. Reference information for each exception is given at the end of this chapter.

Chapter 7 — Task Management. Describes mechanisms the Intel 64 and IA-32 architectures provide to support multitasking and inter-task protection.
Chapter 16 — Debugging, Branch Profiles and Time-Stamp Counter. Describes the debugging registers and other debug mechanisms provided in Intel 64 or IA-32 processors. This chapter also describes the time-stamp counter.

Chapter 17 — 8086 Emulation. Describes the real-address and virtual-8086 modes of the IA-32 architecture.

Chapter 18 — Mixing 16-Bit and 32-Bit Code. Describes how to mix 16-bit and 32-bit code modules within the same program or task.
Chapter 30 — Performance Monitoring. Describes the Intel 64 and IA-32 architectures’ facilities for monitoring performance.

Appendix A — Performance-Monitoring Events. Lists architectural performance events. Non-architectural performance events (i.e. model-specific events) are listed for each generation of microarchitecture.

Appendix B — Model-Specific Registers (MSRs).
means the bytes of a word are numbered starting from the least significant byte. Figure 1-1 illustrates these conventions.

1.3.2 Reserved Bits and Software Compatibility

In many register and memory layout descriptions, certain bits are marked as reserved. When bits are marked as reserved, it is essential for compatibility with future processors that software treat these bits as having a future, though unknown, effect.
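One common way to honor this rule is a read-modify-write that changes only the defined bits and carries the reserved bits through exactly as they were read, and that masks reserved bits out before comparing values. The Python sketch below models this on plain integers. The 32-bit register and the `DEFINED_MASK` layout (bits 7:0 defined, bits 31:8 reserved) are hypothetical, chosen only for illustration; real registers document their own reserved-bit masks.

```python
# Hypothetical layout for illustration: bits 7:0 are defined fields,
# bits 31:8 are reserved. Real registers have their own documented masks.
DEFINED_MASK = 0x0000_00FF
REG_MASK = 0xFFFF_FFFF  # keep results within 32 bits


def update_defined_bits(current: int, new_fields: int) -> int:
    """Replace only the defined bits; reserved bits keep the value previously read."""
    return ((current & ~DEFINED_MASK) | (new_fields & DEFINED_MASK)) & REG_MASK


def defined_bits_equal(value: int, expected: int) -> bool:
    """Compare two register values while ignoring reserved bits."""
    return (value & DEFINED_MASK) == (expected & DEFINED_MASK)


if __name__ == "__main__":
    previously_read = 0xABCD_1234
    print(hex(update_defined_bits(previously_read, 0x5A)))  # 0xabcd125a
    print(defined_bits_equal(0xFFFF_005A, 0x5A))            # True
```

The point of the design is that software never has to know what the reserved bits mean: it only has to avoid changing them and avoid depending on them.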
1.3.3 Instruction Operands

When instructions are represented symbolically, a subset of assembly language is used. In this subset, an instruction has the following format:

label: mnemonic argument1, argument2, argument3

where:
• A label is an identifier which is followed by a colon.
• The operands argument1, argument2, and argument3 are optional. There may be from zero to three operands, depending on the opcode.
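The symbolic format can be illustrated with a small parser that splits a line into its optional label, its mnemonic, and zero to three comma-separated operands. This is a hedged sketch for reading the notation, not an assembler. The function name `parse_instruction` and the sample line `LOADREG: MOV EAX, SUBTOTAL` are illustrative assumptions, and operands that themselves contain commas are not handled. A colon inside an operand (such as `DS:[EBX]`) is not mistaken for a label, because a label must be the first identifier on the line.

```python
import re
from typing import List, NamedTuple, Optional

# Optional "label:" prefix, then the mnemonic, then the operand text.
_LINE = re.compile(r"^\s*(?:([A-Za-z_.$?@][\w.$?@]*)\s*:)?\s*(\S+)\s*(.*?)\s*$")


class Instruction(NamedTuple):
    label: Optional[str]
    mnemonic: str
    operands: List[str]


def parse_instruction(line: str) -> Instruction:
    """Split 'label: mnemonic argument1, argument2, argument3' into its parts."""
    match = _LINE.match(line)
    if match is None:
        raise ValueError(f"not an instruction: {line!r}")
    label, mnemonic, operand_text = match.groups()
    operands = [op.strip() for op in operand_text.split(",")] if operand_text else []
    if len(operands) > 3:
        raise ValueError("at most three operands are allowed")
    return Instruction(label, mnemonic, operands)


if __name__ == "__main__":
    print(parse_instruction("LOADREG: MOV EAX, SUBTOTAL"))
    # Instruction(label='LOADREG', mnemonic='MOV', operands=['EAX', 'SUBTOTAL'])
```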
For example, a program can keep its code (instructions) and stack in separate segments. Code addresses would always refer to the code space, and stack addresses would always refer to the stack space.
[Figure: Syntax Representation for CPUID Input and Output, with companion panels for control register values and model-specific register values. Callouts: input value for EAX defines output; output register and feature flag or field name with bit position(s); value (or range) of output; example CR name; feature flag or field name with bit position(s). NOTE: Some leaves require input values for EAX and ECX. If only one value is present, EAX is implied.]
This example refers to a page-fault exception under conditions where an error code naming a type of fault is reported. Under some conditions, exceptions which produce error codes may not be able to report an accurate code. In this case, the error code is zero, as shown below for a general-protection exception:

#GP(0)

1.4 RELATED LITERATURE

Literature related to Intel 64 and IA-32 processors is listed on-line at: http://developer.intel.com/products/processor/index.
• Intel® 64 Architecture Processor Topology Enumeration: http://softwarecommunity.intel.com/articles/eng/3887.htm
• Intel® Trusted Execution Technology Measured Launched Environment Programming Guide: http://www.intel.com/technology/security/index.htm
• Developing Multi-threaded Applications: A Platform Consistent Approach
• Using Spin-Loops on Intel Pentium 4 Processor and Intel Xeon Processor MP: http://www3.intel.com/cd/ids/developer/asmona/eng/dc/threading/knowledgebase/19083.
CHAPTER 2 SYSTEM ARCHITECTURE OVERVIEW

IA-32 architecture (beginning with the Intel386 processor family) provides extensive support for operating-system and system-development software. This support offers multiple modes of operation, which include:
• Real mode, protected mode, virtual-8086 mode, and system management mode. These are sometimes referred to as legacy modes.
initiates the switch from real-address mode to protected mode. If IA-32e mode operation is desired, software also initiates a switch from protected mode to IA-32e mode.

2.1 OVERVIEW OF THE SYSTEM-LEVEL ARCHITECTURE

System-level architecture consists of a set of registers, data structures, and instructions designed to support basic system-level operations such as memory management, interrupt and exception handling, task management, and control of multiple processors.
[Figure: system-level registers and data structures. Labeled elements: EFLAGS register; control registers CR0 through CR4; task register; interrupt vector; global descriptor table (GDT) with segment and TSS descriptors; interrupt descriptor table (IDT) with interrupt gates; task-state segments (TSS); LDT descriptor; interrupt handler code and current stack; task code, data, and stack segments; segment selectors; linear and physical addresses.]
[Figure: system-level registers and data structures in IA-32e mode. Labeled elements: RFLAGS; control registers CR0 through CR4 and CR8; task register (TR); GDTR; interrupt vector; GDT with NULL descriptor, segment descriptors, TSS descriptor, and LDT descriptor; IDT with interrupt gates, trap gates, and IST; local descriptor table (LDT) with call-gate segment selector; task-state segment (TSS); code, data, or stack segment (base = 0); linear and physical addresses.]
2.1.1 Global and Local Descriptor Tables

When operating in protected mode, all memory accesses pass through either the global descriptor table (GDT) or an optional local descriptor table (LDT) as shown in Figure 2-1. These tables contain entries called segment descriptors. Segment descriptors provide the base address of segments as well as access rights, type, and usage information. Each segment descriptor has an associated segment selector.
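To make the selector-to-descriptor relationship concrete, the sketch below does three things. It decodes a 16-bit segment selector (index in bits 15:3, table indicator in bit 2, requested privilege level in bits 1:0). It computes where the selector's 8-byte descriptor sits in the GDT or LDT. It then extracts the base address and the raw 20-bit limit (before granularity scaling) from a legacy 8-byte segment descriptor. The field positions follow the standard IA-32 descriptor layout. The table base addresses are hypothetical inputs, and the function names are illustrative.

```python
def decode_selector(selector: int) -> dict:
    """Split a 16-bit segment selector into its three fields."""
    return {
        "index": (selector >> 3) & 0x1FFF,  # bits 15:3, descriptor number
        "ti": (selector >> 2) & 0x1,         # bit 2, 0 = GDT, 1 = LDT
        "rpl": selector & 0x3,               # bits 1:0, requested privilege level
    }


def descriptor_address(selector: int, gdt_base: int, ldt_base: int) -> int:
    """Linear address of the 8-byte descriptor the selector points to."""
    fields = decode_selector(selector)
    table_base = ldt_base if fields["ti"] else gdt_base
    return table_base + fields["index"] * 8


def decode_descriptor(raw: int) -> tuple:
    """Return (base, raw 20-bit limit) from a legacy 8-byte segment descriptor."""
    limit = (raw & 0xFFFF) | (((raw >> 48) & 0xF) << 16)            # limit 15:0 and 19:16
    base = ((raw >> 16) & 0xFF_FFFF) | (((raw >> 56) & 0xFF) << 24)  # base 23:0 and 31:24
    return base, limit


if __name__ == "__main__":
    print(decode_selector(0x2B))  # {'index': 5, 'ti': 0, 'rpl': 3}
    print(hex(descriptor_address(0x08, gdt_base=0x1000, ldt_base=0x2000)))  # 0x1008
    print(decode_descriptor(0x00CF9A000000FFFF))  # (0, 1048575)
```

The last line decodes a flat code-segment descriptor: base 0 and the maximum 20-bit limit, which covers 4 GBytes when the granularity flag selects 4-KByte units.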
The architecture also defines a set of special descriptors called gates (call gates, interrupt gates, trap gates, and task gates). These provide protected gateways to system procedures and handlers that may operate at a different privilege level than application programs and most procedures.
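As an example of how a gate guards a privilege boundary, the sketch below models the privilege checks for a far CALL through a call gate. It is simplified from the protected-mode rules: the numerically larger of CPL and the selector's RPL must not exceed the gate's DPL, and the target code segment's DPL must not exceed CPL (lower numbers are more privileged). JMP through a gate, descriptor type and present checks, and stack switching are deliberately left out, and `call_gate_allowed` is an illustrative name, so treat this as a teaching model only.

```python
def call_gate_allowed(cpl: int, rpl: int, gate_dpl: int, target_dpl: int) -> bool:
    """Simplified privilege check for a far CALL through a call gate.

    Privilege levels run from 0 (most privileged) to 3 (least privileged).
    """
    for level in (cpl, rpl, gate_dpl, target_dpl):
        if not 0 <= level <= 3:
            raise ValueError("privilege levels must be 0-3")
    # The caller (CPL and the selector's RPL) must be allowed to use the gate.
    if max(cpl, rpl) > gate_dpl:
        return False
    # CALL may only transfer to code of equal or greater privilege.
    return target_dpl <= cpl


if __name__ == "__main__":
    # Application code (CPL 3) calling a kernel routine through a DPL-3 gate.
    print(call_gate_allowed(cpl=3, rpl=3, gate_dpl=3, target_dpl=0))  # True
    # The same call through a gate reserved for privilege level 0.
    print(call_gate_allowed(cpl=3, rpl=3, gate_dpl=0, target_dpl=0))  # False
```

The gate's DPL decides who may enter, while the target segment's DPL decides how privileged the code behind the gate is. That separation is what lets unprivileged code reach a privileged procedure only at the entry point the gate names.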
2. Loads the task register with the segment selector for the new task.
3. Accesses the new TSS through a segment descriptor in the GDT.
4. Loads the state of the new task from the new TSS into the general-purpose registers, the segment registers, the LDTR, control register CR3 (base address of the paging-structure hierarchy), the EFLAGS register, and the EIP register.
5. Begins execution of the new task.

A task can also be accessed through a task gate.
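Steps 2 through 5 can be modeled with plain Python dictionaries to show which pieces of state come from the new TSS. Everything here is hypothetical scaffolding: `switch_to_task`, the dictionary keys, and the GDT-as-dictionary are illustrative only. A real task switch also performs privilege, limit, and busy-flag checks, and involves the step that precedes those listed, none of which this sketch models.

```python
# State that step 4 lists as loaded from the new TSS.
LOADED_FROM_TSS = (
    "general_purpose", "segment_registers", "ldtr", "cr3", "eflags", "eip",
)


def switch_to_task(cpu: dict, gdt: dict, new_tss_selector: int) -> None:
    """Hypothetical model of steps 2 through 5; no checks are performed."""
    # Step 2: load the task register with the new task's TSS selector.
    cpu["tr"] = new_tss_selector
    # Step 3: find the new TSS through its descriptor in the GDT
    # (selector bits 15:3 index the table; here the GDT is a dict).
    new_tss = gdt[new_tss_selector >> 3]["tss"]
    # Step 4: load the new task's state from the TSS.
    for name in LOADED_FROM_TSS:
        cpu[name] = new_tss[name]
    # Step 5: execution continues at cpu["eip"]; not modeled further.


if __name__ == "__main__":
    gdt = {5: {"tss": {"general_purpose": {"eax": 0}, "segment_registers": {"cs": 0x08},
                       "ldtr": 0x30, "cr3": 0x0010_0000, "eflags": 0x202, "eip": 0x4000}}}
    cpu = {}
    switch_to_task(cpu, gdt, 0x28)
    print(hex(cpu["tr"]), hex(cpu["cr3"]), hex(cpu["eip"]))  # 0x28 0x100000 0x4000
```

Loading CR3 from the TSS is what gives each task its own paging-structure hierarchy, so a task switch can also switch address spaces.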
The IDTR register is expanded to hold a 64-bit base address. Task gates are not supported.

2.1.5 Memory Management

System architecture supports either direct physical addressing of memory or virtual memory (through paging). When physical addressing is used, a linear address is treated as a physical address.
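The contrast between the two schemes can be shown on a single address. With paging disabled, the linear address passes through unchanged. With paging enabled, it is split into indexes into the paging structures. The sketch assumes 32-bit paging with 4-KByte pages, which is only one of the paging modes, and the function names are illustrative.

```python
def physical_without_paging(linear: int) -> int:
    """With paging disabled, the linear address is used as the physical address."""
    return linear


def split_linear_32bit_paging(linear: int) -> tuple:
    """Split a 32-bit linear address for 32-bit paging with 4-KByte pages."""
    if not 0 <= linear <= 0xFFFF_FFFF:
        raise ValueError("linear address must fit in 32 bits")
    directory = (linear >> 22) & 0x3FF  # bits 31:22 select a page-directory entry
    table = (linear >> 12) & 0x3FF      # bits 21:12 select a page-table entry
    offset = linear & 0xFFF             # bits 11:0 are the offset within the page
    return directory, table, offset


if __name__ == "__main__":
    print(hex(physical_without_paging(0xB8000)))  # 0xb8000
    print([hex(part) for part in split_linear_32bit_paging(0x1234_5678)])  # ['0x48', '0x345', '0x678']
```

Under paging, the physical address is the page frame found through the directory and table entries, plus the unchanged 12-bit offset.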
2.1.6 System Registers

To assist in initializing the processor and controlling system operations, the system architecture provides system flags in the EFLAGS register and several system registers:
• The system flags and IOPL field in the EFLAGS register control task and mode switching, interrupt handling, instruction tracing, and access rights. See also: Section 2.3, “System Flags and Fields in the EFLAGS Register.”
On systems that support IA-32e mode, the extended feature enable register (IA32_EFER) is available. This model-specific register controls activation of IA-32e mode and other IA-32e mode operations. In addition, there are several model-specific registers that govern IA-32e mode instructions:
• IA32_KernelGSbase — Used by SWAPGS instruction.
• IA32_LSTAR — Used by SYSCALL instruction.
• IA32_SYSCALL_FLAG_MASK — Used by SYSCALL instruction.
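To show how two of these MSRs come into play, the sketch below models the RIP and RFLAGS updates SYSCALL performs in 64-bit mode. RCX receives the return address, R11 receives RFLAGS, RIP is loaded from IA32_LSTAR, and RFLAGS is cleared wherever the flag-mask MSR (called IA32_SYSCALL_FLAG_MASK in this text) has a 1 bit. Segment-selector loading, privilege-level changes, and the SWAPGS exchange are omitted. The dictionary-based CPU model and the function name `syscall_64` are hypothetical.

```python
RFLAGS_MASK_64 = (1 << 64) - 1


def syscall_64(cpu: dict, msrs: dict, next_rip: int) -> None:
    """Hypothetical model of how SYSCALL uses IA32_LSTAR and the flag mask."""
    cpu["rcx"] = next_rip                     # return address, used later by SYSRET
    cpu["r11"] = cpu["rflags"]                # caller's RFLAGS
    cpu["rip"] = msrs["IA32_LSTAR"]           # 64-bit kernel entry point
    mask = msrs["IA32_SYSCALL_FLAG_MASK"]
    cpu["rflags"] = cpu["rflags"] & ~mask & RFLAGS_MASK_64  # clear masked flags


if __name__ == "__main__":
    cpu = {"rflags": 0x246}
    msrs = {"IA32_LSTAR": 0xFFFF_8000_0000_1000, "IA32_SYSCALL_FLAG_MASK": 0x200}
    syscall_64(cpu, msrs, next_rip=0x40_1005)
    print(hex(cpu["rip"]), hex(cpu["rflags"]))  # 0xffff800000001000 0x46
```

In the usage example the mask has bit 9 (IF) set, so the kernel entry point starts with maskable interrupts disabled while the caller's flags remain available in R11.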
running program or task. SMM-specific code may then be executed transparently. Upon returning from SMM, the processor is placed back into its state prior to the SMI.
• Virtual-8086 mode — In protected mode, the processor supports a quasi-operating mode known as virtual-8086 mode. This mode allows the processor to execute 8086 software in a protected, multitasking environment.
The VM flag in the EFLAGS register determines whether the processor is operating in protected mode or virtual-8086 mode. Transitions between protected mode and virtual-8086 mode are generally carried out as part of a task switch or a return from an interrupt or exception handler. See also: Section 17.2.5, “Entering Virtual-8086 Mode.”

The LMA bit (IA32_EFER.LMA[bit 10]) determines whether the processor is operating in IA-32e mode.
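Taken together, these bits identify the operating mode. The sketch below is a simplified classifier over CR0.PE (bit 0), EFLAGS.VM (bit 17), and IA32_EFER.LMA (bit 10). When IA-32e mode is active, it also takes the L bit of the current code-segment descriptor (passed in as `cs_l`) to distinguish 64-bit mode from compatibility mode. System-management mode is not modeled, and the function and argument names are illustrative.

```python
CR0_PE = 1 << 0      # protection enable
EFLAGS_VM = 1 << 17  # virtual-8086 mode
EFER_LMA = 1 << 10   # IA-32e mode active


def operating_mode(cr0: int, eflags: int, efer: int, cs_l: bool = False) -> str:
    """Classify the operating mode from the control bits (SMM not modeled)."""
    if efer & EFER_LMA:
        # IA-32e mode: the code segment's L bit picks the sub-mode.
        return "64-bit" if cs_l else "compatibility"
    if not (cr0 & CR0_PE):
        return "real-address"
    if eflags & EFLAGS_VM:
        return "virtual-8086"
    return "protected"


if __name__ == "__main__":
    print(operating_mode(cr0=0x1, eflags=0x2_0002, efer=0))                    # virtual-8086
    print(operating_mode(cr0=0x8000_0001, eflags=0x2, efer=0x500, cs_l=True))  # 64-bit
```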
Figure 2-4. System flags in the EFLAGS register (figure not reproduced). Bits 31:22 are reserved (set to 0). ID (bit 21): Identification Flag; VIP (bit 20): Virtual Interrupt Pending; VIF (bit 19): Virtual Interrupt Flag; AC (bit 18): Alignment Check; VM (bit 17): Virtual-8086 Mode; RF (bit 16): Resume Flag; NT (bit 14): Nested Task Flag; IOPL (bits 13:12): I/O Privilege Level; IF (bit 9): Interrupt Enable Flag; TF (bit 8): Trap Flag. Bits 15, 5, and 3 are reserved (0) and bit 1 is reserved (1); the remaining low-order bits are the status and control flags.
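The bit positions in Figure 2-4 lend themselves to a small decoder. The sketch below is illustrative only. The function name `decode_eflags` and the sample value are hypothetical, and it covers just the system flags and the IOPL field listed above. It does not decode the status flags.

```python
# EFLAGS system-flag bit positions, as listed for Figure 2-4.
EFLAGS_SYSTEM_FLAGS = {
    "TF": 8,    # Trap Flag
    "IF": 9,    # Interrupt Enable Flag
    "NT": 14,   # Nested Task Flag
    "RF": 16,   # Resume Flag
    "VM": 17,   # Virtual-8086 Mode
    "AC": 18,   # Alignment Check
    "VIF": 19,  # Virtual Interrupt Flag
    "VIP": 20,  # Virtual Interrupt Pending
    "ID": 21,   # Identification Flag
}


def decode_eflags(value):
    """Return the system flags and the IOPL field of a 32-bit EFLAGS image."""
    flags = {name: (value >> bit) & 1 for name, bit in EFLAGS_SYSTEM_FLAGS.items()}
    flags["IOPL"] = (value >> 12) & 0b11  # two-bit field in bits 13:12
    return flags


if __name__ == "__main__":
    # Hypothetical image: VM = 1, IOPL = 3, IF = 1, plus reserved bit 1 (always 1).
    print(decode_eflags(0x00023202))
```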
changing the state of this flag can generate unexpected exceptions in application programs. See also: Section 7.4, “Task Linking.”
RF Resume (bit 16): Controls the processor’s response to instruction-breakpoint conditions. When set, this flag temporarily disables debug exceptions (#DB) from being generated for instruction breakpoints (although other exception conditions can cause an exception to be generated).
VIP Virtual interrupt pending (bit 20): Set by software to indicate that an interrupt is pending; cleared to indicate that no interrupt is pending. This flag is used in conjunction with the VIF flag. The processor reads this flag but never modifies it. The processor only recognizes the VIP flag when either the VME flag or the PVI flag in control register CR4 is set and the IOPL is less than 3.
Figure 2-5. Memory Management Registers (figure not reproduced). GDTR and IDTR each hold a 32(64)-bit linear base address (bits 47(79):16) and a 16-bit table limit (bits 15:0). The task register and LDTR each hold a 16-bit segment selector together with an automatically loaded descriptor cache (attributes, 32(64)-bit linear base address, and segment limit).
2.4.3 IDTR Interrupt Descriptor Table Register
The IDTR register holds the base address (32 bits in protected mode; 64 bits in IA-32e mode) and 16-bit table limit for the IDT. The base address specifies the linear address of byte 0 of the IDT; the table limit specifies the number of bytes in the table. The LIDT and SIDT instructions load and store the IDTR register, respectively.
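The helper below is a sketch of how the IDTR base and limit work together to locate the gate for a given vector. It assumes 8-byte gates in protected mode and 16-byte gates in IA-32e mode. It treats the limit as the offset of the table's last valid byte, which is one less than the byte count and follows the 8N - 1 convention the manual gives for descriptor tables. A gate that extends past the limit raises an error, standing in for the #GP the processor would generate. The function name is hypothetical.

```python
def idt_gate_address(idtr_base, idtr_limit, vector, ia32e_mode):
    """Return the linear address of the IDT gate for `vector`.

    Raises ValueError where the processor would instead raise #GP because
    the gate does not lie entirely within the IDT limit.
    """
    gate_size = 16 if ia32e_mode else 8  # gates are 16 bytes in IA-32e mode
    offset = vector * gate_size
    # The limit is the offset of the table's last valid byte, so the last
    # byte of the gate must not lie beyond it.
    if offset + gate_size - 1 > idtr_limit:
        raise ValueError(f"vector {vector:#x} lies outside the IDT limit")
    return idtr_base + offset


if __name__ == "__main__":
    # Hypothetical 64-bit IDT with all 256 vectors present.
    base, limit = 0xFFFF_8000_0010_0000, 256 * 16 - 1
    print(hex(idt_gate_address(base, limit, 14, ia32e_mode=True)))
```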
• The MOV CRn instructions do not check that addresses written to CR2 and CR3 are within the linear-address or physical-address limitations of the implementation.
• Register CR8 is available in 64-bit mode only.
The control registers are summarized below, and each architecturally defined control field in these control registers is described individually. In Figure 2-6, the width of the register in 64-bit mode is indicated in parentheses (except for CR0).
Figure 2-6. Control Registers (figure not reproduced). Widths shown as 31(63) are 32 bits, or 64 bits in 64-bit mode.
CR4: VME (bit 0), PVI (bit 1), TSD (bit 2), DE (bit 3), PSE (bit 4), PAE (bit 5), MCE (bit 6), PGE (bit 7), PCE (bit 8), OSFXSR (bit 9), OSXMMEXCPT (bit 10), VMXE (bit 13), SMXE (bit 14), OSXSAVE (bit 18); other bits reserved (set to 0).
CR3 (PDBR): page-directory base (bits 31(63):12), PCD (bit 4), PWT (bit 3).
CR2: page-fault linear address.
CR1: reserved.
CR0: PE (bit 0), MP (bit 1), EM (bit 2), TS (bit 3), ET (bit 4), NE (bit 5), WP (bit 16), AM (bit 18), NW (bit 29), CD (bit 30), PG (bit 31).
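You can decode a control-register image by testing the bit positions listed for Figure 2-6. The sketch below uses hypothetical names (`CR0_BITS`, `CR4_BITS`, `set_flags`, `cr3_base`) and covers only the flags named in the figure. It is not an exhaustive or authoritative register map.

```python
# Bit positions taken from Figure 2-6; not an exhaustive register map.
CR0_BITS = {"PE": 0, "MP": 1, "EM": 2, "TS": 3, "ET": 4, "NE": 5,
            "WP": 16, "AM": 18, "NW": 29, "CD": 30, "PG": 31}
CR4_BITS = {"VME": 0, "PVI": 1, "TSD": 2, "DE": 3, "PSE": 4, "PAE": 5,
            "MCE": 6, "PGE": 7, "PCE": 8, "OSFXSR": 9, "OSXMMEXCPT": 10,
            "VMXE": 13, "SMXE": 14, "OSXSAVE": 18}


def set_flags(value, layout):
    """Names of the flags in `layout` that are 1 in the register image `value`."""
    return [name for name, bit in layout.items() if (value >> bit) & 1]


def cr3_base(cr3):
    """Page-directory base from CR3: clear bits 11:0 (PWT, PCD, and the rest)."""
    return cr3 & ~0xFFF


if __name__ == "__main__":
    print(set_flags(0x8000_0011, CR0_BITS))  # ['PE', 'ET', 'PG']
    print(set_flags(0x0000_06A0, CR4_BITS))
```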
See also: Section 11.5.3, “Preventing Caching,” and Section 11.5, “Cache Control.”
NW Not Write-through (bit 29 of CR0): When the NW and CD flags are clear, write-back (for Pentium 4, Intel Xeon, P6 family, and Pentium processors) or write-through (for Intel486 processors) is enabled for writes that hit the cache and invalidation cycles are enabled. See Table 11-5 for detailed information about the effect of the NW flag on caching for other settings of the CD and NW flags.
delayed until an x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instruction is actually executed by the new task. The processor sets this flag on every task switch and tests it when executing x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instructions.
Table 2-1. Action Taken By x87 FPU Instructions for Different Combinations of EM, MP, and TS (table body not reproduced; it shows, among other cases, that an x87 FPU instruction generates a #NM exception when CR0.EM = 1).
EM Emulation (bit 2 of CR0): Indicates that the processor does not have an internal or external x87 FPU when set; indicates an x87 FPU is present when clear. This flag also affects the execution of MMX/SSE/SSE2/SSE3/SSSE3/SSE4 instructions.
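The body of Table 2-1 did not survive extraction. The function below restates the behavior the SDM documents for it, so treat it as a reconstruction and check it against the manual before relying on it. An x87 FPU instruction raises #NM when EM or TS is set. WAIT/FWAIT raises #NM only when both MP and TS are set. `x87_action` is a hypothetical name.

```python
def x87_action(em, mp, ts, is_wait):
    """Return '#NM' or 'execute' for a CR0.EM/MP/TS combination.

    Reconstruction of Table 2-1: an x87 FPU instruction faults if EM or TS
    is set; WAIT/FWAIT faults only if both MP and TS are set.
    """
    if is_wait:
        return "#NM" if (mp and ts) else "execute"
    return "#NM" if (em or ts) else "execute"


if __name__ == "__main__":
    # Print the whole table: EM MP TS, x87 instruction, WAIT/FWAIT.
    for em in (0, 1):
        for mp in (0, 1):
            for ts in (0, 1):
                print(em, mp, ts, x87_action(em, mp, ts, False),
                      x87_action(em, mp, ts, True))
```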
flag is set, caching of the page-directory is prevented; when the flag is clear, the page-directory can be cached. This flag affects only the processor’s internal caches (both L1 and L2, when present). The processor ignores this flag if paging is not used (the PG flag in register CR0 is clear) or the CD (cache disable) flag in CR0 is set. See also: Chapter 11, “Memory Cache Control” (for more about the use of the PCD flag) and Section 4.
when set; when clear, the processor aliases references to registers DR4 and DR5 for compatibility with software written to run on earlier IA-32 processors. See also: Section 16.2.2, “Debug Registers DR4 and DR5.”
PSE Page Size Extensions (bit 4 of CR4): Enables 4-MByte pages with 32-bit paging when set; restricts 32-bit paging to 4-KByte pages when clear. See also: Section 4.3, “32-Bit Paging.”
processor will generate an invalid opcode exception (#UD) if it attempts to execute any SSE/SSE2/SSE3/SSSE3/SSE4 instruction, with the exception of PAUSE, PREFETCHh, SFENCE, LFENCE, MFENCE, MOVNTI, CLFLUSH, CRC32, and POPCNT. The operating system or executive must explicitly set this flag.
NOTE: The CPUID feature flag FXSR indicates availability of the FXSAVE/FXRSTOR instructions.
all interrupts are enabled. This field is available in 64-bit mode. A value of 15 means all interrupts will be disabled.
2.5.1 CPUID Qualification of Control Register Flags
The VME, PVI, TSD, DE, PSE, PAE, MCE, PGE, PCE, OSFXSR, and OSXMMEXCPT flags in control register CR4 are model-specific. All of these flags (except the PCE flag) can be qualified with the CPUID instruction to determine if they are implemented on the processor before they are used.
state, SSE state, or a future processor extended state) is represented by a bit in XCR0. The OS can enable future processor extended states in a forward manner by specifying the appropriate bit mask value using the XSETBV instruction according to the results of the CPUID leaf 0DH. With the exception of bit 63, each bit in the XFEATURE_ENABLED_MASK register (XCR0) corresponds to a subset of the processor states.
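Here is a minimal sketch of choosing a value to pass to XSETBV. It assumes `supported` holds the XCR0 bit mask that CPUID leaf 0DH (sub-leaf 0) reports in EDX:EAX, and `requested` is the set of states the OS wants. The helper names are hypothetical. The sketch always keeps bit 0 (x87 state) set, because XSETBV faults (#GP) if software tries to clear it. It never sets a bit the processor does not report.

```python
XCR0_X87 = 1 << 0  # bit 0: x87 FPU state; XSETBV faults if software clears it
XCR0_SSE = 1 << 1  # bit 1: SSE state


def choose_xcr0(supported, requested):
    """Return an XCR0 value containing only supported states, always with x87.

    `supported` stands in for the mask reported by CPUID leaf 0DH (sub-leaf 0)
    in EDX:EAX; `requested` is the set of states the OS wants to enable.
    """
    value = (requested & supported) | XCR0_X87
    if value & ~supported:
        # Only possible if the x87 bit itself is missing from `supported`.
        raise ValueError("requested XCR0 contains unsupported state bits")
    return value


if __name__ == "__main__":
    print(hex(choose_xcr0(supported=0b11, requested=XCR0_SSE)))
```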
Table 2-2. Summary of System Instructions (Contd.)
Instruction | Description | Useful to Application? | Protected from Application?
XGETBV | Return the state of the XFEATURE_ENABLED_MASK register | Yes | No
XSETBV | Enable one or more processor extended states | No (note 6) | Yes
NOTES:
1. Useful to application programs running at a CPL of 1 or 2.
2. The TSD and PCE flags in control register CR4 control access to these instructions by application programs running at a CPL of 3.
The LMSW (load machine status word) and SMSW (store machine status word) instructions operate on bits 0 through 15 of control register CR0. These instructions are provided for compatibility with the 16-bit Intel 286 processor. Programs written to run on 32-bit IA-32 processors should not use these instructions. Instead, they should access the control register CR0 using the MOV instruction.
Instructions),” for a detailed explanation of the function and use of this instruction.
2.7.3 Loading and Storing Debug Registers
Internal debugging facilities in the processor are controlled by a set of 8 debug registers (DR0-DR7). The MOV instruction allows setup data to be loaded to and stored from these registers. On processors that support Intel 64 architecture, debug registers DR0-DR7 are 64 bits.
introduced with the Pentium Pro processor). If any non-wake events are pending during shutdown, they will be handled after the wake event from shutdown is processed (for example, A20M# interrupts). The LOCK prefix invokes a locked (atomic) read-modify-write operation when modifying a memory operand.
Fixed-function performance counters record only specific events that are defined in Chapter 20, “Introduction to Virtual-Machine Extensions”, and the width/number of fixed-function counters are enumerated by CPUID leaf 0AH.
The time-stamp counter is a model-specific 64-bit counter that is reset to zero each time the processor is reset. If not reset, the counter will increment approximately 9.5 × 10^16 times per year when the processor is operating at a clock rate of 3 GHz.
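You can check the figure of roughly 9.5 × 10^16 increments per year with a little arithmetic. This assumes the counter advances exactly once per cycle at a constant 3 GHz over a 365-day year. The same numbers show that a 64-bit counter would take on the order of two centuries to wrap.

```python
CLOCK_HZ = 3_000_000_000            # assumed constant 3 GHz increment rate
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s in a 365-day year

increments_per_year = CLOCK_HZ * SECONDS_PER_YEAR
years_to_wrap = 2**64 / increments_per_year

print(f"{increments_per_year:.2e} increments per year")  # 9.46e+16
print(f"{years_to_wrap:.0f} years for a 64-bit counter to wrap")
```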
2.7.7.1 Reading and Writing Model-Specific Registers in 64-Bit Mode
RDMSR and WRMSR require an index to specify the address of an MSR. In 64-bit mode, the index is 32 bits; it is specified using ECX.
2.7.8 Enabling Processor Extended States
The XSETBV instruction is required to enable OS support of individual processor extended states in the XFEATURE_ENABLED_MASK register (see Section 2.6).
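This sketch returns to RDMSR and WRMSR in 64-bit mode (Section 2.7.7.1). Alongside the ECX index described there, the 64-bit MSR value itself travels split across EDX (high 32 bits) and EAX (low 32 bits). The hypothetical helpers below show the split and the reassembly. The sample value is illustrative only: an IA32_EFER image with SCE (bit 0), LME (bit 8), LMA (bit 10), and NXE (bit 11) set.

```python
MASK32 = 0xFFFF_FFFF


def split_msr_value(value):
    """Split a 64-bit MSR value into the (EDX, EAX) pair that WRMSR consumes."""
    if not 0 <= value < (1 << 64):
        raise ValueError("MSR values are 64 bits wide")
    return value >> 32, value & MASK32


def join_msr_value(edx, eax):
    """Rebuild the 64-bit value that RDMSR returns in EDX:EAX."""
    return ((edx & MASK32) << 32) | (eax & MASK32)


if __name__ == "__main__":
    # Hypothetical IA32_EFER image: SCE (bit 0), LME (bit 8), LMA (bit 10), NXE (bit 11).
    efer = (1 << 0) | (1 << 8) | (1 << 10) | (1 << 11)
    edx, eax = split_msr_value(efer)
    print(hex(edx), hex(eax))
    assert join_msr_value(edx, eax) == efer
```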
CHAPTER 3
PROTECTED-MODE MEMORY MANAGEMENT
This chapter describes the Intel 64 and IA-32 architecture’s protected-mode memory management facilities, including the physical memory requirements, segmentation mechanism, and paging mechanism. See also: Chapter 5, “Protection” (for a description of the processor’s protection mechanism) and Chapter 17, “8086 Emulation” (for a description of memory addressing protection in real-address and virtual-8086 modes).
segment, the segment type, and the location of the first byte of the segment in the linear address space (called the base address of the segment). The offset part of the logical address is added to the base address for the segment to locate a byte within the segment. The base address plus the offset thus forms a linear address in the processor’s linear address space.
storage. When using paging, each segment is divided into pages (typically 4 KBytes each in size), which are stored either in physical memory or on the disk. The operating system or executive maintains a page directory and a set of page tables to keep track of the pages.
FFFF_FFF0H. RAM (DRAM) is placed at the bottom of the address space because the initial base address for the DS data segment after reset initialization is 0.
3.2.2 Protected Flat Model
The protected flat model is similar to the basic flat model, except the segment limits are set to include only the range of addresses for which physical memory actually exists (see Figure 3-3).
More complexity can be added to this protected flat model to provide more protection. For example, for the paging mechanism to provide isolation between user and supervisor code and data, four segments need to be defined: code and data segments at privilege level 3 for the user, and code and data segments at privilege level 0 for the supervisor. Usually these segments all overlay each other and start at address 0 in the linear address space.
Figure 3-4. Multi-Segment Model (figure not reproduced). Each segment register (CS, SS, DS, ES, FS, and GS) references its own segment descriptor, which holds access rights, a limit, and a base address. Each descriptor locates a separate code, stack, or data segment in the linear address space (or physical memory), and additional descriptors can define further data segments.
In 64-bit mode, segmentation is generally (but not completely) disabled, creating a flat 64-bit linear-address space. The processor treats the segment base of CS, DS, ES, and SS as zero, creating a linear address that is equal to the effective address. The FS and GS segments are exceptions. These segment registers (which hold the segment base) can be used as additional base registers in linear address calculations.
3.3.1 Intel® 64 Processors and Physical Address Space
On processors that support Intel 64 architecture (CPUID.80000001H:EDX[29] = 1), the size of the physical address range is implementation-specific and indicated by CPUID.80000008H:EAX[bits 7-0]. For the format of information returned in EAX, see “CPUID—CPU Identification” in Chapter 3 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A. See also: Chapter 4, “Paging.”
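Here is a sketch of decoding CPUID.80000008H:EAX. It assumes the raw EAX value has already been obtained; how you read it is outside this sketch. Bits 7:0 give the physical-address width described above. Bits 15:8, which the text does not mention, report the linear-address width. The function names and the sample value are hypothetical.

```python
def decode_address_sizes(eax):
    """Decode EAX from CPUID.80000008H into address widths (in bits)."""
    return {
        "physical_bits": eax & 0xFF,        # EAX[7:0], the physical-address width
        "linear_bits": (eax >> 8) & 0xFF,   # EAX[15:8], the linear-address width
    }


def max_physical_address(eax):
    """Highest physical address representable with the reported width."""
    return (1 << (eax & 0xFF)) - 1


if __name__ == "__main__":
    sample_eax = 0x0000_3027  # hypothetical value: 39 physical bits, 48 linear bits
    print(decode_address_sizes(sample_eax), hex(max_physical_address(sample_eax)))
```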
Figure 3-5. Logical Address to Linear Address Translation (figure not reproduced). The 16-bit segment selector of the logical address selects a segment descriptor in a descriptor table; the descriptor’s base address is added to the 32(64)-bit offset (effective address) to form the 32(64)-bit linear address.
If paging is not used, the processor maps the linear address directly to a physical address (that is, the linear address goes out on the processor’s address bus).
TI (table indicator) flag (bit 2): Specifies the descriptor table to use; clearing this flag selects the GDT, and setting this flag selects the current LDT.
Figure 3-6. Segment Selector (figure not reproduced): Index (bits 15:3); TI, the table indicator (bit 2; 0 = GDT, 1 = LDT); and RPL, the Requested Privilege Level (bits 1:0).
Requested Privilege Level (RPL) (bits 0 and 1): Specifies the privilege level of the selector. The privilege level can range from 0 to 3, with 0 being the most privileged level. See Section 5.
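You can decode the selector layout in Figure 3-6 with shifts and masks. In this sketch, which uses hypothetical function names, the descriptor's linear address is the table base plus the index scaled by 8, because each descriptor-table entry is 8 bytes long.

```python
def decode_selector(selector):
    """Split a 16-bit segment selector into the fields of Figure 3-6."""
    return {
        "index": (selector >> 3) & 0x1FFF,              # bits 15:3
        "table": "LDT" if selector & 0b100 else "GDT",  # TI flag, bit 2
        "rpl": selector & 0b11,                         # bits 1:0
    }


def descriptor_address(table_base, selector):
    """Linear address of the descriptor: table base + index * 8."""
    return table_base + ((selector >> 3) & 0x1FFF) * 8


if __name__ == "__main__":
    print(decode_selector(0x002B))  # hypothetical selector
```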
For a program to access a segment, the segment selector for the segment must have been loaded in one of the segment registers. So, although a system can define thousands of segments, only six can be available for immediate use. Other segments can be made available by loading their segment selectors into these registers during program execution.
Figure 3-7. Segment Registers (figure not reproduced): each of CS, SS, DS, ES, FS, and GS has a visible part (the segment selector) and a hidden part (base address, limit, and access information).
3.4.4 Segment Loading Instructions in IA-32e Mode
Because ES, DS, and SS segment registers are not used in 64-bit mode, their fields (base, limit, and attribute) in segment descriptor registers are ignored. Some forms of segment load instructions are also invalid (for example, LDS, POP ES). Address calculations that reference the ES, DS, or SS segments are treated as if the segment base is zero.
3.4.5 Segment Descriptors
A segment descriptor is a data structure in a GDT or LDT that provides the processor with the size and location of a segment, as well as access control and status information. Segment descriptors are typically created by compilers, linkers, loaders, or the operating system or executive, but not application programs. Figure 3-8 illustrates the general descriptor format for all types of segment descriptors.
to the segment limit. Offsets greater than the segment limit generate general-protection exceptions (#GP). For expand-down segments, the segment limit has the reverse function; the offset can range from the segment limit plus 1 to FFFFFFFFH or FFFFH, depending on the setting of the B flag. Offsets less than or equal to the segment limit generate general-protection exceptions.
store its own data, such as information regarding the whereabouts of the missing segment.
D/B (default operation size/default stack pointer size and/or upper bound) flag: Performs different functions depending on whether the segment descriptor is an executable code segment, an expand-down data segment, or a stack segment. (This flag should always be set to 1 for 32-bit code and data segments and to 0 for 16-bit code and data segments.)
• Executable code segment.
G (granularity) flag: Determines the scaling of the segment limit field. When the granularity flag is clear, the segment limit is interpreted in byte units; when the flag is set, the segment limit is interpreted in 4-KByte units. (This flag does not affect the granularity of the base address; it is always byte granular.) When the granularity flag is set, the twelve least significant bits of an offset are not tested when checking the offset against the segment limit.
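The following sketch decodes a legacy 8-byte segment descriptor from its two doublewords. It then applies the limit rules described in this section, including G-flag scaling and the reversed check for expand-down data segments. It assumes the standard descriptor layout of Figure 3-8, which is not reproduced here, with the base, limit, and attribute fields split across the two doublewords. The function names and sample descriptors are hypothetical. The sketch ignores the distinction between #SS and #GP.

```python
def decode_descriptor(low, high):
    """Decode a legacy 8-byte segment descriptor given as two 32-bit doublewords."""
    base = (low >> 16) | ((high & 0xFF) << 16) | (high & 0xFF00_0000)
    raw_limit = (low & 0xFFFF) | (high & 0x000F_0000)
    g = (high >> 23) & 1
    # With G = 1 the limit counts 4-KByte units and the low 12 offset bits are
    # not checked, so the effective byte limit ends in FFFH.
    limit = ((raw_limit << 12) | 0xFFF) if g else raw_limit
    return {
        "base": base,
        "limit": limit,
        "type": (high >> 8) & 0xF,
        "s": (high >> 12) & 1,       # 1 = code or data segment
        "dpl": (high >> 13) & 0b11,
        "p": (high >> 15) & 1,
        "db": (high >> 22) & 1,
        "g": g,
    }


def offset_ok(desc, offset, size=1):
    """Check an access of `size` bytes at `offset` against the segment limit."""
    is_data = desc["s"] == 1 and not (desc["type"] & 0b1000)
    expand_down = is_data and bool(desc["type"] & 0b0100)
    if expand_down:
        # Valid offsets run from limit + 1 up to FFFFFFFFH (B = 1) or FFFFH (B = 0).
        upper = 0xFFFF_FFFF if desc["db"] else 0xFFFF
        return offset > desc["limit"] and offset + size - 1 <= upper
    return offset + size - 1 <= desc["limit"]


if __name__ == "__main__":
    # Hypothetical flat 4-GByte code segment (base 0, limit FFFFFH, G = 1).
    flat_code = decode_descriptor(0x0000_FFFF, 0x00CF_9A00)
    print(hex(flat_code["base"]), hex(flat_code["limit"]))
```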
Table 3-1.
For code segments, the three low-order bits of the type field are interpreted as accessed (A), read enable (R), and conforming (C). Code segments can be execute-only or execute/read, depending on the setting of the read-enable bit. An execute/read segment might be used when constants or other static data have been placed with instruction code in a ROM.
• Task-state segment (TSS) descriptor.
• Call-gate descriptor.
• Interrupt-gate descriptor.
• Trap-gate descriptor.
• Task-gate descriptor.
These descriptor types fall into two categories: system-segment descriptors and gate descriptors. System-segment descriptors point to system segments (LDT and TSS segments).
See also: Section 3.5.1, “Segment Descriptor Tables”, and Section 7.2.2, “TSS Descriptor” (for more information on the system-segment descriptors); see Section 5.8.3, “Call Gates”, Section 6.11, “IDT Descriptors”, and Section 7.2.5, “Task-Gate Descriptor” (for more information on the gate descriptors).
3.5.1 Segment Descriptor Tables
A segment descriptor table is an array of segment descriptors (see Figure 3-10).
Each system must have one GDT defined, which may be used for all programs and tasks in the system. Optionally, one or more LDTs can be defined. For example, an LDT can be defined for each separate task being run, or some or all tasks can share the same LDT. The GDT is not a segment itself; instead, it is a data structure in linear address space. The base linear address and limit of the GDT must be loaded into the GDTR register (see Section 2.
Figure 3-11. Pseudo-Descriptor Formats (figure not reproduced): the 32-bit form is a 16-bit limit (bits 15:0) followed by a 32-bit base address (bits 47:16); the 64-bit form is a 16-bit limit (bits 15:0) followed by a 64-bit base address (bits 79:16).
3.5.2 Segment Descriptor Tables in IA-32e Mode
In IA-32e mode, a segment descriptor table can contain up to 8192 (2^13) 8-byte descriptors. An entry in the segment descriptor table can be 8 bytes. System descriptors are expanded to 16 bytes (occupying the space of two entries). GDTR and LDTR registers are expanded to hold a 64-bit base address.
CHAPTER 4
PAGING
Chapter 3 explains how segmentation converts logical addresses to linear addresses. Paging (or linear-address translation) is the process of translating linear addresses so that they can be used to access memory or I/O devices. Paging translates each linear address to a physical address and determines, for each translation, what accesses to the linear address are allowed (the address’s access rights) and the type of caching used for such accesses (the address’s memory type).
paging modes. Section 4.1.3 discusses how CR0.WP, CR4.PSE, CR4.PGE, and IA32_EFER.NXE modify the operation of the different paging modes.
4.1.1 Three Paging Modes
If CR0.PG = 0, paging is not used. The logical processor treats all linear addresses as if they were physical addresses. CR4.PAE and IA32_EFER.LME are ignored by the processor, as are CR0.WP, CR4.PSE, CR4.PGE, and IA32_EFER.NXE.
Paging is enabled if CR0.PG = 1. Paging can be enabled only if protection is enabled (CR0.PE = 1).
Table 4-1. Properties of Different Paging Modes
Paging Mode | CR0.PG | CR4.PAE | LME in IA32_EFER | Linear-Address Width | Physical-Address Width (note 1) | Page Size(s) | Supports Execute-Disable?
None | 0 | N/A | N/A | 32 | 32 | N/A | No
32-bit | 1 | 0 | 0 (note 2) | 32 | Up to 40 (note 3) | 4-KByte, 4-MByte (note 4) | No
PAE | 1 | 1 | 0 | 32 | Up to 52 | 4-KByte, 2-MByte | Yes (note 5)
IA-32e | 1 | 1 | 1 | 48 | Up to 52 | 4-KByte, 2-MByte | Yes (note 5)
NOTES:
1. The physical-address width is always bounded by MAXPHYADDR; see Section 4.1.4.
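Table 4-1 reduces to a small decision procedure over three bits: CR0.PG (bit 31), CR4.PAE (bit 5), and IA32_EFER.LME (bit 8). The sketch below, with the hypothetical name `paging_mode`, assumes a register state the processor would accept. In particular, it assumes LME is 0 whenever CR0.PG = 1 and CR4.PAE = 0.

```python
def paging_mode(cr0, cr4, efer):
    """Classify the paging mode from register images, following Table 4-1."""
    pg = (cr0 >> 31) & 1    # CR0.PG
    pae = (cr4 >> 5) & 1    # CR4.PAE
    lme = (efer >> 8) & 1   # IA32_EFER.LME
    if not pg:
        return "none"
    if not pae:
        return "32-bit"
    return "IA-32e" if lme else "PAE"


if __name__ == "__main__":
    # Hypothetical 64-bit OS state: PG and PE set, PAE set, LME and LMA set.
    print(paging_mode(0x8000_0011, 0x20, 0x500))
```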
enable these modes and make transitions between them.
• Software can always disable paging by clearing CR0.PG with MOV to CR0.
• Software cannot make transitions directly between IA-32e paging and either of the other two paging modes. It must first disable paging (by clearing CR0.PG with MOV to CR0), then set CR4.PAE and IA32_EFER.LME to the desired values (with MOV to CR4 and WRMSR), and then re-enable paging (by setting CR0.PG with MOV to CR0). As noted earlier, an attempt to clear either CR4.PAE or IA32_EFER.
4.1.4 Enumeration of Paging Features by CPUID
Software can discover support for different paging features using the CPUID instruction:
• PSE: page-size extensions for 32-bit paging. If CPUID.01H:EDX.PSE [bit 3] = 1, CR4.PSE may be set to 1, enabling support for 4-MByte pages with 32-bit paging (see Section 4.3).
• PAE: physical-address extension. If CPUID.01H:EDX.PAE [bit 6] = 1, CR4.PAE may be set to 1, enabling PAE paging (this setting is also required for IA-32e paging).
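Here is a sketch of checking the paging-related CPUID bits. It assumes the EDX values from leaves 01H and 80000001H are already available. PSE (bit 3) and PAE (bit 6) come from the list above. Intel 64 support (leaf 80000001H, EDX bit 29) is the bit cited in Section 3.3.1. The other bits come from the CPUID reference rather than this text: PGE (bit 13), PAT (bit 16), PSE-36 (bit 17), and execute-disable (leaf 80000001H, EDX bit 20). All names are hypothetical.

```python
LEAF_01H_EDX = {"PSE": 3, "PAE": 6, "PGE": 13, "PAT": 16, "PSE-36": 17}
LEAF_80000001H_EDX = {"XD": 20, "Intel64": 29}


def paging_features(leaf1_edx, ext_edx):
    """Return the names of the paging-related features reported as present."""
    present = {name for name, bit in LEAF_01H_EDX.items() if (leaf1_edx >> bit) & 1}
    present |= {name for name, bit in LEAF_80000001H_EDX.items() if (ext_edx >> bit) & 1}
    return present


def can_use_ia32e_paging(leaf1_edx, ext_edx):
    """IA-32e paging needs PAE (CR4.PAE = 1) and Intel 64 support."""
    features = paging_features(leaf1_edx, ext_edx)
    return "PAE" in features and "Intel64" in features


if __name__ == "__main__":
    print(sorted(paging_features((1 << 3) | (1 << 6), 1 << 29)))
```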
4.2 HIERARCHICAL PAGING STRUCTURES: AN OVERVIEW
All three paging modes translate linear addresses using hierarchical paging structures. This section provides an overview of their operation. Section 4.3, Section 4.4, and Section 4.5 provide details for the three paging modes.
Every paging structure is 4096 bytes in size and comprises a number of individual entries. With 32-bit paging, each entry is 32 bits (4 bytes); there are thus 1024 entries in each structure.
and bits 20:12 identify a fourth. Again, the last identifies the page frame. (See Figure 4-8 for an illustration.)
The translation process in each of the examples above completes by identifying a page frame. However, the paging structures may be configured so that translation terminates before doing so. This occurs if the process encounters a paging-structure entry that is marked “not present” (because its P flag, bit 0, is clear) or in which a reserved bit is set.
Table 4-2.
Table 4-3. Use of CR3 with 32-Bit Paging (Contd.)
Bit Position(s) | Contents
31:12 | Physical address of the 4-KByte aligned page directory used for linear-address translation
63:32 | Ignored (these bits exist only on processors supporting the Intel 64 architecture)
32-bit paging may map linear addresses to either 4-KByte pages or 4-MByte pages. Figure 4-2 illustrates the translation process when it uses a 4-KByte page; Figure 4-3 covers the case of a 4-MByte page.
Figure 4-3. Linear-Address Translation to a 4-MByte Page using 32-Bit Paging (figure not reproduced): CR3 locates the page directory; linear-address bits 31:22 select a PDE with PS = 1, which supplies the physical address of the 4-MByte page; bits 21:0 are the offset within the page.
Because a PDE is identified using bits 31:22 of the linear address, it controls access to a 4-MByte region of the linear-address space. Use of the PDE depends on CR4.PSE and the PDE’s PS flag (bit 7):
• If CR4.PSE = 1 and the PDE’s PS flag is 1, the PDE maps a 4-MByte page (see Table 4-4).
Table 4-4. Format of a 32-Bit Page-Directory Entry that Maps a 4-MByte Page
Bit Position(s) | Contents
0 (P) | Present; must be 1 to map a 4-MByte page
1 (R/W) | Read/write; if 0, writes may not be allowed to the 4-MByte page referenced by this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) | User/supervisor; if 0, accesses with CPL=3 are not allowed to the 4-MByte page referenced by this entry (see Section 4.
Table 4-5. Format of a 32-Bit Page-Directory Entry that References a Page Table
Bit Position(s) | Contents
0 (P) | Present; must be 1 to reference a page table
1 (R/W) | Read/write; if 0, writes may not be allowed to the 4-MByte region controlled by this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) | User/supervisor; if 0, accesses with CPL=3 are not allowed to the 4-MByte region controlled by this entry (see Section 4.
Table 4-6. Format of a 32-Bit Page-Table Entry that Maps a 4-KByte Page
Bit Position(s) | Contents
0 (P) | Present; must be 1 to map a 4-KByte page
1 (R/W) | Read/write; if 0, writes may not be allowed to the 4-KByte page referenced by this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) | User/supervisor; if 0, accesses with CPL=3 are not allowed to the 4-KByte page referenced by this entry (see Section 4.
those that do neither because they are “not present”; bit 0 (P) and bit 7 (PS) are highlighted because they determine how such an entry is used.
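The 32-bit paging walk described in Section 4.3 reduces to bit-field arithmetic. Linear-address bits 31:22 index the page directory, bits 21:12 index a page table, and the rest is the page offset. This minimal sketch assumes the standard 32-bit address-formation rules, which come from the SDM's fuller description rather than this excerpt. Each PDE sits at CR3[31:12] with linear bits 31:22 in address bits 11:2, and each PTE sits at PDE[31:12] with linear bits 21:12 in address bits 11:2. The function names and sample values are hypothetical. For 4-MByte pages, the sketch ignores the PDE bits that supply physical-address bits above 31.

```python
def pde_address_32(cr3, linear):
    """PDE address: bits 31:12 from CR3, bits 11:2 from linear bits 31:22, bits 1:0 zero."""
    return (cr3 & 0xFFFF_F000) | (((linear >> 22) & 0x3FF) << 2)


def pte_address_32(pde, linear):
    """PTE address: bits 31:12 from the PDE, bits 11:2 from linear bits 21:12."""
    return (pde & 0xFFFF_F000) | (((linear >> 12) & 0x3FF) << 2)


def phys_4k(pte, linear):
    """4-KByte page: frame bits 31:12 from the PTE, offset from linear bits 11:0."""
    return (pte & 0xFFFF_F000) | (linear & 0xFFF)


def phys_4m(pde, linear):
    """4-MByte page (PS = 1): bits 31:22 from the PDE, offset from linear bits 21:0.
    Physical-address bits above 31 are ignored in this sketch."""
    return (pde & 0xFFC0_0000) | (linear & 0x3F_FFFF)


if __name__ == "__main__":
    linear = 0xC080_1234  # hypothetical linear address
    print(hex(pde_address_32(0x0010_0000, linear)))  # CR3 = 100000H
```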
ters. (This is different from the other paging modes, in which there is one hierarchy referenced by CR3.) Section 4.4.1 discusses the PDPTE registers. Section 4.4.2 describes linear-address translation with PAE paging.
4.4.1 PDPTE Registers
When PAE paging is used, CR3 references the base of a 32-byte page-directory-pointer table. Table 4-7 illustrates how CR3 is used with PAE paging.
Table 4-7.
Table 4-8 gives the format of a PDPTE. If any of the PDPTEs sets both the P flag
Table 4-8. Format of a PAE Page-Directory-Pointer-Table Entry (PDPTE)
Bit Position(s) | Contents
0 (P) | Present; must be 1 to reference a page directory
2:1 | Reserved (must be 0)
3 (PWT) | Page-level write-through; indirectly determines the memory type used to access the page directory referenced by this entry (see Section 4.
Figure 4-5. Linear-Address Translation to a 4-KByte Page using PAE Paging (figure not reproduced): linear-address bits 31:30 select one of the four PDPTE registers; bits 29:21 select a PDE with PS = 0; bits 20:12 select a PTE; bits 11:0 are the offset within the 4-KByte page.
Figure 4-6. Linear-Address Translation to a 2-MByte Page using PAE Paging (figure not reproduced): bits 31:30 select a PDPTE register; bits 29:21 select a PDE with PS = 1; bits 20:0 are the offset within the 2-MByte page.
4.4.1)
A page directory comprises 512 64-bit entries (PDEs). A PDE is selected using the physical address defined as follows:
- Bits 51:12 are from PDPTEi.
- Bits 11:3 are bits 29:21 of the linear address.
- Bits 2:0 are 0.
Because a PDE is identified using bits 31:21 of the linear address, it controls access to a 2-MByte region of the linear-address space. Use of the PDE depends on its PS flag (bit 7):
• If the PDE’s PS flag is 1, the PDE maps a 2-MByte page (see Table 4-9).
Table 4-9. Format of a PAE Page-Directory Entry that Maps a 2-MByte Page
Bit Position(s) | Contents
0 (P) | Present; must be 1 to map a 2-MByte page
1 (R/W) | Read/write; if 0, writes may not be allowed to the 2-MByte page referenced by this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) | User/supervisor; if 0, accesses with CPL=3 are not allowed to the 2-MByte page referenced by this entry (see Section 4.
Table 4-10. Format of a PAE Page-Directory Entry that References a Page Table
Bit Position(s) | Contents
0 (P) | Present; must be 1 to reference a page table
1 (R/W) | Read/write; if 0, writes may not be allowed to the 2-MByte region controlled by this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) | User/supervisor; if 0, accesses with CPL=3 are not allowed to the 2-MByte region controlled by this entry (see Section 4.
Table 4-11. Format of a PAE Page-Table Entry that Maps a 4-KByte Page
Bit Position(s) | Contents
0 (P) | Present; must be 1 to map a 4-KByte page
1 (R/W) | Read/write; if 0, writes may not be allowed to the 4-KByte page referenced by this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) | User/supervisor; if 0, accesses with CPL=3 are not allowed to the 4-KByte page referenced by this entry (see Section 4.
that do neither because they are “not present”; bit 0 (P) and bit 7 (PS) are highlighted because they determine how a paging-structure entry is used.
(Figure not reproduced: formats of CR3 and of the paging-structure entries with PAE paging, including PDPTEs, PDEs that map a 2-MByte page or reference a page table, PTEs, and not-present entries. The figure shows the address fields, the XD flag, and ignored and reserved bits; M denotes MAXPHYADDR.)
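The PAE address-formation rules in Section 4.4 can be sketched as follows, together with the use of linear-address bits 31:30 to pick one of the four PDPTEs. The sketch assumes CR3 bits 31:5 locate the 32-byte page-directory-pointer table. It takes bits 51:12 of a PDPTE or PDE as the next table's address, as the text describes for PDE selection. PTE selection is assumed to follow the same pattern, with linear bits 20:12 in address bits 11:3. Names and sample values are hypothetical.

```python
ADDR_51_12 = 0x000F_FFFF_FFFF_F000  # address bits 51:12 of an entry


def pdpte_address(cr3, linear):
    """PDPTE i (i = linear bits 31:30) in the 32-byte table at CR3 bits 31:5."""
    return (cr3 & 0xFFFF_FFE0) | (((linear >> 30) & 0x3) << 3)


def pae_pde_address(pdpte, linear):
    """Bits 51:12 from the PDPTE, bits 11:3 from linear bits 29:21, bits 2:0 zero."""
    return (pdpte & ADDR_51_12) | (((linear >> 21) & 0x1FF) << 3)


def pae_pte_address(pde, linear):
    """Bits 51:12 from the PDE, bits 11:3 from linear bits 20:12, bits 2:0 zero."""
    return (pde & ADDR_51_12) | (((linear >> 12) & 0x1FF) << 3)


if __name__ == "__main__":
    linear = 0xC080_1234  # hypothetical linear address
    print(hex(pdpte_address(0x0000_1000, linear)))
    print(hex(pae_pde_address(0x0030_0001, linear)))
```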
bits corresponds to 4 PBytes, linear addresses are limited to 48 bits; at most 256 TBytes of linear-address space may be accessed at any given time. IA-32e paging uses a hierarchy of paging structures to produce a translation for a linear address. CR3 is used to locate the first paging-structure, the PML4 table. Table 4-12 illustrates how CR3 is used with IA-32e paging. Table 4-12.
Figure 4-8. Linear-Address Translation to a 4-KByte Page using IA-32e Paging (figure not reproduced): CR3 locates the PML4 table; linear-address bits 47:39 select a PML4E, bits 38:30 a PDPTE, bits 29:21 a PDE with PS = 0, and bits 20:12 a PTE; bits 11:0 are the offset within the 4-KByte page.
• A 4-KByte naturally aligned page-directory-pointer table is located at the physical address specified in bits 51:12 of the PML4E (see Table 4-13).
Figure 4-9. Linear-Address Translation to a 2-MByte Page using IA-32e Paging (figure not reproduced): linear-address bits 47:39 select a PML4E, bits 38:30 a PDPTE, and bits 29:21 a PDE with PS = 1; bits 20:0 are the offset within the 2-MByte page.
Because a PDE is identified using bits 47:21 of the linear address, it controls access to a 2-MByte region of the linear-address space. Use of the PDE depends on its PS flag (bit 7):
Table 4-13. Format of an IA-32e PML4 Entry (PML4E) that References a Page-Directory-Pointer Table
Bit Position(s) | Contents
0 (P) | Present; must be 1 to reference a page-directory-pointer table
1 (R/W) | Read/write; if 0, writes may not be allowed to the 512-GByte region controlled by this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) | User/supervisor; if 0, accesses with CPL=3 are not allowed to the 512-GByte region controlled by this entry (see Section 4.
Table 4-14. Format of an IA-32e Page-Directory-Pointer-Table Entry (PDPTE) that References a Page Directory
Bit Position(s) | Contents
0 (P) | Present; must be 1 to reference a page directory
1 (R/W) | Read/write; if 0, writes may not be allowed to the 1-GByte region controlled by this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) | User/supervisor; if 0, accesses with CPL=3 are not allowed to the 1-GByte region controlled by this entry (see Section 4.
Table 4-15. Format of an IA-32e Page-Directory Entry that Maps a 2-MByte Page
Bit Position(s) | Contents
2 (U/S) | User/supervisor; if 0, accesses with CPL=3 are not allowed to the 2-MByte page referenced by this entry (see Section 4.6)
3 (PWT) | Page-level write-through; indirectly determines the memory type used to access the 2-MByte page referenced by this entry (see Section 4.
comprises 512 64-bit entries (PTEs). A PTE is selected using the physical address defined as follows:
Table 4-16. Format of an IA-32e Page-Directory Entry that References a Page Table
Bit Position(s) | Contents
0 (P) | Present; must be 1 to reference a page table
1 (R/W) | Read/write; if 0, writes may not be allowed to the 2-MByte region controlled by this entry (depends on CPL and CR0.WP; see Section 4.
Table 4-17. Format of an IA-32e Page-Table Entry that Maps a 4-KByte Page
Bit Position(s) | Contents
0 (P) | Present; must be 1 to map a 4-KByte page
1 (R/W) | Read/write; if 0, writes may not be allowed to the 4-KByte page referenced by this entry (depends on CPL and CR0.WP; see Section 4.6)
2 (U/S) | User/supervisor; if 0, accesses with CPL=3 are not allowed to the 4-KByte page referenced by this entry (see Section 4.
• If the P flag of a PML4E or a PDPTE is 1, the PS flag is reserved.
• If the P flag and the PS flag of a PDE are both 1, bits 20:13 are reserved.
• If IA32_EFER.NXE = 0 and the P flag of a paging-structure entry is 1, the XD flag (bit 63) is reserved.
A reference using a linear address that is successfully translated to a physical address is performed only if allowed by the access rights of the translation; see Section 4.6.
(Figure not reproduced: formats of CR3 and of the paging-structure entries with IA-32e paging, that is, PML4Es, PDPTEs, PDEs that map a 2-MByte page or reference a page table, and PTEs. The figure shows the address fields, the XD flag (bit 63), and ignored and reserved bits; M denotes MAXPHYADDR.)
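In the IA-32e walk shown in Figures 4-8 and 4-9, each of the four levels consumes 9 bits of the 48-bit linear address (bits 47:39, 38:30, 29:21, and 20:12). That leaves a 12-bit offset for a 4-KByte page or a 21-bit offset for a 2-MByte page. The sketch below extracts those indices. It also includes a canonical-address check (bits 63:47 all equal), which is a known constraint on 48-bit linear addresses that this section does not spell out. Function names are hypothetical.

```python
def ia32e_indices(linear):
    """Split a 48-bit linear address into IA-32e paging-structure indices."""
    return {
        "pml4": (linear >> 39) & 0x1FF,   # bits 47:39
        "pdpt": (linear >> 30) & 0x1FF,   # bits 38:30
        "pd": (linear >> 21) & 0x1FF,     # bits 29:21
        "pt": (linear >> 12) & 0x1FF,     # bits 20:12
        "offset_4k": linear & 0xFFF,      # bits 11:0
        "offset_2m": linear & 0x1F_FFFF,  # bits 20:0 (PDE with PS = 1)
    }


def is_canonical(linear):
    """True if bits 63:47 of a 64-bit address are all 0 or all 1."""
    upper = (linear >> 47) & 0x1_FFFF
    return upper in (0, 0x1_FFFF)


if __name__ == "__main__":
    addr = 0xFFFF_8000_0010_0000  # hypothetical kernel-half address
    print(ia32e_indices(addr), is_canonical(addr))
```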
both the R/W flag and the U/S flag are 1 in every paging-structure entry controlling the translation.
— Instruction fetches.
  • For 32-bit paging or if IA32_EFER.NXE = 0, instructions may be fetched from any linear address with a valid translation for which the U/S flag is 1 in every paging-structure entry controlling the translation.
  • For PAE paging or IA-32e paging with IA32_EFER.
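The user-mode rules above amount to AND-ing the U/S and R/W flags, and OR-ing the XD flags, across every paging-structure entry that controls a translation. A minimal C sketch follows. The name `user_mode_rights` and its parameters are hypothetical. Because the instruction-fetch rule for PAE and IA-32e paging with IA32_EFER.NXE = 1 is cut off above, this sketch assumes that rule also requires XD = 0 in every entry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PG_RW (1ULL << 1)
#define PG_US (1ULL << 2)
#define PG_XD (1ULL << 63)

typedef struct { bool read, write, fetch; } user_rights;

/* `entries` are the paging-structure entries controlling one translation
 * (e.g., PML4E, PDPTE, PDE, PTE). `xd_active` should be true only for PAE
 * or IA-32e paging with IA32_EFER.NXE = 1 (assumed rule, see lead-in). */
static user_rights user_mode_rights(const uint64_t *entries, size_t n, bool xd_active)
{
    bool all_us = true, all_rw = true, any_xd = false;
    for (size_t i = 0; i < n; i++) {
        all_us = all_us && (entries[i] & PG_US);
        all_rw = all_rw && (entries[i] & PG_RW);
        any_xd = any_xd || (entries[i] & PG_XD);
    }
    user_rights r;
    r.read  = all_us;                            /* U/S = 1 everywhere */
    r.write = all_us && all_rw;                  /* plus R/W = 1 everywhere */
    r.fetch = all_us && !(xd_active && any_xd);  /* plus no XD when NXE = 1 */
    return r;
}
```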
[Figure: Page-fault error code. Bit 0 is P, bit 1 is W/R, bit 2 is U/S, bit 3 is RSVD, bit 4 is I/D; bits 31:5 are reserved.]

P      0  The fault was caused by a non-present page.
       1  The fault was caused by a page-level protection violation.
W/R    0  The access causing the fault was a read.
       1  The access causing the fault was a write.
U/S    0  The access causing the fault originated when the processor was executing in supervisor mode.
       1  The access causing the fault originated when the processor was executing in user mode.
RSVD   0  The fault was not caused by reserved bit violation.
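A page-fault handler typically decodes the error code bit by bit. This C sketch extracts the five defined bits. The names are hypothetical, and the I/D meaning (1 = the fault was caused by an instruction fetch) is assumed because its description is cut off above.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool protection;  /* P:    0 = non-present page, 1 = protection violation */
    bool write;       /* W/R:  0 = read, 1 = write */
    bool user;        /* U/S:  0 = supervisor mode, 1 = user mode */
    bool rsvd;        /* RSVD: 1 = reserved-bit violation */
    bool fetch;       /* I/D:  1 = instruction fetch (assumed meaning) */
} pf_error;

static pf_error decode_pf_error(uint32_t code)
{
    pf_error e = {
        .protection = code & (1u << 0),
        .write      = code & (1u << 1),
        .user       = code & (1u << 2),
        .rsvd       = code & (1u << 3),
        .fetch      = code & (1u << 4),
    };
    return e;
}
```

For example, an error code of 6 describes a user-mode write to a non-present page.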
Page-fault exceptions occur only due to an attempt to use a linear address. Failures to load the PDPTE registers with PAE paging (see Section 4.4.1) cause general-protection exceptions (#GP(0)) and not page-fault exceptions.

4.8 ACCESSED AND DIRTY FLAGS
For any paging-structure entry that is used during linear-address translation, bit 5 is the accessed flag. For paging-structure entries that map a page (as opposed to referencing another paging structure), bit 6 is the dirty flag.
4.9 PAGING AND MEMORY TYPING
The memory type of a memory access refers to the type of caching used for that access. Chapter 11, “Memory Cache Control,” provides many details regarding memory typing in the Intel 64 and IA-32 architectures. This section describes how paging contributes to the determination of memory typing.
The way in which paging contributes to memory typing depends on whether the processor supports the Page Attribute Table (PAT; see Section 11.12). Section 4.9.1 and Section 4.9.
The PAT is a 64-bit MSR (IA32_PAT; MSR index 277H) comprising eight (8) 8-bit entries (entry i comprises bits 8i+7:8i of the MSR). For any access to a physical address, the table combines the memory type specified for that physical address by the MTRRs with a memory type selected from the PAT. Table 11-11 in Section 11.12.3 specifies how a memory type is selected from the PAT.
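For a 4-KByte page, the PAT (bit 7), PCD (bit 4), and PWT (bit 3) flags of the PTE form a 3-bit index that selects a PAT entry. That entry is then read from bits 8i+7:8i of IA32_PAT. The following C sketch assumes those bit positions; for a 2-MByte page, the PAT flag is bit 12 instead. The function names are hypothetical. The value in the usage checks, 0007040600070406H, is the power-up default of IA32_PAT: WB, WT, UC-, UC, repeated.

```c
#include <stdint.h>

/* PAT-entry index for a 4-KByte page: PAT flag (PTE bit 7) is the
 * high bit, PCD (bit 4) the middle bit, PWT (bit 3) the low bit. */
static unsigned pat_index_4k(uint64_t pte)
{
    unsigned pwt = (pte >> 3) & 1;
    unsigned pcd = (pte >> 4) & 1;
    unsigned pat = (pte >> 7) & 1;
    return (pat << 2) | (pcd << 1) | pwt;
}

/* Entry i occupies bits 8i+7:8i of IA32_PAT. */
static unsigned pat_entry(uint64_t ia32_pat, unsigned i)
{
    return (unsigned)((ia32_pat >> (8 * i)) & 0xFF);
}
```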
tively. Section 4.10.3 explains how software can remove inconsistent cached information by invalidating portions of the TLBs and paging-structure caches. Section 4.10.4 describes special considerations for multiprocessor systems.

4.10.1 Translation Lookaside Buffers (TLBs)
A processor may cache information about the translation of linear addresses in translation lookaside buffers (TLBs). In general, TLBs contain entries that map page numbers to page frames; these terms are defined in Section 4.
4.10.1.2 Caching Translations in TLBs
The processor may accelerate the paging process by caching individual translations in translation lookaside buffers (TLBs). Each entry in a TLB is an individual translation. Each translation is referenced by a page number. It contains the following information from the paging-structure entries used to translate linear addresses with the page number:
• The physical address corresponding to the page number (the page frame).
entries in memory. See Section 4.10.3.2 for how software can ensure that the processor uses the modified paging-structure entries. If the paging structures specify a translation using a page larger than 4 KBytes, some processors may choose to cache multiple smaller-page TLB entries for that translation. Each such TLB entry would be associated with a page number corresponding to the smaller page size (e.g., bits 47:12 of a linear address with IA-32e paging), even though part of that page number (e.g.
— The value of the R/W flag of the PML4E.
— The value of the U/S flag of the PML4E.
— The value of the XD flag of the PML4E.
— The values of the PCD and PWT flags of the PML4E.
The following items detail how a processor may use the PML4 cache:
— If the processor has a PML4-cache entry for a linear address, it may use that entry when translating the linear address (instead of the PML4E in memory).
— The processor may create a PDPTE-cache entry even if there are no translations for any linear address that might use that entry.
— If the processor creates a PDPTE-cache entry, the processor may retain it unmodified even if software subsequently modifies the corresponding PML4E or PDPTE in memory.
• PDE cache.
For example, if the R/W flag is 0 in a PML4E, then the R/W flag will be 0 in any PDPTE-cache entry for a PDPTE from the page-directory-pointer table referenced by that PML4E. This is because the R/W flag of each such PDPTE-cache entry is the logical-AND of the R/W flags in the appropriate PML4E and PDPTE.
The paging-structure caches contain information only from paging-structure entries that reference other paging structures (and not those that map pages).
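The combining rule for a PDPTE-cache entry can be sketched in C. The text above states the rule only for R/W. This sketch assumes that U/S is AND-ed the same way and that XD is OR-ed, so a restriction at either level survives in the cached entry. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PG_RW (1ULL << 1)
#define PG_US (1ULL << 2)
#define PG_XD (1ULL << 63)

typedef struct { bool rw, us, xd; } pdpte_cache_rights;

/* R/W and U/S: logical-AND of the PML4E and PDPTE flags.
 * XD: logical-OR (assumed, see lead-in). */
static pdpte_cache_rights pdpte_cache_combine(uint64_t pml4e, uint64_t pdpte)
{
    pdpte_cache_rights c = {
        .rw = (pml4e & PG_RW) && (pdpte & PG_RW),
        .us = (pml4e & PG_US) && (pdpte & PG_US),
        .xd = (pml4e & PG_XD) || (pdpte & PG_XD),
    };
    return c;
}
```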
(Any of the above steps would be skipped if the processor does not support the cache in question.) If the processor does not find a TLB or paging-structure-cache entry for the linear address, it uses the linear address to traverse the entire paging-structure hierarchy, as described in Section 4.3, Section 4.4.2, and Section 4.5.

4.10.2.
4.10.3 Invalidation of TLBs and Paging-Structure Caches
As noted in Section 4.10.1 and Section 4.10.2, the processor may create entries in the TLBs and the paging-structure caches when linear addresses are translated, and it may retain these entries even after the paging structures used to create them have been modified.
In addition to the instructions identified above, page faults invalidate entries in the TLBs and paging-structure caches. In particular, a page-fault exception resulting from an attempt to use a linear address will invalidate any PML4-cache, PDPTE-cache, and PDE-cache entries that would be used for that linear address as well as any TLB entry for that address's page number.
• If software using PAE paging modifies a PDPTE, it should reload CR3 with the register’s current value to ensure that the modified PDPTE is loaded into the corresponding PDPTE register (see Section 4.4.1).
• If the nature of the paging structures is such that a single entry may be used for multiple purposes (see Section 4.10.2.3), software should perform invalidations for all of these purposes.
in response to an attempted user-mode access) but no other adverse behavior. Such an exception will occur at most once for each affected linear address (see Section 4.10.3.1).
• If a paging-structure entry is modified to change the XD flag from 1 to 0, failure to perform an invalidation may result in a “spurious” page-fault exception (e.g., in response to an attempted instruction fetch) but no other adverse behavior.
TLB shootdown algorithm for processors supporting the Intel 64 and IA-32 architectures:
1. Begin barrier: Stop all but one logical processor; that is, cause all but one to execute the HLT instruction or to enter a spin loop.
2. Allow the active logical processor to change the necessary paging-structure entries.
3. Allow all logical processors to perform invalidations appropriate to the modifications to the paging-structure entries.
4. Allow all logical processors to resume normal operation.
4.11 INTERACTIONS WITH VIRTUAL-MACHINE EXTENSIONS (VMX)
The architecture for virtual-machine extensions (VMX) includes features that interact with paging. Section 4.11.1 discusses ways in which VMX-specific control transfers, called VMX transitions, specially affect paging. Section 4.11.2 gives an overview of VMX features specifically designed to support address translation.

4.11.
concurrently information for multiple address spaces in its TLBs and paging-structure caches. See Section 25.1 for details. When EPT is in use, the addresses in the paging-structures are not used as physical addresses to access memory and memory-mapped I/O. Instead, they are treated as guest-physical addresses and are translated through a set of EPT paging structures to produce physical addresses.
segments can be mapped to pages in several ways. To implement a flat (unsegmented) addressing environment, for example, all the code, data, and stack modules can be mapped to one or more large segments (up to 4 GBytes) that share the same range of linear addresses (see Figure 3-2 in Section 3.2.2). Here, segments are essentially invisible to applications and the operating system or executive.
CHAPTER 5
PROTECTION

In protected mode, the Intel 64 and IA-32 architectures provide a protection mechanism that operates at both the segment level and the page level. This protection mechanism provides the ability to limit access to certain segments or pages based on privilege levels (four privilege levels for segments and two privilege levels for pages).
there is no control bit for turning the protection mechanism on or off. The part of the segment-protection mechanism that is based on privilege levels can essentially be disabled while still in protected mode by assigning a privilege level of 0 (most privileged) to all segment selectors and segment descriptors. This action disables the privilege level protection barriers between segments, but other protection checks such as limit checking and type checking are still carried out.
procedure. The term current privilege level (CPL) refers to the setting of this field.
• User/supervisor (U/S) flag (bit 2 of paging-structure entries). Determines the type of page: user or supervisor.
• Read/write (R/W) flag (bit 1 of paging-structure entries). Determines the type of access allowed to a page: read-only or read/write.
• Execute-disable (XD) flag (bit 63 of certain paging-structure entries.
[Figure: Data-segment, code-segment, and system-segment descriptors, showing the fields used for protection: Base 31:24; G; B (data) or D (code); AVL; Limit 19:16; P; DPL; Type (E, W, A for data segments; C, R, A for code segments); Base 23:16; Base Address 15:00; and Segment Limit 15:00.]
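The descriptor fields in the figure can be decoded with shifts and masks. This C sketch splits a raw 8-byte legacy segment descriptor into its fields and computes the effective limit. The limit is scaled by 4 KBytes when G = 1. The names are hypothetical. In the usage checks, 00CF9A000000FFFFH is a flat 32-bit ring-0 code segment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Fields of a legacy (8-byte) segment descriptor. */
typedef struct {
    uint32_t base;
    uint32_t limit;     /* raw 20-bit limit field */
    unsigned type;      /* bits 11:8 of the high doubleword */
    bool     s_flag;    /* 1 = code or data segment, 0 = system segment */
    unsigned dpl;
    bool     p, avl, l, db, g;
} seg_desc;

static seg_desc decode_desc(uint64_t raw)
{
    uint32_t lo = (uint32_t)raw, hi = (uint32_t)(raw >> 32);
    seg_desc d;
    d.limit  = (lo & 0xFFFF) | (hi & 0x000F0000);  /* 15:00 and 19:16 */
    d.base   = (lo >> 16) | ((hi & 0xFF) << 16) | (hi & 0xFF000000);
    d.type   = (hi >> 8) & 0xF;
    d.s_flag = (hi >> 12) & 1;
    d.dpl    = (hi >> 13) & 3;
    d.p      = (hi >> 15) & 1;
    d.avl    = (hi >> 20) & 1;
    d.l      = (hi >> 21) & 1;
    d.db     = (hi >> 22) & 1;
    d.g      = (hi >> 23) & 1;
    return d;
}

/* With G = 1 the limit is counted in 4-KByte units, so the low
 * 12 bits of the effective limit read as 1s. */
static uint32_t effective_limit(const seg_desc *d)
{
    return d->g ? (d->limit << 12) | 0xFFF : d->limit;
}
```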
The following sections describe how the processor uses these fields and flags to perform the various categories of checks described in the introduction to this chapter.

5.2.1 Code Segment Descriptor in 64-bit Mode
Code segments continue to exist in 64-bit mode even though, for address calculations, the segment base is treated as zero.
[Figure 5-2. Descriptor Fields with Flags used in IA-32e Mode: code-segment descriptor showing A (Accessed), AVL (Available to Sys. Programmer’s), C (Conforming), D (Default), DPL (Descriptor Privilege Level), L (64-Bit Flag), G (Granularity), R (Readable), and P (Present).]

5.3 LIMIT CHECKING
The limit field of a segment descriptor prevents programs or procedures from addressing memory locations outside the segment.
• A doubleword at an offset greater than the (effective-limit – 3)
• A quadword at an offset greater than the (effective-limit – 7)

For expand-down data segments, the segment limit has the same function but is interpreted differently. Here, the effective limit specifies the last address that is not allowed to be accessed within the segment; the range of valid offsets is from (effective-limit + 1) to FFFFFFFFH if the B flag is set and from (effective-limit + 1) to FFFFH if the B flag is clear.
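The limit rules above reduce to one range test per access. In this C sketch, an expand-up access of `size` bytes at `offset` is valid if its last byte does not exceed the effective limit. An expand-down access is valid if every byte lies between (effective-limit + 1) and the upper bound set by the B flag. The function name is hypothetical. The arithmetic is done in 64 bits so that offset + size - 1 cannot wrap.

```c
#include <stdbool.h>
#include <stdint.h>

/* `size` is the access size in bytes (1, 2, 4, or 8). */
static bool within_limit(uint32_t offset, unsigned size, uint32_t eff_limit,
                         bool expand_down, bool b_flag)
{
    uint64_t first = offset;
    uint64_t last  = (uint64_t)offset + size - 1;
    if (!expand_down)
        return last <= eff_limit;   /* i.e., offset <= eff_limit - (size - 1) */
    uint64_t upper = b_flag ? 0xFFFFFFFFULL : 0xFFFFULL;
    return first >= (uint64_t)eff_limit + 1 && last <= upper;
}
```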
The processor examines type information at various times while operating on segment selectors and segment descriptors. The following list gives examples of typical operations where type checking is performed (this list is not exhaustive):
• When a segment selector is loaded into a segment register. Certain segment registers can contain only certain descriptor types, for example:
  — The CS register can be loaded only with a selector for a code segment.
instruction. If the descriptor type is for a code segment or call gate, a call or jump to another code segment is indicated; if the descriptor type is for a TSS or task gate, a task switch is indicated.
  — On a call or jump through a call gate (or on an interrupt- or exception-handler call through a trap or interrupt gate), the processor automatically checks that the segment descriptor being pointed to by the gate is for a code segment.
[Figure 5-3. Protection Rings: Level 0, operating system kernel; Levels 1 and 2, operating system services; Level 3, applications.]

The processor uses privilege levels to prevent a program or task operating at a lesser privilege level from accessing a segment with a greater privilege, except under controlled situations. When the processor detects a privilege level violation, it generates a general-protection exception (#GP).
example, if the DPL of a data segment is 1, only programs running at a CPL of 0 or 1 can access the segment.
— Nonconforming code segment (without using a call gate) — The DPL indicates the privilege level that a program or task must be at to access the segment. For example, if the DPL of a nonconforming code segment is 0, only programs running at a CPL of 0 can access the segment.
loads the segment selector into the segment register if the DPL is numerically greater than or equal to both the CPL and the RPL. Otherwise, a general-protection fault is generated and the segment register is not loaded.

[Figure 5-4: the CPL (from the CS register) and the RPL (from the segment selector for the data segment) are checked against the DPL (from the data-segment descriptor).]
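The data-segment rule reduces to one comparison. The effective privilege is the numerically larger of the CPL and the RPL, and it must not exceed the DPL. In this C sketch (hypothetical name), the usage checks replay the Figure 5-5 cases, where data segment E has DPL = 2.

```c
#include <stdbool.h>

/* Privilege levels run from 0 (most privileged) to 3. Loading a
 * data-segment register succeeds only if DPL >= CPL and DPL >= RPL. */
static bool data_seg_load_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    return dpl >= cpl && dpl >= rpl;
}
```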
[Figure 5-5. Examples of Accessing Data Segments From Various Privilege Levels: data segment E has DPL=2; code segment C (CPL=3, lowest privilege) uses segment selector E3 (RPL=3); code segment A (CPL=2) uses E1 (RPL=2); code segment B (CPL=1) uses E2 (RPL=1); code segment D has CPL=0 (highest privilege).]

As demonstrated in the previous examples, the addressable domain of a program or task varies as its CPL changes.
• Load a data-segment register with a segment selector for a nonconforming, readable code segment.
• Load a data-segment register with a segment selector for a conforming, readable code segment.
• Use a code-segment override prefix (CS) to read a readable code segment whose selector is already loaded in the CS register.
The same rules for accessing data segments apply to method 1.
• The target operand points to a TSS, which contains the segment selector for the target code segment.
• The target operand points to a task gate, which points to a TSS, which in turn contains the segment selector for the target code segment.
The following sections describe the first two types of references. See Section 7.3, “Task Switching,” for information on transferring program control through a task gate and/or TSS.
• The RPL of the segment selector of the destination code segment.
• The conforming (C) flag in the segment descriptor for the destination code segment, which determines whether the segment is a conforming (C flag is set) or nonconforming (C flag is clear) code segment. See Section 3.4.5.1, “Code- and Data-Segment Descriptor Types,” for more information about this flag.
[Figure 5-7: code segment B (CPL=3, lowest privilege) holds segment selectors C2 and D2 (RPL=3); code segment A (CPL=2) holds segment selectors C1 and D1 (RPL=2); code segment C is a nonconforming code segment with DPL=2; code segment D is a conforming code segment with DPL=1.]
In the example in Figure 5-7, code segment D is a conforming code segment. Therefore, calling procedures in both code segments A and B can access code segment D (using either segment selector D1 or D2, respectively), because they both have CPLs that are greater than or equal to the DPL of the conforming code segment. For conforming code segments, the DPL represents the numerically lowest privilege level that a calling procedure may be at to successfully make a call to the code segment.
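The direct-transfer rules for conforming and nonconforming code segments can be sketched in C. The RPL condition for nonconforming targets (the RPL must be less than or equal to the CPL) is an assumption here, because that part of the text lies outside this excerpt. The checks replay the Figure 5-7 example, and the function name is hypothetical.

```c
#include <stdbool.h>

/* Direct (no call gate) far CALL or JMP to a code segment.
 * Nonconforming: CPL must equal DPL, and RPL must be <= CPL (assumed).
 * Conforming: DPL must be <= CPL; the RPL is not checked. */
static bool direct_transfer_allowed(unsigned cpl, unsigned rpl,
                                    unsigned dpl, bool conforming)
{
    if (conforming)
        return dpl <= cpl;
    return dpl == cpl && rpl <= cpl;
}
```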
5.8.3 Call Gates
Call gates facilitate controlled transfers of program control between different privilege levels. They are typically used only in operating systems or executives that use the privilege-level protection mechanism. Call gates are also useful for transferring program control between 16-bit and 32-bit code segments, as described in Section 18.4, “Transferring Control Among Mixed-Size Code Segments.”
Figure 5-8 shows the format of a call-gate descriptor.
Note that the P flag in a gate descriptor is normally set to 1. If it is set to 0, a not present (#NP) exception is generated when a program attempts to access the descriptor. The operating system can use the P flag for special purposes. For example, it could be used to track the number of times the gate is used. Here, the P flag is initially set to 0 causing a trap to the not-present exception handler.
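For reference, a 32-bit call-gate descriptor can be decoded as in the following C sketch. The field positions used here follow the standard call-gate format: the offset is split across bits 15:0 and 63:48, the selector sits in bits 31:16, the parameter count in bits 36:32, and the type is 1100B for a 32-bit call gate. Figure 5-8 itself is not reproduced in this excerpt, so treat these positions as assumptions. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint16_t selector;     /* destination code-segment selector */
    uint32_t offset;       /* entry point within that segment */
    unsigned param_count;  /* doublewords copied to the new stack */
    unsigned type, dpl;
    bool     present;
} call_gate32;

static call_gate32 decode_call_gate32(uint64_t raw)
{
    uint32_t lo = (uint32_t)raw, hi = (uint32_t)(raw >> 32);
    call_gate32 c;
    c.offset      = (lo & 0xFFFF) | (hi & 0xFFFF0000);
    c.selector    = (uint16_t)(lo >> 16);
    c.param_count = hi & 0x1F;          /* bits 4:0 of the high doubleword */
    c.type        = (hi >> 8) & 0xF;    /* 1100B for a 32-bit call gate */
    c.dpl         = (hi >> 13) & 3;
    c.present     = (hi >> 15) & 1;
    return c;
}
```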
[Figure 5-9. Call-Gate Descriptor in IA-32e Mode: a 16-byte descriptor holding Offset in Segment 15:00, Segment Selector, Type (1100B), DPL (Descriptor Privilege Level), P (Gate Valid), Offset in Segment 31:16, Offset in Segment 63:32, and reserved fields; bits 12:8 of the upper doubleword are 0.]

• Target code segments referenced by a 64-bit call gate must be 64-bit code segments (CS.L = 1, CS.D = 0).
5.8.4 Accessing a Code Segment Through a Call Gate
To access a call gate, a far pointer to the gate is provided as a target operand in a CALL or JMP instruction. The segment selector from this pointer identifies the call gate (see Figure 5-10); the offset from the pointer is required, but not used or checked by the processor. (The offset can be set to any value.
[Figure 5-11. Privilege Check for Control Transfer with Call Gate: the CPL (from the CS register) and the RPL (from the call-gate selector) are checked against the DPL of the call-gate descriptor, and the CPL is checked against the DPL of the destination code-segment descriptor.]

The privilege checking rules are different depending on whether the control transfer was initiated with a CALL or a JMP instruction, as shown in Table 5-1.

Table 5-1.
segments B and C. The dotted line shows that a calling procedure in code segment A cannot access call gate B.
The RPL of the segment selector to a call gate must satisfy the same test as the CPL of the calling procedure; that is, the RPL must be less than or equal to the DPL of the call gate. In the example in Figure 5-12, a calling procedure in code segment C can access call gate B using gate selector B2 or B1, but it could not use gate selector B3 to access call gate B.
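The call-gate checks described above, together with the CALL/JMP distinction from Table 5-1, can be sketched as follows. The table body is not reproduced in this excerpt, so its contents are assumed here. CALL is assumed to permit any destination whose DPL is less than or equal to the CPL. JMP is assumed to require DPL = CPL for a nonconforming destination. The test cases replay Figure 5-12, and the function name is hypothetical.

```c
#include <stdbool.h>

/* Privilege checks for a far CALL or JMP through a call gate. */
static bool call_gate_allowed(bool is_call, unsigned cpl, unsigned rpl,
                              unsigned gate_dpl, unsigned dest_dpl,
                              bool dest_conforming)
{
    if (cpl > gate_dpl || rpl > gate_dpl)
        return false;              /* gate not accessible to caller */
    if (dest_dpl > cpl)
        return false;              /* destination less privileged than caller */
    if (!is_call && !dest_conforming)
        return dest_dpl == cpl;    /* JMP to nonconforming: no CPL change */
    return true;
}
```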
[Figure 5-12: code segment A (CPL=3, lowest privilege) holds gate selector A (RPL=3) for call gate A (DPL=3) and gate selector B3 (RPL=3); code segment B (CPL=2) holds gate selector B1 (RPL=2); code segment C (CPL=1) holds gate selector B2 (RPL=1); call gate B has DPL=2; code segment D (DPL=0) is a conforming code segment and code segment E (DPL=0) is a nonconforming code segment. The figure marks where a stack switch occurs and where no stack switch occurs.]
Each task must define up to 4 stacks: one for applications code (running at privilege level 3) and one for each of the privilege levels 2, 1, and 0 that are used. (If only two privilege levels are used [3 and 0], then only two stacks must be defined.) Each of these stacks is located in a separate segment and is identified with a segment selector and an offset into the stack segment (a stack pointer).
3. Checks the stack-segment descriptor for the proper privileges and type and generates an invalid TSS (#TS) exception if violations are detected.
4. Temporarily saves the current values of the SS and ESP registers.
5. Loads the segment selector and stack pointer for the new stack in the SS and ESP registers.
6. Pushes the temporarily saved values for the SS and ESP registers (for the calling procedure) onto the new stack (see Figure 5-13).
7.
dure, one of the parameters can be a pointer to a data structure, or the saved contents of the SS and ESP registers may be used to access parameters in the old stack space. The size of the data items passed to the called procedure depends on the call gate size, as described in Section 5.8.3, “Call Gates.”

5.8.5.1 Stack Switching in 64-bit Mode
Although protection-check rules for call gates are unchanged from 32-bit mode, stack-switch changes in 64-bit mode are different.
intended to execute returns from procedures that were called with a CALL instruction. It does not support returns from a JMP instruction, because the JMP instruction does not save a return instruction pointer on the stack. A near return only transfers program control within the current code segment; therefore, the processor performs only a limit check.
5. (If the RET instruction includes a parameter count operand.) Adds the parameter count (in bytes obtained from the RET instruction) to the current ESP register value, to step past the parameters on the calling procedure’s stack. The resulting ESP value is not checked against the limit of the stack segment. If the ESP value is beyond the limit, that fact is not recognized until the next stack operation.
6. (If the return requires a privilege level change.
• Stack segment — Computed by adding 24 to the value in IA32_SYSENTER_CS.
• Stack pointer — Reads this from ECX.

The SYSENTER and SYSEXIT instructions perform “fast” calls and returns because they force the processor into a predefined privilege level 0 state when SYSENTER is executed and into a predefined privilege level 3 state when SYSEXIT is executed.
When SYSEXIT transfers control to compatibility mode user code when the operand size attribute is 32 bits, the following fields are generated and bits set:
• Target code segment — Computed by adding 16 to the value in IA32_SYSENTER_CS.
• New CS attributes — L-bit = 0 (go to compatibility mode).
• Target instruction — Fetch the target instruction from 32-bit address in EDX.
• Stack segment — Computed by adding 24 to the value in IA32_SYSENTER_CS.
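The selector arithmetic for SYSENTER and SYSEXIT can be sketched in C. The +16 and +24 offsets come from the text above. The remaining details are assumptions not stated in this excerpt: SYSENTER takes CS from IA32_SYSENTER_CS with RPL forced to 0 and uses CS + 8 for SS, a 64-bit SYSEXIT uses +32 and +40, and SYSEXIT forces RPL = 3. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint16_t cs, ss; } sel_pair;

/* SYSENTER: CS from IA32_SYSENTER_CS[15:0] with RPL forced to 0;
 * SS is the next GDT descriptor (CS + 8). */
static sel_pair sysenter_selectors(uint64_t sysenter_cs_msr)
{
    uint16_t cs = (uint16_t)(sysenter_cs_msr & 0xFFFC);
    sel_pair s = { cs, (uint16_t)(cs + 8) };
    return s;
}

/* SYSEXIT: +16/+24 for a 32-bit operand size, +32/+40 when returning
 * to 64-bit code; RPL forced to 3 in both selectors. */
static sel_pair sysexit_selectors(uint64_t sysenter_cs_msr, bool to_64bit)
{
    uint16_t base = (uint16_t)sysenter_cs_msr;
    uint16_t cs = (uint16_t)((base + (to_64bit ? 32 : 16)) | 3);
    uint16_t ss = (uint16_t)((base + (to_64bit ? 40 : 24)) | 3);
    sel_pair s = { cs, ss };
    return s;
}
```

With IA32_SYSENTER_CS = 10H, SYSEXIT to compatibility mode yields CS = 23H and SS = 2BH.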
When SYSRET transfers control to 32-bit mode user code using a 32-bit operand size, the processor gets the privilege level 3 target instruction and stack pointer from:
• Target code segment — Reads a non-NULL selector from IA32_STAR[63:48].
• Target instruction — Copies the value in ECX into EIP.
• Stack segment — IA32_STAR[63:48] + 8.
• EFLAGS — Loaded from R11.
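SYSCALL and SYSRET derive their selectors from IA32_STAR in a similar way. The 32-bit SYSRET case follows the text above. The other details are assumptions beyond this excerpt: SYSCALL takes CS from IA32_STAR[47:32] with RPL forced to 0 and uses that value + 8 for SS, a 64-bit SYSRET uses IA32_STAR[63:48] + 16 for CS, and SYSRET forces RPL = 3. The names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint16_t cs, ss; } sel_pair;

static sel_pair syscall_selectors(uint64_t star)
{
    uint16_t base = (uint16_t)(star >> 32);   /* IA32_STAR[47:32] */
    sel_pair s = { (uint16_t)(base & 0xFFFC), (uint16_t)(base + 8) };
    return s;
}

static sel_pair sysret_selectors(uint64_t star, bool to_64bit)
{
    uint16_t base = (uint16_t)(star >> 48);   /* IA32_STAR[63:48] */
    sel_pair s = {
        (uint16_t)((base + (to_64bit ? 16 : 0)) | 3),
        (uint16_t)((base + 8) | 3),
    };
    return s;
}
```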
general-protection exception (#GP) is generated. The following system instructions are privileged instructions:
• LGDT — Load GDT register.
• LLDT — Load LDT register.
• LTR — Load task register.
• LIDT — Load IDT register.
• MOV (control registers) — Load and store control registers.
• LMSW — Load machine status word.
• CLTS — Clear task-switched flag in register CR0.
• MOV (debug registers) — Load and store debug registers.
• INVD — Invalidate cache, without writeback.
The processor automatically performs the first, second, and third checks during instruction execution. Software must explicitly request the fourth check by issuing an ARPL instruction. The fifth check (offset alignment) is performed automatically at privilege level 3 if alignment checking is turned on. Offset alignment does not affect isolation of privilege levels.

5.10.
5.10.2 Checking Read/Write Rights (VERR and VERW Instructions)
When the processor accesses any code or data segment it checks the read/write privileges assigned to the segment to verify that the intended read or write operation is allowed. Software can check read/write rights using the VERR (verify for reading) and VERW (verify for writing) instructions. Both these instructions specify the segment selector for the segment being checked.
destination register and sets the ZF flag in the EFLAGS register. If the segment selector is not visible at the current privilege level or is an invalid type for the LSL instruction, the instruction does not modify the destination register and clears the ZF flag. Once the segment limit is loaded in the destination register, software can compare it with the offset of a pointer.

5.10.
[Figure 5-15: application program code segment A (CPL=3) uses gate selector B (RPL=3) to reach call gate B (DPL=3) and operating-system code segment C (DPL=0); a segment selector is passed as a parameter on the stack. Access to data segment D (DPL=0) through segment selector D1 (RPL=3) is not allowed; access through segment selector D2 (RPL=0) is allowed.]
The example in Figure 5-15 demonstrates how the ARPL instruction is intended to be used. When the operating system receives segment selector D2 from the application program, it uses the ARPL instruction to compare the RPL of the segment selector with the privilege level of the application program (represented by the code-segment selector pushed onto the stack).
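ARPL's effect can be modeled in C. If the RPL field of the destination selector is lower than that of the source, it is raised to match. The model returns true in the case where the instruction would set ZF. The function name is hypothetical. The usage checks mimic the Figure 5-15 scenario, where a selector with RPL = 0 is adjusted against an application code-segment selector with RPL = 3.

```c
#include <stdbool.h>
#include <stdint.h>

/* Raise the RPL of *dest to at least the RPL of src.
 * Returns true (the ZF = 1 case) if *dest was changed. */
static bool arpl(uint16_t *dest, uint16_t src)
{
    if ((*dest & 3) < (src & 3)) {
        *dest = (uint16_t)((*dest & ~3u) | (src & 3));
        return true;
    }
    return false;
}
```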
page-fault exception mechanism. This chapter describes the protection violations which lead to page-fault exceptions.

5.11.1 Page-Protection Flags
Protection information for pages is contained in two flags in a paging-structure entry (see Chapter 4): the read/write flag (bit 1) and the user/supervisor flag (bit 2). The protection checks use the flags in all paging structures.

5.11.
When the processor is in supervisor mode and the WP flag in register CR0 is clear (its state following reset initialization), all pages are both readable and writable (write protection is ignored). When the processor is in user mode, it can write only to user-mode pages that are read/write accessible. User-mode pages which are read/write or read-only are readable; supervisor-mode pages are neither readable nor writable from user mode.
exception is generated. If an exception is generated by segmentation, no paging exception is generated. Page-level protections cannot be used to override segment-level protection. For example, a code segment is by definition not writable. If a code segment is paged, setting the R/W flag for the pages to read-write does not make the pages writable. Attempts to write into the pages will be blocked by segment-level protection checks.
5.13 PAGE-LEVEL PROTECTION AND EXECUTE-DISABLE BIT
In addition to page-level protection offered by the U/S and R/W flags, paging structures used with PAE paging and IA-32e paging (see Chapter 4) provide the execute-disable bit. This bit offers additional protection for data pages. An Intel 64 or IA-32 processor with the execute-disable bit capability can prevent data pages from being used by malicious software to execute code.
5.13.2 Execute-Disable Page Protection
The execute-disable bit in the paging structures enhances page protection for data pages. Instructions cannot be fetched from a memory page if IA32_EFER.NXE = 1 and the execute-disable bit is set in any of the paging-structure entries used to map the page. Table 5-5 lists the valid usage of a page in relation to the value of the execute-disable bit (bit 63) of the corresponding entry in each level of the paging structures.
Table 5-6. Legacy PAE-Enabled 4-KByte Page Level Protection Matrix with Execute-Disable Bit Capability

Execute Disable Bit Value (Bit 63)
PDE           PTE           Valid Usage
Bit 63 = 1    *             Data
*             Bit 63 = 1    Data
Bit 63 = 0    Bit 63 = 0    Data/Code

NOTE: * Value not checked.

Table 5-7. Legacy PAE-Enabled 2-MByte Page Level Protection with Execute-Disable Bit Capability

Execute Disable Bit Value (Bit 63)
PDE           Valid Usage
Bit 63 = 1    Data
Bit 63 = 0    Data/Code

5.13.
Table 5-8.
Table 5-9.
CHAPTER 6
INTERRUPT AND EXCEPTION HANDLING

This chapter describes the interrupt and exception-handling mechanism when operating in protected mode on an Intel 64 or IA-32 processor. Most of the information provided here also applies to interrupt and exception mechanisms used in real-address, virtual-8086 mode, and 64-bit mode.
Chapter 17, “8086 Emulation,” describes information specific to interrupt and exception mechanisms in real-address and virtual-8086 mode. Section 6.
6.2 EXCEPTION AND INTERRUPT VECTORS
To aid in handling exceptions and interrupts, each architecturally defined exception and each interrupt condition requiring special handling by the processor is assigned a unique identification number, called a vector. The processor uses the vector assigned to an exception or interrupt as an index into the interrupt descriptor table (IDT). The table provides the entry point to an exception or interrupt handler (see Section 6.
(see Section 6.2, “Exception and Interrupt Vectors”). Asserting the NMI pin signals a non-maskable interrupt (NMI), which is assigned to interrupt vector 2.

Table 6-1. Protected-Mode Exceptions and Interrupts

Vector No.  Mnemonic  Description     Type        Error Code  Source
0           #DE       Divide Error    Fault       No          DIV and IDIV instructions.
1           #DB       RESERVED        Fault/Trap  No          For Intel use only.
2           —         NMI Interrupt   Interrupt   No          Nonmaskable external interrupt.
Table 6-1. Protected-Mode Exceptions and Interrupts (Contd.)

Vector No.  Mnemonic  Description                              Type       Error Code  Source
18          #MC       Machine Check                            Abort      No          Error codes (if any) and source are model dependent (note 4).
19          #XM       SIMD Floating-Point Exception            Fault      No          SSE/SSE2/SSE3 floating-point instructions (note 5).
20-31       —         Intel reserved. Do not use.
32-255      —         User Defined (Nonreserved) Interrupts    Interrupt              External interrupt or INT n instruction.

NOTES:
1. The UD2 instruction was introduced in the Pentium Pro processor.
2.
defined interrupt vectors from 0 through 255; those that can be delivered through the local APIC include interrupt vectors 16 through 255. The IF flag in the EFLAGS register permits all maskable hardware interrupts to be masked as a group (see Section 6.8.1, “Masking Maskable Hardware Interrupts”). Note that when interrupts 0 through 15 are delivered through the local APIC, the APIC indicates the receipt of an illegal vector. 6.3.
6.4.2 Software-Generated Exceptions
The INTO, INT 3, and BOUND instructions permit exceptions to be generated in software. These instructions allow checks for exception conditions to be performed at points in the instruction stream. For example, INT 3 causes a breakpoint exception to be generated.
The INT n instruction can be used to emulate exceptions in software; but there is a limitation.
• Aborts — An abort is an exception that does not always report the precise location of the instruction causing the exception and does not allow a restart of the program or task that caused the exception. Aborts are used to report severe errors, such as hardware errors and inconsistent or illegal values in system tables.

NOTE
One exception subset normally reported as a fault is not restartable. Such exceptions result in loss of some processor state.
EFLAGS.OF (overflow) flag. The trap handler for this exception resolves the overflow condition. Upon return from the trap handler, program or task execution continues at the instruction following the INTO instruction. The abort-class exceptions do not support reliable restarting of the program or task.
It is possible to issue a maskable hardware interrupt (through the INTR pin) to vector 2 to invoke the NMI interrupt handler; however, this interrupt will not truly be an NMI interrupt. A true NMI interrupt that activates the processor’s NMI-handling hardware can only be delivered through one of the mechanisms listed above. 6.7.
is an interrupt. As with the INT n instruction (see Section 6.4.2, “Software-Generated Exceptions”), when an interrupt is generated through the INTR pin to an exception vector, the processor does not push an error code on the stack, so the exception handler may not operate correctly. The IF flag can be set or cleared with the STI (set interrupt-enable flag) and CLI (clear interrupt-enable flag) instructions, respectively.
6.8.3 Masking Exceptions and Interrupts When Switching Stacks
To switch to a different stack segment, software often uses a pair of instructions, for example:

    MOV SS, AX
    MOV ESP, StackTop

If an interrupt or exception occurs after the segment selector has been loaded into the SS register but before the ESP register has been loaded, these two parts of the logical address into the stack space are inconsistent for the duration of the interrupt or exception handler.
Table 6-2. Priority Among Simultaneous Exceptions and Interrupts (Contd.)
protected mode). Unlike the GDT, the first entry of the IDT may contain a descriptor. To form an index into the IDT, the processor scales the exception or interrupt vector by eight (the number of bytes in a gate descriptor). Because there are only 256 interrupt or exception vectors, the IDT need not contain more than 256 descriptors. It can contain fewer than 256 descriptors, because descriptors are required only for the interrupt and exception vectors that may occur.
Figure 6-1. Relationship of the IDTR and IDT. The IDTR register holds the IDT base address (bits 47:16) and the IDT limit (bits 15:0). Gate descriptors are stored consecutively, 8 bytes apart, starting at the base address; in the figure, the gate for interrupt #1 is at offset 0, #2 at 8, #3 at 16, and #n at (n−1)∗8.
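The scaling rule can be made concrete with a short C sketch. It assumes protected mode with a 32-bit IDT base. `struct idtr32`, `idt_gate_address`, and `idt_gate_within_limit` are hypothetical names, and the struct is only a convenience view, not the packed 6-byte operand used by LIDT/SIDT. The sketch computes where the gate for a vector lives and whether all eight of its bytes fall within the IDT limit. A vector whose gate lies beyond the limit causes a general-protection exception.

```c
#include <stdbool.h>
#include <stdint.h>

/* Convenience view of IDTR contents (not the in-memory LIDT operand). */
struct idtr32 {
    uint16_t limit;   /* offset of the last valid byte of the IDT */
    uint32_t base;    /* linear address of IDT entry 0 */
};

/* Each gate descriptor is 8 bytes, so the vector is scaled by 8. */
static inline uint32_t idt_gate_address(struct idtr32 idtr, uint8_t vector)
{
    return idtr.base + (uint32_t)vector * 8u;
}

/* All 8 bytes of the gate must lie at or below the limit. */
static inline bool idt_gate_within_limit(struct idtr32 idtr, uint8_t vector)
{
    return (uint32_t)vector * 8u + 7u <= idtr.limit;
}
```

A full 256-entry IDT therefore has a limit of 7FFH (256 × 8 − 1).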
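IDT gate descriptor formats (figure). Task Gate: TSS Segment Selector in bits 31:16 of the low doubleword; P (bit 15), DPL (bits 14:13), and type bits 12:8 = 0 0 1 0 1 in the high doubleword. Interrupt Gate: Segment Selector (bits 31:16) and Offset 15..0 (bits 15:0) in the low doubleword; Offset 31..16 (bits 31:16), P, DPL, type bits 12:8 = 0 D 1 1 0, and bits 7:5 = 0 0 0 in the high doubleword. Trap Gate: same as the interrupt gate, with type bits 12:8 = 0 D 1 1 1. Legend: DPL = Descriptor Privilege Level; Offset = offset to procedure entry point; P = Segment Present flag; Selector = segment selector for destination code segment; D = size of gate (1 = 32 bits, 0 = 16 bits).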
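The gate layouts above can be packed into the two doublewords of a descriptor as follows. This is an illustrative C sketch covering only 32-bit gates (D = 1) and assuming every gate is marked present (P = 1). `make_gate32`, `struct gate32`, and the enum names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Type bits 11:8 of the high doubleword; bit 12 is 0 for all three gates. */
enum gate_type {
    GATE_TASK   = 0x5,  /* 0 0 1 0 1 */
    GATE_INTR32 = 0xE,  /* 0 D 1 1 0 with D = 1 */
    GATE_TRAP32 = 0xF,  /* 0 D 1 1 1 with D = 1 */
};

struct gate32 {
    uint32_t lo;  /* descriptor bits 31:0 */
    uint32_t hi;  /* descriptor bits 63:32 */
};

/* For a task gate, 'selector' is the TSS segment selector and 'offset' is ignored. */
static struct gate32 make_gate32(enum gate_type type, uint16_t selector,
                                 uint32_t offset, unsigned dpl)
{
    assert(dpl <= 3);
    struct gate32 g;
    g.lo = (uint32_t)selector << 16;          /* selector in bits 31:16 */
    g.hi = (1u << 15)                         /* P: present */
         | ((uint32_t)dpl << 13)              /* DPL */
         | ((uint32_t)type << 8);             /* type; bits 7:0 stay 0 */
    if (type != GATE_TASK) {
        g.lo |= offset & 0xFFFFu;             /* Offset 15..0 */
        g.hi |= offset & 0xFFFF0000u;         /* Offset 31..16 */
    }
    return g;
}
```

For example, a present, DPL 0, 32-bit interrupt gate for handler offset 12345678H in code segment 08H encodes as low doubleword 00085678H and high doubleword 12348E00H.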
“Returning from a Called Procedure”). If the index points to a task gate, the processor executes a task switch to the exception- or interrupt-handler task in a manner similar to a CALL to a task gate (see Section 7.3, “Task Switching”).

6.12.1 Exception- or Interrupt-Handler Procedures
An interrupt gate or trap gate references an exception- or interrupt-handler procedure that runs in the context of the currently executing task (see Figure 6-3).
When the processor performs a call to the exception- or interrupt-handler procedure:
• If the handler procedure is going to be executed at a numerically lower privilege level, a stack switch occurs. When the stack switch occurs:
   a. The segment selector and stack pointer for the stack to be used by the handler are obtained from the TSS for the currently executing task.
Figure 6-4. Stack usage on transfers to interrupt and exception handlers. With no privilege-level change, the handler uses the interrupted procedure's stack: the processor pushes EFLAGS, CS, EIP, and the Error Code (if any), and ESP after the transfer points to the last item pushed. With a privilege-level change, the processor switches to the handler's stack and pushes SS, ESP, EFLAGS, CS, EIP, and the Error Code (if any).
not permit transfer of execution to an exception- or interrupt-handler procedure in a less privileged code segment (numerically greater privilege level) than the CPL. An attempt to violate this rule results in a general-protection exception (#GP).
of the EFLAGS register on the stack. Accessing a handler procedure through a trap gate does not affect the IF flag.

6.12.2 Interrupt Tasks
When an exception or interrupt handler is accessed through a task gate in the IDT, a task switch results. Handling an exception or interrupt with a separate task offers several advantages:
• The entire context of the interrupted program or task is saved automatically.
Figure 6-5. Interrupt Task Switch. The interrupt vector selects a task gate in the IDT; the TSS selector in the task gate selects a TSS descriptor in the GDT, whose base address locates the TSS for the interrupt-handling task.

6.13 ERROR CODE
When an exception condition is related to a specific segment, the processor pushes an error code onto the stack of the exception handler (whether it is a procedure or task). The error code has the format shown in Figure 6-6.
clear, indicates that the index refers to a descriptor in the GDT or the current LDT.
TI GDT/LDT (bit 2) — Only used when the IDT flag is clear. When set, the TI flag indicates that the index portion of the error code refers to a segment or gate descriptor in the LDT; when clear, it indicates that the index refers to a descriptor in the current GDT.

Figure 6-6. Error Code. Fields, from high to low: Reserved, Segment Selector Index (bits 3 and up), TI (bit 2), IDT (bit 1), EXT (bit 0).
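A handler that receives a selector-style error code can split it into the fields described above. The following C sketch assumes the index occupies bits 15:3. `struct selector_error` and `decode_selector_error` are hypothetical names. TI is reported only when the IDT flag is clear, matching the rule that TI is used only in that case.

```c
#include <stdbool.h>
#include <stdint.h>

struct selector_error {
    bool ext;        /* bit 0: exception occurred while delivering an external event */
    bool idt;        /* bit 1: index refers to a gate descriptor in the IDT */
    bool ti;         /* bit 2: LDT (1) or GDT (0); meaningful only when idt is false */
    uint16_t index;  /* bits 15:3: segment selector (or IDT vector) index */
};

static struct selector_error decode_selector_error(uint32_t code)
{
    struct selector_error e;
    e.ext = code & 0x1;
    e.idt = (code >> 1) & 0x1;
    e.ti = !e.idt && ((code >> 2) & 0x1);
    e.index = (uint16_t)((code >> 3) & 0x1FFF);
    return e;
}
```

For example, error code 6BH decodes as EXT = 1, IDT = 1, index 13. That is an external event whose delivery went wrong at IDT entry 13.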
• The stack pointer (SS:RSP) is pushed unconditionally on interrupts. In legacy modes, this push is conditional and based on a change in current privilege level (CPL).
• The new SS is set to NULL if there is a change in CPL.
• IRET behavior changes.
• There is a new interrupt stack-switch mechanism.
• The alignment of the interrupt stack frame is different.
ware attempts to reference an interrupt gate with a target RIP that is not in canonical form. The target code segment referenced by the interrupt gate must be a 64-bit code segment (CS.L = 1, CS.D = 0). If the target is not a 64-bit code segment, a general-protection exception (#GP) is generated with the IDT vector number reported as the error code. Only 64-bit interrupt and trap gates can be referenced in IA-32e mode (64-bit mode and compatibility mode).
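The following C sketch expresses the two target checks described above. It assumes a processor with 48 implemented linear-address bits, so an address is canonical when bits 63:47 are all equal. It also assumes the 8-byte code-segment descriptor is passed as one 64-bit value, with L at bit 53 and D at bit 54. Both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Canonical (48-bit linear addresses assumed): bits 63:47 all 0 or all 1. */
static bool is_canonical48(uint64_t addr)
{
    uint64_t upper = addr >> 47;          /* the 17 bits 63:47 */
    return upper == 0 || upper == 0x1FFFF;
}

/* A 64-bit code segment has L = 1 (bit 53) and D = 0 (bit 54). */
static bool is_64bit_code_descriptor(uint64_t desc)
{
    bool l = (desc >> 53) & 1;
    bool d = (desc >> 54) & 1;
    return l && !d;
}
```

A typical flat 64-bit kernel code descriptor, 00209A0000000000H, passes the second check. A flat 32-bit code descriptor such as 00CF9A000000FFFFH (D = 1, L = 0) does not.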
6.14.3 IRET in IA-32e Mode
In IA-32e mode, IRET is expected to execute with an 8-byte operand size, although nothing in the processor forces this requirement. The stack is formatted in such a way that for actions where IRET is required, the 8-byte IRET operand size works correctly. Because interrupt stack-frame pushes are always eight bytes in IA-32e mode, an IRET must pop eight-byte items off the stack. This is accomplished by preceding the IRET with a 64-bit operand-size prefix (REX.W, written as IRETQ).
In summary, a stack switch in IA-32e mode works like the legacy stack switch, except that a new SS selector is not loaded from the TSS. Instead, the new SS is forced to NULL.

Figure 6-8. Stack usage with privilege-level change, legacy mode versus IA-32e mode. Offsets are relative to the stack pointer after transfer to the handler. Legacy mode (4-byte slots): Error Code (0), EIP (+4), CS (+8), EFLAGS (+12), ESP (+16), SS (+20). IA-32e mode (8-byte slots): Error Code (0), RIP (+8), CS (+16), RFLAGS (+24), RSP (+32), SS (+40).
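As a sketch of the IA-32e frame in Figure 6-8, the C struct below lists the slots from the lowest address upward. The helper shows where the handler's RSP ends up. In IA-32e mode, the processor aligns the new RSP down to a 16-byte boundary before pushing SS:RSP, RFLAGS, CS, RIP, and any error code. The names `struct ia32e_interrupt_frame` and `handler_rsp_after_push` are hypothetical. The `rsp` argument is assumed to be the stack pointer the processor is about to push onto, which is the TSS value on a privilege change and otherwise the current RSP.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* IA-32e interrupt stack frame, lowest address first; every slot is 8 bytes. */
struct ia32e_interrupt_frame {
    uint64_t error_code;  /* +0, present only for exceptions that push one */
    uint64_t rip;         /* +8  */
    uint64_t cs;          /* +16 */
    uint64_t rflags;      /* +24 */
    uint64_t rsp;         /* +32 */
    uint64_t ss;          /* +40 */
};

static_assert(offsetof(struct ia32e_interrupt_frame, ss) == 40, "SS slot at +40");

/* RSP seen by the handler: align down to 16 bytes, then push 5 or 6 quadwords. */
static uint64_t handler_rsp_after_push(uint64_t rsp, bool has_error_code)
{
    uint64_t aligned = rsp & ~(uint64_t)0xF;
    return aligned - (has_error_code ? 48 : 40);
}
```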
6.15 EXCEPTION AND INTERRUPT REFERENCE
The following sections describe conditions which generate exceptions and interrupts. They are arranged in the order of vector numbers. The information contained in these sections is as follows:
• Exception Class — Indicates whether the exception class is a fault, trap, or abort type. Some exceptions can be either a fault or trap type, depending on when the error condition is detected. (This section is not applicable to interrupts.)
Interrupt 0—Divide Error Exception (#DE)
Exception Class: Fault.
Description: Indicates the divisor operand for a DIV or IDIV instruction is 0 or that the result cannot be represented in the number of bits specified for the destination operand.
Exception Error Code: None.
Saved Instruction Pointer: Saved contents of CS and EIP registers point to the instruction that generated the exception.
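Both #DE conditions can be checked ahead of time for the unsigned 32-bit form of DIV, which divides EDX:EAX by a 32-bit operand and places the quotient in EAX. The C sketch below covers only that form. `div32_raises_de` is a hypothetical name. The second check is the "result cannot be represented" case, where the quotient does not fit in 32 bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* DIV r/m32: EDX:EAX / divisor -> quotient in EAX, remainder in EDX.
   Returns true when the processor would raise #DE instead of completing. */
static bool div32_raises_de(uint32_t edx, uint32_t eax, uint32_t divisor)
{
    if (divisor == 0)
        return true;                                  /* divide by zero */
    uint64_t dividend = ((uint64_t)edx << 32) | eax;
    return dividend / divisor > UINT32_MAX;           /* quotient overflows EAX */
}
```

The quotient fits exactly when EDX is less than the divisor. For example, EDX = 5, EAX = 0 divided by 5 faults, but EDX = 4, EAX = FFFFFFFFH divided by 5 does not.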
Interrupt 1—Debug Exception (#DB)
Exception Class: Trap or Fault. The exception handler can distinguish between traps and faults by examining the contents of DR6 and the other debug registers.
Description: Indicates that one or more of several debug-exception conditions have been detected. Whether the exception is a fault or a trap depends on the condition (see Table 6-3).
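The C sketch below shows one way a handler might classify a #DB from DR6 and DR7. It follows the usual rule that instruction breakpoints (R/Wn = 00) and general detect (BD) are faults, while data breakpoints, single-step (BS), and task-switch (BT) conditions are traps. Because B0 through B3 can be set even for breakpoints that are not enabled, it also checks the Ln/Gn enables in DR7. `db_reports_fault` and the macro names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define DR6_BD (1u << 13)  /* general detect: fault */
#define DR6_BS (1u << 14)  /* single step: trap */
#define DR6_BT (1u << 15)  /* task switch: trap */

/* True if any condition reported in DR6 is fault-class. */
static bool db_reports_fault(uint32_t dr6, uint32_t dr7)
{
    if (dr6 & DR6_BD)
        return true;
    for (unsigned n = 0; n < 4; n++) {
        bool hit = (dr6 >> n) & 1;                 /* Bn */
        bool enabled = (dr7 >> (2 * n)) & 3;       /* Ln or Gn */
        unsigned rw = (dr7 >> (16 + 4 * n)) & 3;   /* R/Wn: 00 = instruction execution */
        if (hit && enabled && rw == 0)
            return true;
    }
    return false;
}
```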
Interrupt 2—NMI Interrupt
Exception Class: Not applicable.
Description: The nonmaskable interrupt (NMI) is generated externally by asserting the processor’s NMI pin or through an NMI request set by the I/O APIC to the local APIC. This interrupt causes the NMI interrupt handler to be called.
Exception Error Code: Not applicable.
Saved Instruction Pointer: The processor always takes an NMI interrupt on an instruction boundary.
Interrupt 3—Breakpoint Exception (#BP)
Exception Class: Trap.
Description: Indicates that a breakpoint instruction (INT 3) was executed, causing a breakpoint trap to be generated. Typically, a debugger sets a breakpoint by replacing the first opcode byte of an instruction with the opcode for the INT 3 instruction. (The INT 3 instruction is one byte long, which makes it easy to replace an opcode in a code segment in RAM with the breakpoint opcode.
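The debugger technique described above can be sketched in C against a byte buffer that stands in for a code segment in RAM. INT 3 is opcode CCH. Because #BP is a trap, the saved EIP points just past the 1-byte INT 3, so the debugger restores the original byte and resumes one byte earlier. `struct breakpoint`, `plant_breakpoint`, and `remove_breakpoint` are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define INT3_OPCODE 0xCC

struct breakpoint {     /* hypothetical debugger bookkeeping */
    size_t offset;      /* where the breakpoint was planted */
    uint8_t saved;      /* original first opcode byte */
};

static struct breakpoint plant_breakpoint(uint8_t *code, size_t offset)
{
    struct breakpoint bp = { offset, code[offset] };
    code[offset] = INT3_OPCODE;
    return bp;
}

/* Restore the original byte and return the address at which to resume. */
static uint32_t remove_breakpoint(uint8_t *code, struct breakpoint bp, uint32_t saved_eip)
{
    code[bp.offset] = bp.saved;
    return saved_eip - 1;   /* back up over the 1-byte INT 3 */
}
```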
Interrupt 4—Overflow Exception (#OF)
Exception Class: Trap.
Description: Indicates that an overflow trap occurred when an INTO instruction was executed. The INTO instruction checks the state of the OF flag in the EFLAGS register. If the OF flag is set, an overflow trap is generated.
Some arithmetic instructions (such as ADD and SUB) perform both signed and unsigned arithmetic.
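ADD serves both signed and unsigned arithmetic, so INTO reacts only to the signed-overflow indication in OF. This small C sketch computes OF for a 32-bit ADD and tests EFLAGS.OF (bit 11). Both function names are hypothetical. OF is set when both operands have the same sign and the result's sign differs.

```c
#include <stdbool.h>
#include <stdint.h>

/* OF after a 32-bit ADD (signed overflow). */
static bool add32_sets_of(uint32_t a, uint32_t b)
{
    uint32_t r = a + b;                  /* unsigned wraparound is well defined in C */
    return ((a ^ r) & (b ^ r)) >> 31;
}

/* INTO generates #OF only when EFLAGS.OF (bit 11) is set. */
static bool into_would_trap(uint32_t eflags)
{
    return (eflags >> 11) & 1;
}
```

For example, 7FFFFFFFH + 1 sets OF. FFFFFFFFH + 1 carries out as an unsigned sum but does not set OF, because −1 + 1 = 0 is a valid signed result.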
Interrupt 5—BOUND Range Exceeded Exception (#BR)
Exception Class: Fault.
Description: Indicates that a BOUND-range-exceeded fault occurred when a BOUND instruction was executed. The BOUND instruction checks that a signed array index is within the upper and lower bounds of an array located in memory. If the array index is not within the bounds of the array, a BOUND-range-exceeded fault is generated.
Exception Error Code: None.
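The BOUND check amounts to a signed, inclusive range test. This C sketch assumes the 32-bit form, whose memory operand holds the lower bound followed by the upper bound. `struct bound32` and `bound_raises_br` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Memory operand of BOUND r32, m32&32: lower bound first, then upper bound. */
struct bound32 {
    int32_t lower;
    int32_t upper;
};

/* Signed comparison; both bounds are inclusive. */
static bool bound_raises_br(int32_t index, struct bound32 b)
{
    return index < b.lower || index > b.upper;
}
```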
Interrupt 6—Invalid Opcode Exception (#UD)
Exception Class: Fault.
Description: Indicates that the processor did one of the following things:
• Attempted to execute an invalid or reserved opcode.
• Attempted to execute an MMX or SSE/SSE2/SSE3/SSSE3 instruction on an Intel 64 or IA-32 processor that does not support the MMX technology or SSE/SSE2/SSE3/SSSE3 extensions, respectively.
processor and earlier IA-32 processors, this exception is not generated as the result of prefetching and preliminary decoding of an invalid instruction. (See Section 6.5, “Exception Classifications,” for general rules for taking of interrupts and exceptions.) The opcodes D6 and F1 are undefined opcodes reserved by the Intel 64 and IA-32 architectures. These opcodes, even though undefined, do not generate an invalid opcode exception.
Interrupt 7—Device Not Available Exception (#NM)
Exception Class: Fault.
Description: The device-not-available exception is generated by any of three conditions:
• The processor executed an x87 FPU floating-point instruction while the EM flag in control register CR0 was set (1). See the paragraph below for the special case of the WAIT/FWAIT instruction.
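The three #NM conditions reduce to tests on CR0.MP (bit 1), CR0.EM (bit 2), and CR0.TS (bit 3). An x87 floating-point instruction faults if EM is set or if TS is set. WAIT/FWAIT faults only when both MP and TS are set, regardless of EM. This C sketch models just those CR0 checks. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_MP (1u << 1)
#define CR0_EM (1u << 2)
#define CR0_TS (1u << 3)

/* x87 floating-point instruction: #NM if EM is set, or if TS is set. */
static bool x87_raises_nm(uint32_t cr0)
{
    return (cr0 & CR0_EM) || (cr0 & CR0_TS);
}

/* WAIT/FWAIT: #NM only when MP and TS are both set; EM is ignored. */
static bool wait_raises_nm(uint32_t cr0)
{
    return (cr0 & (CR0_MP | CR0_TS)) == (CR0_MP | CR0_TS);
}
```

Operating systems use the TS case for lazy x87 context switching. They set TS on a task switch, and the first x87 instruction of the new task then traps so that the #NM handler can restore that task's FPU state.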
Saved Instruction Pointer: The saved contents of CS and EIP registers point to the floating-point instruction or the WAIT/FWAIT instruction that generated the exception.
Program State Change: A program-state change does not accompany a device-not-available fault, because the instruction that generated the exception is not executed.
Interrupt 8—Double Fault Exception (#DF)
Exception Class: Abort.
Description: Indicates that the processor detected a second exception while calling an exception handler for a prior exception. Normally, when the processor detects another exception while trying to call an exception handler, the two exceptions can be handled serially. If, however, the processor cannot handle them serially, it signals the double-fault exception.
A segment or page fault may be encountered while prefetching instructions; however, this behavior is outside the domain of Table 6-5. Any further faults generated while the processor is attempting to transfer control to the appropriate fault handler could still lead to a double-fault sequence. Table 6-5.
If the double fault occurs when any portion of the exception handling machine state is corrupted, the handler cannot be invoked and the processor must be reset.
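The serial-versus-double-fault decision of Table 6-5 can be sketched in C using the usual three classes. #DE, #TS, #NP, #SS, and #GP are contributory. #PF is its own class. The remaining exceptions and interrupts are benign. The sketch assumes the first exception is not #DF itself, because a further fault while delivering #DF shuts the processor down, a case it does not model. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum fault_class { BENIGN, CONTRIBUTORY, PAGE_FAULT };

static enum fault_class classify(unsigned vector)
{
    switch (vector) {
    case 0: case 10: case 11: case 12: case 13:
        return CONTRIBUTORY;    /* #DE, #TS, #NP, #SS, #GP */
    case 14:
        return PAGE_FAULT;      /* #PF */
    default:
        return BENIGN;
    }
}

/* True if the second exception (detected while calling the handler for
   the first) produces #DF; false if the two can be handled serially. */
static bool causes_double_fault(unsigned first, unsigned second)
{
    assert(first != 8);         /* a fault while delivering #DF is not modeled */
    enum fault_class a = classify(first), b = classify(second);
    if (a == CONTRIBUTORY)
        return b == CONTRIBUTORY;
    if (a == PAGE_FAULT)
        return b == CONTRIBUTORY || b == PAGE_FAULT;
    return false;               /* a benign first exception never causes #DF */
}
```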
Interrupt 9—Coprocessor Segment Overrun
Exception Class: Abort. (Intel reserved; do not use. Recent IA-32 processors do not generate this exception.)
Description: Indicates that an Intel386 CPU-based system with an Intel 387 math coprocessor detected a page or segment violation while transferring the middle portion of an Intel 387 math coprocessor operand.
Interrupt 10—Invalid TSS Exception (#TS)
Exception Class: Fault.
Description: Indicates that there was an error related to a TSS. Such an error might be detected during a task switch or during the execution of instructions that use information from a TSS. Table 6-6 shows the conditions that cause an invalid TSS exception to be generated.
Table 6-6.
Table 6-6. Invalid TSS Conditions (Contd.)
(Error Code Index: Invalid Condition)
Stack segment selector index: The stack segment selector exceeds descriptor table limit.
Stack segment selector index: The stack segment selector is NULL.
Stack segment selector index: The stack segment descriptor is a non-data segment.
Stack segment selector index: The stack segment is not writable.
Stack segment selector index: The stack segment DPL != CPL.
Table 6-6. Invalid TSS Conditions (Contd.)
(Error Code Index: Invalid Condition)
TSS segment selector index: The TSS segment upper descriptor is not the correct type.
TSS segment selector index: The TSS segment descriptor contains a non-canonical base.
TSS segment selector index: There is a limit violation in attempting to load SS selector or ESP from a TSS on a call or exception which changes privilege levels in legacy mode.
If an invalid TSS exception occurs during a task switch, it can occur before or after the commit-to-new-task point. If it occurs before the commit point, no program state change occurs. If it occurs after the commit point (when the segment descriptor information for the new segment selectors has been loaded in the segment registers), the processor will load all the state information from the new TSS before it generates the exception.
Interrupt 11—Segment Not Present (#NP)
Exception Class: Fault.
Description: Indicates that the present flag of a segment or gate descriptor is clear. The processor can generate this exception during any of the following operations:
• While attempting to load CS, DS, ES, FS, or GS registers. [Detection of a not-present segment while loading the SS register causes a stack fault exception (#SS) to be generated.] This situation can occur while performing a task switch.
tors for the segment selectors in a new TSS, the CS and EIP registers point to the first instruction in the new task. If the exception occurred while accessing a gate descriptor, the CS and EIP registers point to the instruction that invoked the access (for example, a CALL instruction that references a call gate).
Interrupt 12—Stack Fault Exception (#SS)
Exception Class: Fault.
Description: Indicates that one of the following stack-related conditions was detected:
• A limit violation is detected during an operation that refers to the SS register.
Program State Change: A program-state change does not generally accompany a stack-fault exception, because the instruction that generated the fault is not executed. Here, the instruction can be restarted after the exception handler has corrected the stack fault condition.
If a stack fault occurs during a task switch, it occurs after the commit-to-new-task point (see Section 7.3, “Task Switching”).
Interrupt 13—General Protection Exception (#GP)
Exception Class: Fault.
Description: Indicates that the processor detected one of a class of protection violations called “general-protection violations.” The conditions that cause this exception to be generated comprise all the protection violations that do not cause other exceptions to be generated (such as invalid-TSS, segment-not-present, stack-fault, or page-fault exceptions).
• Loading the CR0 register with a set NW flag and a clear CD flag.
• Attempting to access an interrupt or exception handler through an interrupt or trap gate from virtual-8086 mode when the handler’s code segment DPL is greater than 0.
• Attempting to write a 1 into a reserved bit of CR4.
• Writing to a reserved bit in an MSR.
• Referencing an entry in the IDT (following an interrupt or exception) that is not an interrupt, trap, or task gate.
• A selector from a TSS involved in a task switch.
• IDT vector number.
Saved Instruction Pointer: The saved contents of CS and EIP registers point to the instruction that generated the exception.
Program State Change: In general, a program-state change does not accompany a general-protection exception, because the invalid instruction or operation is not executed.
• If the segment descriptor pointed to by the segment selector in the destination operand is a code segment and it has both the D-bit and the L-bit set.
• If the segment descriptor from a 64-bit call gate is in non-canonical space.
• If the upper type field of a 64-bit call gate is not 0x0.
• If an attempt is made to load a null selector into the SS register at CPL 3 in 64-bit mode.
Interrupt 14—Page-Fault Exception (#PF)
Exception Class: Fault.
— The U/S flag indicates whether the processor was executing in user mode (1) or supervisor mode (0) at the time of the exception.
— The RSVD flag indicates that the processor detected 1s in reserved bits of the page directory, when the PSE or PAE flags in control register CR4 are set to 1.
Note:
• The PSE flag is only available in recent Intel 64 and IA-32 processors including the Pentium 4, Intel Xeon, P6 family, and Pentium processors.
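A page-fault handler typically decodes the low bits of the error code first. The C sketch below covers the four flags discussed here: P (bit 0), W/R (bit 1), U/S (bit 2), and RSVD (bit 3). Later processors define further bits, which it ignores. `struct pf_error` and `decode_pf_error` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

struct pf_error {
    bool protection;  /* P, bit 0: 0 = non-present page, 1 = page-level protection violation */
    bool write;       /* W/R, bit 1: 0 = read access, 1 = write access */
    bool user;        /* U/S, bit 2: 0 = supervisor mode, 1 = user mode */
    bool reserved;    /* RSVD, bit 3: 1s detected in reserved bits */
};

static struct pf_error decode_pf_error(uint32_t code)
{
    struct pf_error e = {
        .protection = code & 0x1,
        .write      = code & 0x2,
        .user       = code & 0x4,
        .reserved   = code & 0x8,
    };
    return e;
}
```

For example, error code 6 is a user-mode write to a not-present page, the common demand-paging case.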
second page fault can occur.1 If a page fault is caused by a page-level protection violation, the access flag in the page-directory entry is set when the fault occurs. The behavior of IA-32 processors regarding the access flag in the corresponding page-table entry is model-specific and not architecturally defined.
Saved Instruction Pointer: The saved contents of CS and EIP registers generally point to the instruction that generated the exception.
description for “Interrupt 10—Invalid TSS Exception (#TS)” in this chapter for additional information on how to handle this situation.)
Additional Exception-Handling Information: Special care should be taken to ensure that an exception that occurs during an explicit stack switch does not cause the processor to use an invalid stack pointer (SS:ESP).
Interrupt 16—x87 FPU Floating-Point Error (#MF)
Exception Class: Fault.
Description: Indicates that the x87 FPU has detected a floating-point error. The NE flag in control register CR0 must be set for an interrupt 16 (floating-point error exception) to be generated. (See Section 2.5, “Control Registers,” for a detailed description of the NE flag.)

NOTE
SIMD floating-point exceptions (#XM) are signaled through interrupt 19.
Prior to executing a waiting x87 FPU instruction or the WAIT/FWAIT instruction, the x87 FPU checks for pending x87 FPU floating-point exceptions (as described in step 2 above). Pending x87 FPU floating-point exceptions are ignored for “non-waiting” x87 FPU instructions, which include the FNINIT, FNCLEX, FNSTSW, FNSTSW AX, FNSTCW, FNSTENV, and FNSAVE instructions. Pending x87 FPU exceptions are also ignored when executing the state management instructions FXSAVE and FXRSTOR.
Interrupt 17—Alignment Check Exception (#AC)
Exception Class: Fault.
Description: Indicates that the processor detected an unaligned memory operand when alignment checking was enabled. Alignment checks are only carried out in data (or stack) accesses (not in code fetches or system segment accesses). An example of an alignment-check violation is a word stored at an odd byte address, or a doubleword stored at an address that is not an integer multiple of 4.
• AC flag in the EFLAGS register is set.
• The CPL is 3 (protected mode or virtual-8086 mode).
Alignment-check exceptions (#AC) are generated only when operating at privilege level 3 (user mode). Memory references that default to privilege level 0, such as segment descriptor loads, do not generate alignment-check exceptions, even when caused by a memory reference made from privilege level 3.
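The enabling conditions and the alignment test can be combined in a short C sketch. It assumes CR0.AM and EFLAGS.AC are both bit 18 of their registers and that the operand size is a power of two whose required alignment equals its size, as in the word and doubleword examples. Operand types with other alignment rules are not modeled. `ac_raises` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_AM     (1u << 18)
#define EFLAGS_AC  (1u << 18)

/* 'size' is the operand size in bytes (assumed a power of two: 2, 4, 8, ...). */
static bool ac_raises(uint32_t cr0, uint32_t eflags, unsigned cpl,
                      uint64_t addr, unsigned size)
{
    if (!(cr0 & CR0_AM) || !(eflags & EFLAGS_AC) || cpl != 3)
        return false;           /* alignment checking not enabled */
    return addr % size != 0;
}
```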
Interrupt 18—Machine-Check Exception (#MC)
Exception Class: Abort.
Description: Indicates that the processor detected an internal machine error or a bus error, or that an external agent detected a bus error. The machine-check exception is model-specific, available on the Pentium and later generations of processors.
For the Pentium 4, Intel Xeon, P6 family, and Pentium processors, a program-state change always accompanies a machine-check exception, and an abort class exception is generated. For abort exceptions, information about the exception can be collected from the machine-check MSRs, but the program cannot generally be restarted.
Interrupt 19—SIMD Floating-Point Exception (#XM)
Exception Class: Fault.
Description: Indicates the processor has detected an SSE/SSE2/SSE3 SIMD floating-point exception. The appropriate status flag in the MXCSR register must be set and the particular exception unmasked for this interrupt to be generated.
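The "flag set and exception unmasked" condition can be read directly from MXCSR. The six exception flags (IE, DE, ZE, OE, UE, PE) occupy bits 5:0, and the matching masks (IM, DM, ZM, OM, UM, PM) occupy bits 12:7. This C sketch returns the flags whose masks are clear. It does not model the separate requirement that CR4.OSXMMEXCPT be set for #XM rather than #UD to be delivered. `mxcsr_unmasked_flags` is a hypothetical name.

```c
#include <stdint.h>

/* Nonzero result: at least one SIMD exception flag is set with its mask clear. */
static uint32_t mxcsr_unmasked_flags(uint32_t mxcsr)
{
    uint32_t flags = mxcsr & 0x3F;          /* IE DE ZE OE UE PE, bits 5:0 */
    uint32_t masks = (mxcsr >> 7) & 0x3F;   /* IM DM ZM OM UM PM, bits 12:7 */
    return flags & ~masks;
}
```

With the power-on default MXCSR of 1F80H, every exception is masked, so even a set ZE flag (1F84H) yields 0. Clearing ZM (1D84H) yields 4, the ZE bit.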
Note that because SIMD floating-point exceptions are precise and occur immediately, the situation does not arise where an x87 FPU instruction, a WAIT/FWAIT instruction, or another SSE/SSE2/SSE3 instruction will catch a pending unmasked SIMD floating-point exception.
Saved Instruction Pointer: The saved contents of CS and EIP registers point to the SSE/SSE2/SSE3 instruction that was executed when the SIMD floating-point exception was generated. This is the faulting instruction in which the error condition was detected.
Program State Change: A program-state change does not accompany a SIMD floating-point exception because the handling of the exception is immediate unless the particular exception is masked.
Interrupts 32 to 255—User Defined Interrupts
Exception Class: Not applicable.
Description: Indicates that the processor did one of the following things:
• Executed an INT n instruction where the instruction operand is one of the vector numbers from 32 through 255.
• Responded to an interrupt request at the INTR pin or from the local APIC when the interrupt vector number associated with the request is from 32 through 255.
Exception Error Code: Not applicable.
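The vectors covered in this reference can be collected into a lookup table, which is convenient when writing common handler entry code that must know whether an error code was pushed. The C sketch below summarizes vectors 0 through 19 as described in these sections. Vector 15 and vectors 20 to 31 are reserved, and 32 to 255 are user defined. The table and function names are hypothetical. As noted earlier, an INT n or INTR-delivered interrupt to one of these vectors does not push an error code, even where the table says the exception does.

```c
#include <stdbool.h>

struct vector_info {
    const char *mnemonic;
    const char *cls;        /* exception class */
    bool error_code;        /* pushed when the processor raises the exception */
};

static const struct vector_info vectors[20] = {
    [0]  = {"#DE", "Fault", false},      [1]  = {"#DB", "Trap or Fault", false},
    [2]  = {"NMI", "N/A", false},        [3]  = {"#BP", "Trap", false},
    [4]  = {"#OF", "Trap", false},       [5]  = {"#BR", "Fault", false},
    [6]  = {"#UD", "Fault", false},      [7]  = {"#NM", "Fault", false},
    [8]  = {"#DF", "Abort", true},       [9]  = {"Coprocessor Segment Overrun", "Abort", false},
    [10] = {"#TS", "Fault", true},       [11] = {"#NP", "Fault", true},
    [12] = {"#SS", "Fault", true},       [13] = {"#GP", "Fault", true},
    [14] = {"#PF", "Fault", true},       [15] = {"(reserved)", "", false},
    [16] = {"#MF", "Fault", false},      [17] = {"#AC", "Fault", true},
    [18] = {"#MC", "Abort", false},      [19] = {"#XM", "Fault", false},
};

static bool vector_pushes_error_code(unsigned vector)
{
    return vector < 20 && vectors[vector].error_code;
}
```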
CHAPTER 7 TASK MANAGEMENT

This chapter describes the IA-32 architecture’s task management facilities. These facilities are only available when the processor is running in protected mode. This chapter focuses on 32-bit tasks and the 32-bit TSS structure. For information on 16-bit tasks and the 16-bit TSS structure, see Section 7.6, “16-Bit Task-State Segment (TSS).” For information specific to task management in 64-bit mode, see Section 7.7, “Task Management in 64-bit Mode.”
Figure 7-1. Structure of a Task. Components shown: code segment, data segment, stack segment for the current privilege level, stack segments for privilege levels 0, 1, and 2, the task-state segment (TSS), the task register, and CR3.

7.1.2 Task State
The following items define the state of the currently executing task:
• The task’s current execution space, defined by the segment selectors in the segment registers (CS, DS, SS, ES, FS, and GS).
7.1.3 Executing a Task
Software or the processor can dispatch a task for execution in one of the following ways:
• An explicit call to a task with the CALL instruction.
• An explicit jump to a task with the JMP instruction.
• An implicit call (by the processor) to an interrupt-handler task.
• An implicit call to an exception-handler task.
• A return (initiated with an IRET instruction) when the NT flag in the EFLAGS register is set.
page tables as other privilege-level-3 tasks can access code and corrupt data and the stack of other tasks.
Use of task management facilities for handling multitasking applications is optional. Multitasking can be handled in software, with each software-defined task executed in the context of a single IA-32 architecture task.

7.2 TASK MANAGEMENT DATA STRUCTURES
The processor defines five data structures for handling task-related activities:
• Task-state segment (TSS).
Figure 7-2. 32-Bit Task-State Segment (TSS). Fields by byte offset: 0 Previous Task Link; 4 ESP0; 8 SS0; 12 ESP1; 16 SS1; 20 ESP2; 24 SS2; 28 CR3 (PDBR); 32 EIP; 36 EFLAGS; 40 EAX; 44 ECX; 48 EDX; 52 EBX; 56 ESP; 60 EBP; 64 ESI; 68 EDI; 72 ES; 76 CS; 80 SS; 84 DS; 88 FS; 92 GS; 96 LDT Segment Selector; 100 T (bit 0) and I/O Map Base Address (bits 31:16). Each 16-bit selector field occupies the low half of its doubleword; the remaining bits are reserved and set to 0.
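The layout in Figure 7-2 maps directly onto a C struct whose offsets can be checked at compile time. Its 104-byte size (68H) is why a TSS descriptor limit below 67H is rejected. The helper shows the stack switch of Section 6.12.1, where a handler running at privilege level n takes its SS:ESP from SSn:ESPn. This is a sketch. `struct tss32` and `tss32_stack_for` are hypothetical names, and the code assumes a compiler that inserts no padding between these naturally aligned fields.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct tss32 {
    uint16_t prev_task_link, reserved0;
    uint32_t esp0;
    uint16_t ss0, reserved1;
    uint32_t esp1;
    uint16_t ss1, reserved2;
    uint32_t esp2;
    uint16_t ss2, reserved3;
    uint32_t cr3;
    uint32_t eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, reserved4, cs, reserved5, ss, reserved6;
    uint16_t ds, reserved7, fs, reserved8, gs, reserved9;
    uint16_t ldt_selector, reserved10;
    uint16_t trap;            /* bit 0: T (debug trap) flag */
    uint16_t io_map_base;
};

static_assert(offsetof(struct tss32, esp0) == 4, "ESP0 at 4");
static_assert(offsetof(struct tss32, cr3) == 28, "CR3 at 28");
static_assert(offsetof(struct tss32, ldt_selector) == 96, "LDT selector at 96");
static_assert(sizeof(struct tss32) == 0x68, "minimum TSS is 104 bytes (limit 67H)");

/* Stack used by a handler that will run at privilege level 'cpl' (0-2). */
static void tss32_stack_for(const struct tss32 *t, unsigned cpl,
                            uint16_t *ss_out, uint32_t *esp_out)
{
    assert(cpl <= 2);
    switch (cpl) {
    case 0:  *ss_out = t->ss0; *esp_out = t->esp0; break;
    case 1:  *ss_out = t->ss1; *esp_out = t->esp1; break;
    default: *ss_out = t->ss2; *esp_out = t->esp2; break;
    }
}
```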
• EIP (instruction pointer) field — State of the EIP register prior to the task switch.
• Previous task link field — Contains the segment selector for the TSS of the previous task (updated on a task switch that was initiated by a call, interrupt, or exception). This field (which is sometimes called the back link field) permits a task switch back to the previous task by using the IRET instruction.
The processor reads the static fields, but does not normally change them.
• Task switches are carried out faster if the pages containing these structures are present in memory before the task switch is initiated.

7.2.2 TSS Descriptor
The TSS, like all other segments, is defined by a segment descriptor. Figure 7-3 shows the format of a TSS descriptor. TSS descriptors may only be placed in the GDT; they cannot be placed in an LDT or the IDT.
of a TSS. Attempting to switch to a task whose TSS descriptor has a limit less than 67H generates an invalid-TSS exception (#TS). A larger limit is required if an I/O permission bit map is included or if the operating system stores additional data. The processor does not check for a limit greater than 67H on a task switch; however, it does check when accessing the I/O permission bit map or interrupt redirection bit map.
Figure 7-4. TSS (or LDT) descriptor in 64-bit mode (16 bytes). Bytes 0-3: Segment Limit 15:00 (bits 15:0) and Base Address 15:00 (bits 31:16). Bytes 4-7: Base 23:16 (bits 7:0), Type (bits 11:8), 0 (bit 12), DPL (bits 14:13), P (bit 15), Limit 19:16 (bits 19:16), AVL (bit 20), 0 (bits 22:21), G (bit 23), Base 31:24 (bits 31:24). Bytes 8-11: Base Address 63:32. Bytes 12-15: Reserved, with bits 12:8 set to 0. Legend: AVL = available for use by system software; B = busy flag; BASE = segment base address; DPL = descriptor privilege level; G = granularity; LIMIT = segment limit; P = segment present; TYPE = segment type.
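Figure 7-4 can be turned into an encoder for the two quadwords of a 64-bit TSS descriptor, as in the C sketch below. It assumes byte granularity (G = 0), AVL = 0, and type 9H, the available 64-bit TSS. The busy variant is type BH, so the busy flag is type bit 1, which is descriptor bit 41. `make_tss_desc64`, `tss_desc_busy`, and the macro names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct sys_desc64 {
    uint64_t lo;   /* bytes 0-7 */
    uint64_t hi;   /* bytes 8-15: Base Address 63:32 in bits 31:0, rest reserved (0) */
};

#define TSS_TYPE_AVAILABLE 0x9u  /* 1001b */
#define TSS_TYPE_BUSY      0xBu  /* 1011b: B is type bit 1 */

static struct sys_desc64 make_tss_desc64(uint64_t base, uint32_t limit, unsigned dpl)
{
    assert(limit <= 0xFFFFF && dpl <= 3);
    struct sys_desc64 d;
    d.lo = (uint64_t)(limit & 0xFFFF)                /* Segment Limit 15:00 */
         | ((base & 0xFFFFFF) << 16)                 /* Base 23:00 */
         | ((uint64_t)TSS_TYPE_AVAILABLE << 40)      /* Type; bit 44 = 0 */
         | ((uint64_t)dpl << 45)                     /* DPL */
         | (1ull << 47)                              /* P */
         | ((uint64_t)((limit >> 16) & 0xF) << 48)   /* Limit 19:16; AVL, G = 0 */
         | (((base >> 24) & 0xFF) << 56);            /* Base 31:24 */
    d.hi = base >> 32;                               /* Base 63:32 */
    return d;
}

static int tss_desc_busy(struct sys_desc64 d)
{
    return (d.lo >> 41) & 1;
}
```

For a TSS at FFFF800000001000H with limit 67H and DPL 0, this yields a low quadword of 0000890010000067H and a high quadword of FFFF8000H.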
The LTR instruction loads a segment selector (source operand) into the task register that points to a TSS descriptor in the GDT. It then loads the invisible portion of the task register with information from the TSS descriptor. LTR is a privileged instruction that may be executed only when the CPL is 0. It is used during system initialization to put an initial value in the task register. Afterwards, the contents of the task register are changed implicitly when a task switch occurs.
7.2.5 Task-Gate Descriptor
A task-gate descriptor provides an indirect, protected reference to a task (see Figure 7-6). It can be placed in the GDT, an LDT, or the IDT. The TSS segment selector field in a task-gate descriptor points to a TSS descriptor in the GDT. The RPL in this segment selector is not used. The DPL of a task-gate descriptor controls access to the TSS descriptor during a task switch.
to be handled by handler tasks. When an interrupt or exception vector points to a task gate, the processor switches to the specified task. Figure 7-7 illustrates how a task gate in an LDT, a task gate in the GDT, and a task gate in the IDT can all point to the same task.

Figure 7-7. Task Gates Referencing the Same Task. Task gates in an LDT, the GDT, and the IDT each select the same TSS descriptor in the GDT, which locates the task's TSS.
• An interrupt or exception vector points to a task-gate descriptor in the IDT.
• The current task executes an IRET when the NT flag in the EFLAGS register is set.
JMP, CALL, and IRET instructions, as well as interrupts and exceptions, are all mechanisms for redirecting a program. The referencing of a TSS descriptor or a task gate (when calling or jumping to a task) or the state of the NT flag (when executing an IRET instruction) determines whether a task switch occurs.
10. If the task switch was initiated with a CALL instruction, JMP instruction, an exception, or an interrupt, the processor sets the busy (B) flag in the new task’s TSS descriptor; if initiated with an IRET instruction, the busy (B) flag is left set.
11. Loads the task register with the segment selector and descriptor for the new task's TSS.
12. The TSS state is loaded into the processor.
rules control access to a TSS, software does not need to perform explicit privilege checks on a task switch. Table 7-1 shows the exception conditions that the processor checks for when switching tasks. It also shows the exception that is generated for each check if an error is detected and the segment that the error code references. (The order of the checks in the table is the order used in the P6 family processors.
Table 7-1. Exception Conditions Checked During a Task Switch (Contd.)
(Condition Checked: Exception (note 1); Error Code Reference (note 2))
DS, ES, FS, and GS segments are present in memory: #NP; New Data Segment.
DS, ES, FS, and GS segment DPL greater than or equal to CPL (unless these are conforming segments): #TS; New Data Segment.
NOTES:
1. #NP is segment-not-present exception, #GP is general-protection exception, #TS is invalid-TSS exception, and #SS is stack-fault exception.
2.
Figure 7-8. Nested Tasks. Three TSSs (top-level task, nested task, and more deeply nested task) are chained through their previous task link fields back toward the top-level task; the top-level task's NT flag is 0 and the nested tasks' NT flags are 1. The task register points to the TSS of the currently executing task, whose EFLAGS.NT is 1.

Table 7-2 shows the busy flag (in the TSS segment descriptor), the NT flag, the previous task link field, and the TS flag (in control register CR0) during a task switch. The NT flag may be modified by software executing at any privilege level.
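The bookkeeping that Table 7-2 describes can be modeled in C. The summary below follows the table as commonly given. JMP clears the old task's busy flag and sets the new one. CALL, interrupts, and exceptions leave the old task busy, set the new task's NT flag, and store the old TSS selector in the new task's previous task link. IRET requires the new task to be busy, then clears the old task's busy and NT flags. Every switch sets CR0.TS. This is a software model for illustration only, not processor code. `struct task_model`, `model_task_switch`, and the enum names are hypothetical, and the specific exception raised on a failed busy check is left to Table 7-1.

```c
#include <stdbool.h>
#include <stdint.h>

enum switch_kind { SWITCH_JMP, SWITCH_CALL, SWITCH_IRET }; /* CALL also covers interrupts/exceptions */

struct task_model {          /* hypothetical model of one task */
    uint16_t tss_selector;   /* selector of this task's TSS descriptor */
    bool busy;               /* B flag in the TSS descriptor */
    bool nt;                 /* NT flag in the task's EFLAGS image */
    uint16_t prev_link;      /* previous task link field */
};

#define CR0_TS (1u << 3)

/* Returns false when the busy-flag precondition fails (the processor would
   raise an exception instead of switching). */
static bool model_task_switch(enum switch_kind kind, struct task_model *old_task,
                              struct task_model *new_task, uint32_t *cr0)
{
    if (kind == SWITCH_IRET ? !new_task->busy : new_task->busy)
        return false;

    switch (kind) {
    case SWITCH_JMP:
        old_task->busy = false;
        new_task->busy = true;
        break;
    case SWITCH_CALL:
        new_task->busy = true;                      /* old task stays busy: it is nested */
        new_task->nt = true;
        new_task->prev_link = old_task->tss_selector;
        break;
    case SWITCH_IRET:
        old_task->busy = false;
        old_task->nt = false;
        break;
    }
    *cr0 |= CR0_TS;                                 /* every task switch sets CR0.TS */
    return true;
}
```

The busy check is what makes the recursive call in the next section fail. A second CALL to a task that is already busy is rejected before any state is saved.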
7.4.1 Use of Busy Flag To Prevent Recursive Task Switching
A TSS allows only one context to be saved for a task; therefore, once a task is called (dispatched), a recursive (or re-entrant) call to the task would cause the current state of the task to be lost. The busy flag in the TSS segment descriptor is provided to prevent re-entrant task switching and a subsequent loss of task state information. The processor manages the busy flag as follows: 1.
In a multiprocessing system, additional synchronization and serialization operations must be added to this procedure to ensure that the TSS and its segment descriptor are both locked when the previous task link field is changed and the busy flag is cleared.

7.5 TASK ADDRESS SPACE
The address space for a task consists of the segments that the task can access.
and the page tables point to different pages of physical memory, then the tasks do not share physical addresses. With either method of mapping task linear address spaces, the TSSs for all tasks must lie in a shared area of the physical space, which is accessible to all tasks. This mapping is required so that the mapping of TSS addresses does not change while the processor is reading and updating the TSSs during a task switch.
shared LDT point to segments that are mapped to a common area of the physical address space, the data and code in those segments can be shared among the tasks that share the LDT. This method of sharing is more selective than sharing through the GDT, because the sharing can be limited to specific tasks. Other tasks in the system may have different LDTs that do not give them access to the shared segments.
Figure 7-10. 16-Bit TSS Format (byte offsets; one 16-bit word per row)
42 Task LDT Selector
40 DS Selector
38 SS Selector
36 CS Selector
34 ES Selector
32 DI
30 SI
28 BP
26 SP
24 BX
22 DX
20 CX
18 AX
16 FLAG Word
14 IP (Entry Point)
12 SS2
10 SP2
8 SS1
6 SP1
4 SS0
2 SP0
0 Previous Task Link
7.7 TASK MANAGEMENT IN 64-BIT MODE
In 64-bit mode, task structure and task state are similar to those in protected mode.
Although hardware task-switching is not supported in 64-bit mode, a 64-bit task state segment (TSS) must exist. Figure 7-11 shows the format of a 64-bit TSS. The TSS holds information that is important to 64-bit mode but not directly related to the task-switch mechanism. This information includes:
• RSPn: The full 64-bit canonical forms of the stack pointers (RSP) for privilege levels 0-2.
• ISTn: The full 64-bit canonical forms of the interrupt stack table (IST) pointers.
Figure 7-11 (64-bit TSS format), byte offsets, one doubleword (bits 31:0) per row:
100 I/O Map Base Address (bits 31:16) | Reserved (bits 15:0)
96 Reserved
92 Reserved
88 IST7 (upper 32 bits)
84 IST7 (lower 32 bits)
80 IST6 (upper 32 bits)
76 IST6 (lower 32 bits)
72 IST5 (upper 32 bits)
68 IST5 (lower 32 bits)
64 IST4 (upper 32 bits)
60 IST4 (lower 32 bits)
56 IST3 (upper 32 bits)
52 IST3 (lower 32 bits)
48 IST2 (upper 32 bits)
44 IST2 (lower 32 bits)
40 IST1 (upper 32 bits)
36 IST1 (lower 32 bits)
32 Reserved
28 Reserved
24 RSP2 (upper 32 bits)
20 RSP2 (lower 32 bits)
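For illustration, the 64-bit TSS layout can be written as a packed C structure and checked against the figure's byte offsets at compile time. This is a sketch, not a definition from this manual: the type and field names (`tss64_t`, `rsp`, `ist`, `iomap_base`) are hypothetical, it assumes a C11 compiler that honors `#pragma pack` (GCC, Clang, and MSVC do), and it assumes the standard placement of RSP0 and RSP1 at offsets 4 and 12 for the low part of the figure not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

/* 64-bit TSS (hypothetical names). pack(1) is needed because RSP0 starts at
   offset 4, which is not 8-byte aligned. Offsets are in the comments. */
#pragma pack(push, 1)
typedef struct {
    uint32_t reserved0;   /* 0   */
    uint64_t rsp[3];      /* 4   RSP0, RSP1, RSP2: stacks for privilege levels 0-2 */
    uint64_t reserved1;   /* 28  */
    uint64_t ist[7];      /* 36  IST1..IST7 (ist[0] is IST1) */
    uint64_t reserved2;   /* 92  */
    uint16_t reserved3;   /* 100 bits 15:0 of the doubleword at 100 */
    uint16_t iomap_base;  /* 102 I/O map base address (bits 31:16 at 100) */
} tss64_t;
#pragma pack(pop)

/* Compile-time checks against the offsets in Figure 7-11. */
_Static_assert(offsetof(tss64_t, rsp) == 4, "RSP0 at offset 4");
_Static_assert(offsetof(tss64_t, ist) == 36, "IST1 at offset 36");
_Static_assert(offsetof(tss64_t, reserved2) == 92, "reserved at offset 92");
_Static_assert(offsetof(tss64_t, iomap_base) == 102, "I/O map base at offset 102");
_Static_assert(sizeof(tss64_t) == 104, "64-bit TSS is 104 bytes");
```

Packing matters here: on x86-64 a compiler would otherwise typically align `rsp[0]` to offset 8, shifting every later field by 4 bytes.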
CHAPTER 8 MULTIPLE-PROCESSOR MANAGEMENT The Intel 64 and IA-32 architectures provide mechanisms for managing and improving the performance of multiple processors connected to the same system bus. These include: • Bus locking and/or cache coherency management for performing atomic operations on system memory. • Serializing instructions. These instructions apply only to the Pentium 4, Intel Xeon, P6 family, and Pentium processors.
• To distribute interrupt handling among a group of processors: When several processors are operating in a system in parallel, it is useful to have a centralized mechanism for receiving interrupts and distributing them to available processors for servicing.
• To increase system performance by exploiting the multi-threaded and multiprocess nature of contemporary operating systems and applications.
software to manage the fairness of semaphores and exclusive locking functions.
The mechanisms for handling locked atomic operations have evolved with the complexity of IA-32 processors. More recent IA-32 processors (such as the Pentium 4, Intel Xeon, and P6 family processors) and Intel 64 provide a more refined locking mechanism than earlier processors. These mechanisms are described in the following sections.
the hardware designer to make the LOCK# signal available in system hardware to control memory accesses among processors.
For the P6 and more recent processor families, if the memory area being accessed is cached internally in the processor, the LOCK# signal is generally not asserted; instead, locking is only applied to the processor’s caches (see Section 8.1.4, “Effects of a LOCK Operation on Internal Processor Caches”).
8.1.2.2 Software Controlled Bus Locking
To explicitly force the LOCK semantics, software can use the LOCK prefix with the following instructions when they are used to modify a memory location. An invalid-opcode exception (#UD) is generated when the LOCK prefix is used with any other instruction or when no write operation is made to memory (that is, when the destination operand is in a register).
• The bit test and modify instructions (BTS, BTR, and BTC).
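In C11, the same locked read-modify-write effect is usually obtained through `<stdatomic.h>` rather than by writing the LOCK prefix by hand. The sketch below (function name hypothetical) mirrors a LOCK-prefixed BTS on a memory operand: it atomically sets one bit and reports the bit's previous value. Which LOCK-prefixed instruction the compiler emits (for example LOCK BTS or a LOCK CMPXCHG retry loop) depends on the compiler and optimization level.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically set bit `bit` (0..31) of *word and return its previous value,
   the same effect as LOCK BTS on a memory operand. */
bool test_and_set_bit(atomic_uint *word, unsigned bit) {
    unsigned mask = 1u << bit;
    unsigned old = atomic_fetch_or(word, mask);  /* one atomic read-modify-write */
    return (old & mask) != 0;
}
```

Because the whole update is a single atomic operation, two processors racing on the same bit cannot both observe it as previously clear.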
ence weakly ordered memory types (such as the WC memory type) may not be serialized.
Locked instructions should not be used to ensure that data written can be fetched as instructions.
NOTE
The locked instructions for the current versions of the Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors allow data written to be fetched as instructions.
The act of one processor writing data into the currently executing code segment of a second processor with the intent of having the second processor execute that data as code is called cross-modifying code. As with self-modifying code, IA-32 processors exhibit model-specific behavior when executing cross-modifying code, depending upon how far ahead of the executing processor’s current execution pointer the code has been modified.
have cached the same area of memory from simultaneously modifying data in that area.
8.2 MEMORY ORDERING
The term memory ordering refers to the order in which the processor issues reads (loads) and writes (stores) through the system bus to system memory. The Intel 64 and IA-32 architectures support several memory-ordering models depending on the implementation of the architecture.
among processors are explicitly required to obey program ordering through the use of appropriate locking or serializing operations (see Section 8.2.5, “Strengthening or Weakening the Memory-Ordering Model”).
• Locked instructions have a total order.
See the example in Figure 8-1. Consider three processors in a system and each processor performs three writes, one to each of three defined locations (A, B, and C).
8.2.3 Examples Illustrating the Memory-Ordering Principles
This section provides a set of examples that illustrate the behavior of the memory-ordering principles introduced in Section 8.2.2. They are designed to give software writers an understanding of how memory ordering may affect the results of different sequences of instructions.
These examples are limited to accesses to memory regions defined as write-back cacheable (WB). (Section 8.2.3.
Section 8.2.3.2 through Section 8.2.3.7 give examples using the MOV instruction. The principles that underlie these examples apply to load and store accesses in general and to other instructions that load from or store to memory. Section 8.2.3.8 and Section 8.2.3.9 give examples using the XCHG instruction. The principles that underlie these examples apply to other locked read-modify-write instructions. This section uses the term “processor” to refer to a logical processor.
8.2.3.3 Stores Are Not Reordered With Earlier Loads
The Intel 64 memory-ordering model ensures that a store by a processor cannot occur before a previous load by the same processor. This is illustrated by the following example:
Example 8-2. Stores Are Not Reordered with Older Loads
Processor 0 | Processor 1
mov r1, [ _x] | mov r2, [ _y]
mov [ _y], 1 | mov [ _x], 1
Initially x == y == 0
r1 == 1 and r2 == 1 is not allowed
Assume r1 == 1.
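One way to see why the r1 == 1 and r2 == 1 outcome of Example 8-2 is excluded is to enumerate every interleaving of its four instructions that keeps each processor's own program order, which is what this rule guarantees for a load followed by a store. The C sketch below is a toy model written for this one example (all names are hypothetical); it is not a general x86 memory-model simulator and would not capture store-buffer effects in tests where a store is followed by a load.

```c
#include <stdbool.h>

/* One instruction of Example 8-2: load [loc] into reg, or store 1 to [loc]. */
typedef struct { bool is_store; int loc; int reg; } litmus_op;

/* Locations: 0 = _x, 1 = _y.  Registers: 0 = r1, 1 = r2. */
static const litmus_op proc0[2] = { {false, 0, 0}, {true, 1, -1} }; /* mov r1,[_x]; mov [_y],1 */
static const litmus_op proc1[2] = { {false, 1, 1}, {true, 0, -1} }; /* mov r2,[_y]; mov [_x],1 */

/* Sets seen[r1][r2] to true for every (r1, r2) result reachable by some
   interleaving that preserves each processor's program order. */
void enumerate_example_8_2(bool seen[2][2]) {
    for (int a = 0; a < 2; a++)
        for (int b = 0; b < 2; b++)
            seen[a][b] = false;

    /* Bit k of `slots` set means slot k is executed by processor 0. */
    for (unsigned slots = 0; slots < 16; slots++) {
        unsigned ones = (slots & 1) + ((slots >> 1) & 1) + ((slots >> 2) & 1) + ((slots >> 3) & 1);
        if (ones != 2)
            continue;                 /* each processor runs exactly 2 instructions */

        int mem[2] = {0, 0};          /* initially x == y == 0 */
        int reg[2] = {0, 0};
        int next0 = 0, next1 = 0;
        for (int k = 0; k < 4; k++) {
            const litmus_op *op = ((slots >> k) & 1) ? &proc0[next0++] : &proc1[next1++];
            if (op->is_store)
                mem[op->loc] = 1;
            else
                reg[op->reg] = mem[op->loc];
        }
        seen[reg[0]][reg[1]] = true;
    }
}
```

The model reports (0, 0), (0, 1), and (1, 0) but never (1, 1): r1 == 1 requires processor 1's store to [_x] to run before processor 0's load, and by then processor 1's earlier load of [_y] has already returned 0.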
has the two loads occurring before the two stores. This would result in each load returning value 0.
The fact that a load may not be reordered with an earlier store to the same location is illustrated by the following example:
Example 8-4.
8.2.3.6 Stores Are Transitively Visible
The memory-ordering model ensures transitive visibility of stores; stores that are causally related appear to all processors to occur in an order consistent with the causality relation. This is illustrated by the following example:
Example 8-6.
By the principles discussed in Section 8.2.3.2,
• processor 2’s first and second loads cannot be reordered,
• processor 3’s first and second loads cannot be reordered.
• If r1 == 1 and r2 == 0, processor 0’s store appears to precede processor 1’s store with respect to processor 2.
• Similarly, r3 == 1 and r4 == 0 imply that processor 1’s store appears to precede processor 0’s store with respect to processor 3.
reader should note that reordering is prevented also if the locked instruction is executed after a load or a store.
The first example illustrates that loads may not be reordered with earlier locked instructions:
Example 8-9. Loads Are not Reordered with Locks
Processor 0 | Processor 1
xchg [ _x], r1 | xchg [ _y], r3
mov r2, [ _y] | mov r4, [ _x]
Initially x == y == 0, r1 == r3 == 1
r2 == 0 and r4 == 0 is not allowed
As explained in Section 8.2.3.
8.2.4 Out-of-Order Stores For String Operations
The Intel Core 2 Duo, Intel Core, Pentium 4, and P6 family processors modify the processor’s operation during string store operations (initiated with the MOVS and STOS instructions) to maximize performance. Once the initial conditions for “fast string” operations are met (as described below), the processor, viewed from an external perspective, essentially operates on the string one cache line at a time.
2. Stores from separate string operations (for example, stores from consecutive string operations) do not execute out of order. All the stores from an earlier string operation will complete before any store from a later string operation.
3. String operations are not reordered with other store operations.
Fast string operations (e.g. string operations initiated with the MOVS/STOS instructions and the REP prefix) may be interrupted by exceptions or interrupts.
Example 8-11. Stores Within a String Operation May be Reordered
Processor 0 | Processor 1
Initially on processor 0: EAX == 1, ECX==128, ES:EDI ==_x
Initially [_x] to 511[_x]== 0, _x <= _y < _z < _x+512
r1 == 1 and r2 == 0 is allowed
It is possible for processor 1 to perceive that the repeated string stores in processor 0 are happening out of order. We assume that fast string operations are enabled on processor 0.
Processor 1 performs two read operations: the first read is from an address outside the 512-byte block but to be updated by processor 0, and the second read is from inside the block of memory of the string operation.
Example 8-13.
Example 8-15. String Operations Are not Reordered with Earlier Stores
Processor 0 | Processor 1
mov [_z], $1 | mov r1, [ _y]
rep:stosd [ _x] | mov r2, [ _z]
Initially on processor 0: EAX == 1, ECX==128, ES:EDI ==_x
Initially [_y] == [_z] == 0, [_x] to 511[_x]== 0, _x <= _y < _x+512, _z is a separate memory location
r1 == 1 and r2 == 0 is not allowed
as the XCHG instruction or the LOCK prefix to ensure that a read-modify-write operation on memory is carried out atomically. Locking operations typically operate like I/O operations in that they wait for all previous instructions to complete and for all buffered writes to drain to memory (see Section 8.1.2, “Bus Locking”).
Program synchronization can also be carried out with serializing instructions (see Section 8.3).
The PAT was introduced in the Pentium III processor to enhance the caching characteristics that can be assigned to pages or groups of pages. The PAT mechanism is typically used to strengthen caching characteristics at the page level with respect to the caching characteristics established by the MTRRs. Table 11-7 shows the interaction of the PAT with the MTRRs.
• Non-privileged serializing instructions: CPUID, IRET, and RSM.
When the processor serializes instruction execution, it ensures that all pending memory transactions are completed (including writes stored in its store buffer) before it executes the next instruction. Nothing can pass a serializing instruction and a serializing instruction cannot pass any other instruction (read, write, instruction fetch, or I/O).
execution is not deterministically serialized when a branch instruction is executed.
8.4 MULTIPLE-PROCESSOR (MP) INITIALIZATION
The IA-32 architecture (beginning with the P6 family processors) defines a multiple-processor (MP) initialization protocol called the Multiprocessor Specification Version 1.4. This specification defines the boot protocol to be used by IA-32 processors in multiple-processor systems. (Here, multiple processors is defined as two or more processors.
8.4.1 BSP and AP Processors
The MP initialization protocol defines two classes of processors: the bootstrap processor (BSP) and the application processors (APs). Following a power-up or RESET of an MP system, system hardware dynamically selects one of the processors on the system bus as the BSP. The remaining processors are designated as APs.
8.4.3 MP Initialization Protocol Algorithm for Intel Xeon Processors
Following a power-up or RESET of an MP system, the processors in the system execute the MP initialization protocol algorithm to initialize each of the logical processors on the system bus or coherent link domain. In the course of executing this algorithm, the following boot-up and initialization operations are carried out:
1. Each logical processor is assigned a unique APIC ID, based on system topology.
• The newly established BSP broadcasts an FIPI message to “all including self,” which the BSP and APs treat as an end of MP initialization signal. Only the processor with its BSP flag set responds to the FIPI message. It responds by fetching and executing the BIOS boot-strap code, beginning at the reset vector (physical address FFFF FFF0H).
5.
SVR EQU 0FEE000F0H
APIC_ID EQU 0FEE00020H
LVT3 EQU 0FEE00370H
APIC_ENABLED EQU 0100H
BOOT_ID DD ?
COUNT EQU 00H
VACANT EQU 00H
8.4.4.1 Typical BSP Initialization Sequence
After the BSP and APs have been selected (by means of a hardware protocol, see Section 8.4.3, “MP Initialization Protocol Algorithm for Intel Xeon Processors”), the BSP begins executing BIOS boot-strap code (POST) at the normal IA-32 architecture starting address (FFFF FFF0H).
mode address space (1-MByte space). For example, a vector of 0BDH specifies a start-up memory address of 000BD000H.
11. Enables the local APIC by setting bit 8 of the APIC spurious vector register (SVR).
MOV ESI, SVR ; Address of SVR
MOV EAX, [ESI]
OR EAX, APIC_ENABLED ; Set bit 8 to enable (0 on reset)
MOV [ESI], EAX
12. Sets up the LVT error handling entry by establishing an 8-bit vector for the APIC error handler.
MOV EAX, 000C46XXH ; Load ICR encoding for broadcast SIPI IPI
                   ; to all APs into EAX, where XX is the vector computed in step 8.
16. Waits for the timer interrupt.
17. Reads and evaluates the COUNT variable and establishes a processor count.
18. If necessary, reconfigures the APIC and continues with the remaining system diagnostics as appropriate.
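The 000C46XXH constant can be unpacked using the local APIC interrupt command register (ICR) layout described in the APIC chapter of this volume: bits 7:0 hold the vector, bits 10:8 = 110B select the start-up delivery mode, bit 14 is the level (assert) bit, and bits 19:18 = 11B select the "all excluding self" destination shorthand. The helper and macro names below are hypothetical, and the write to the memory-mapped ICR itself is not shown.

```c
#include <stdint.h>

/* ICR low-doubleword fields used by the broadcast start-up IPI. */
#define ICR_DELIVERY_STARTUP  (0x6u << 8)   /* bits 10:8 = 110B: start-up */
#define ICR_LEVEL_ASSERT      (1u << 14)    /* bit 14: assert */
#define ICR_DEST_ALL_BUT_SELF (0x3u << 18)  /* bits 19:18 = 11B: all excluding self */

/* Builds the value loaded by MOV EAX, 000C46XXH, with XX = vector. */
uint32_t sipi_icr_low(uint8_t vector) {
    return ICR_DEST_ALL_BUT_SELF | ICR_LEVEL_ASSERT | ICR_DELIVERY_STARTUP | vector;
}

/* An AP starts executing at physical address vector * 4 KBytes,
   e.g. vector 0BDH -> 000BD000H, inside the first MByte. */
uint32_t sipi_start_address(uint8_t vector) {
    return (uint32_t)vector << 12;
}
```

Spelling the fields out this way makes it easier to see why the constant ends in 46XXH: the 6 is the start-up delivery mode and the 4 is the level bit.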
8.4.5 Identifying Logical Processors in an MP System
After the BIOS has completed the MP initialization protocol, each logical processor can be uniquely identified by its local APIC ID.
during power-up and initialization is 8 bits. Bits 2:1 form a 2-bit physical package identifier (which can also be thought of as a socket identifier). In systems that configure physical processors in clusters, bits 4:3 form a 2-bit cluster ID. Bit 0 is used in the Intel Xeon processor MP to identify the two logical processors within the package (see Section 8.9.3, “Hierarchical ID of Logical Processors in an MP System”).
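As a sketch of the 8-bit layout just described, the three fields can be pulled out with shifts and masks. The fixed bit positions apply only to this legacy scheme; the type and function names are hypothetical, and software on newer processors should derive field widths from CPUID rather than hard-coding them.

```c
#include <stdint.h>

/* Fields of an 8-bit initial APIC ID in the legacy layout:
   bit 0 = logical processor in the package, bits 2:1 = package ID,
   bits 4:3 = cluster ID. */
typedef struct { unsigned logical, package, cluster; } apic_id_fields;

apic_id_fields decode_legacy_apic_id(uint8_t apic_id) {
    apic_id_fields f;
    f.logical = apic_id & 0x1u;         /* bit 0 */
    f.package = (apic_id >> 1) & 0x3u;  /* bits 2:1 */
    f.cluster = (apic_id >> 3) & 0x3u;  /* bits 4:3 */
    return f;
}
```

For example, APIC ID 1BH (11011B) decodes to cluster 3, package 1, logical processor 1.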
8.5 INTEL® HYPER-THREADING TECHNOLOGY AND INTEL® MULTI-CORE TECHNOLOGY
Intel Hyper-Threading Technology and Intel multi-core technology are extensions to Intel 64 and IA-32 architectures that enable a single physical processor to execute two or more separate code streams (called threads) concurrently. In Intel Hyper-Threading Technology, a single processor core provides two logical processors that share execution resources (see Section 8.
number of addressable IDs attributable to processor cores (Y) in the physical package.
• Extended Processor Topology Enumeration parameters for 32-bit APIC ID: Intel 64 processors supporting CPUID leaf 0BH will assign unique APIC IDs to each logical processor in the system. CPUID leaf 0BH reports the 32-bit APIC ID and provides topology enumeration parameters. See CPUID instruction reference pages in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A.
During initialization, each logical processor is assigned an APIC ID that is stored in the local APIC ID register for each logical processor. If two or more processors supporting Intel Hyper-Threading Technology are present, each logical processor on the system bus is assigned a unique ID (see Section 8.9.3, “Hierarchical ID of Logical Processors in an MP System”). Once logical processors have APIC IDs, software communicates with them by sending APIC IPI messages.
Figure 8-3. (Diagram: two Intel processors with Intel Hyper-Threading Technology, each with logical processors 0 and 1 that have their own local APICs and share a processor core and bus interface; IPIs and interrupt messages pass between them, and an I/O APIC with a PCI bridge in the system chip set handles external interrupts.)
Figure 8-4. IA-32 Processor with Two Logical Processors Supporting Intel HT Technology (Diagram: logical processors 0 and 1, each with its own architectural state and local APIC, sharing one execution engine and a bus interface to the system bus.)
8.7.1 State of the Logical Processors
The following features are part of the architectural state of logical processors within Intel 64 or IA-32 processors supporting Intel Hyper-Threading Technology.
• Debug registers (DR0, DR1, DR2, DR3, DR6, DR7) and the debug control MSRs
• Thermal clock modulation and ACPI Power management control MSRs
• Local APIC registers.
• Machine check global status (IA32_MCG_STATUS) and machine check capability (IA32_MCG_CAP) MSRs
• Time stamp counter MSRs
• Most of the other MSR registers, including the page attribute table (PAT). See the exceptions below.
gives software a consistent view of memory, independent of the processor on which it is running. See Section 11.11, “Memory Type Range Registers (MTRRs),” for information on setting up MTRRs.
8.7.4 Page Attribute Table (PAT)
Each logical processor has its own PAT MSR (IA32_PAT). However, as described in Section 11.12, “Page Attribute Table (PAT),” the PAT MSR settings must be the same for all processors in a system, including the logical processors.
8.7.7 Performance Monitoring Counters
Performance counters and their companion control MSRs are shared between the logical processors within a processor core for processors based on Intel NetBurst microarchitecture. As a result, software must manage the use of these resources. The performance counter interrupts, events, and precise event monitoring support can be set up and allocated on a per thread (per logical processor) basis. See Section 30.
8.7.11 Microcode Update Resources
In an Intel processor supporting Intel Hyper-Threading Technology, the microcode update facilities are shared between the logical processors; either logical processor can initiate an update. Each logical processor has its own BIOS signature MSR (IA32_BIOS_SIGN_ID at MSR address 8BH).
As a consequence, the use of the WBINVD instruction can have an impact on interrupt/event response time.
• INVD instruction: The entire cache hierarchy is invalidated without writing back modified data to memory. All logical processors are stopped from executing until after the invalidate operation is completed. A special bus cycle is sent to all caching agents.
disabled on a logical processor basis. Typically, if software controlled clock modulation is going to be used, the feature must be enabled for all the logical processors within a physical processor and the modulation duty cycle must be set to the same value for each logical processor. If the duty cycle values differ between the logical processors, the processor clock will be modulated at the highest duty cycle selected.
8.8 MULTI-CORE ARCHITECTURE
This section describes the architecture of Intel 64 and IA-32 processors supporting dual-core and quad-core technology. The discussion is applicable to the Intel Pentium processor Extreme Edition, Pentium D, Intel Core Duo, Intel Core 2 Duo, Dual-core Intel Xeon processor, Intel Core 2 Quad processors, and quad-core Intel Xeon processors. Features vary across different microarchitectures and are detectable using CPUID.
8.8.3 Performance Monitoring Counters
Performance counters and their companion control MSRs are shared between two logical processors sharing a processor core if the processor core supports Intel Hyper-Threading Technology and is based on Intel NetBurst microarchitecture. They are not shared between logical processors in different cores or different physical packages.
provided for each logical processor (see Section 8.7, “Intel® Hyper-Threading Technology Architecture,” and Section 8.8, “Multi-Core Architecture”). From a software programming perspective, control transfer of processor operation is managed at the granularity of the logical processor (operating systems dispatch a runnable task by allocating an available logical processor on the platform).
Figure 8-5. Generalized Four level Interpretation of the APIC ID (APIC ID bits X:0, where X = 31 if x2APIC is supported and X = 7 otherwise; fields from most to least significant: Reserved, Cluster ID, Package ID, Core ID, SMT ID)
If the processor supports CPUID leaf 0BH, the 32-bit APIC ID can represent cluster plus several levels of topology within the physical processor package. The exact number of hierarchical levels within a physical processor package must be enumerated through CPUID leaf 0BH.
8.9.2 Hierarchical Mapping of CPUID Extended Topology Leaf
CPUID leaf 0BH provides enumeration parameters for software to identify each hierarchy of the processor topology in a deterministic manner. Each hierarchical level of the topology starting from the SMT level is represented numerically by a sub-leaf index within the CPUID 0BH leaf. Each level of the topology is mapped to a sub-field in the APIC ID, following the general relationship depicted in Figure 8-6.
for (m = 0; m < N; m++) {
    cumulative_width[m] = CPUID.(EAX=0BH, ECX=m): EAX[4:0];
}
BitWidth[0] = cumulative_width[0];
for (m = 1; m < N; m++)
    BitWidth[m] = cumulative_width[m] - cumulative_width[m-1];
Currently, only the following encodings of hierarchical level type are defined: 0 (invalid), 1 (SMT), and 2 (core). Software must not assume any “level type” encoding value to be related to any sub-leaf index, except sub-leaf 0.
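The bit-width computation above can be written as ordinary C once the per-sub-leaf register values are available. In this sketch, the hypothetical `leaf0b_regs` array stands in for the EAX and ECX values returned by CPUID.(EAX=0BH, ECX=m) for m = 0, 1, ...; how they are obtained (for example through a compiler intrinsic) is deliberately left out, and all names are illustrative. Enumeration stops at the first sub-leaf whose level type, ECX[15:8], is 0 (invalid).

```c
#include <stdint.h>

/* Hypothetical holder for CPUID.(EAX=0BH, ECX=m) results. */
typedef struct { uint32_t eax, ecx; } leaf0b_regs;

/* Fills bit_width[] as in the pseudocode above and returns N, the number
   of valid levels (sub-leaves before the first level type of 0). */
int level_bit_widths(const leaf0b_regs *subleaf, int max_levels, unsigned *bit_width) {
    int n = 0;
    unsigned prev_cumulative = 0;
    while (n < max_levels && ((subleaf[n].ecx >> 8) & 0xFFu) != 0) {
        unsigned cumulative = subleaf[n].eax & 0x1Fu;  /* EAX[4:0] */
        bit_width[n] = cumulative - prev_cumulative;   /* BitWidth[0] = cumulative_width[0] */
        prev_cumulative = cumulative;
        n++;
    }
    return n;
}
```

For instance, an SMT sub-leaf with EAX[4:0] = 1 followed by a core sub-leaf with EAX[4:0] = 4 gives bit widths of 1 and 3.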
Figure 8-7. Topological Relationships between Hierarchical IDs in a Hypothetical MP Platform (Diagram: Package 0 and Package 1, each containing Core 0 and Core 1, each core with logical processors T0 and T1; labeled by Package ID, Core ID, and SMT_ID.)
Table 8-1.
Table 8-2. Initial APIC IDs for the Logical Processors in a System that has Two Physical Processors Supporting Dual-Core and Intel Hyper-Threading Technology
Initial APIC ID | Package ID | Core ID | SMT ID
0H | 0H | 0H | 0H
1H | 0H | 0H | 1H
2H | 0H | 1H | 0H
3H | 0H | 1H | 1H
4H | 1H | 0H | 0H
5H | 1H | 0H | 1H
6H | 1H | 1H | 0H
7H | 1H | 1H | 1H
Table 8-3. Example of Possible x2APIC ID Assignment in a System that has Two Physical Processors Supporting x2APIC and Intel Hyper-Threading Technology
x2APIC ID | Package ID | Core ID | SMT ID
15H | 1H | 2H | 1H
16H | 1H | 3H | 0H
17H | 1H | 3H | 1H
a. Query the right-shift value for the SMT level of the topology using CPUID leaf 0BH with ECX = 0H as input. The number of bits to shift right on the x2APIC ID (EAX[4:0]) can distinguish different higher-level entities above SMT (e.g. processor cores) in the same physical package. This is also the width of the bit mask to extract the SMT_ID.
b. Query CPUID leaf 0BH for the amount of bit shift to distinguish next higher-level entities (e.g.
Example 8-18. Support Routines for Detecting Hardware Multi-Threading and Identifying the Relationships Between Package, Core and Logical Processors
1.
// Detect support for hardware multi-threading in a processor.
// Returns a non-zero value if CPUID reports the presence of hardware multi-threading
// support in the physical package where the current logical processor is located.
int DeriveCore_Mask_Offsets (void)
{
    if (!HWMTSupported()) return -1;
    execute cpuid with eax = 11, ECX = 0;
    while( ECX[15:8] ) { // level type encoding is valid
        if (returned level type encoding in ECX[15:8] matches CORE) {
            Mask_Core_shift = EAX[4:0]; // needed to distinguish different physical packages
            COREPlusSMT_MASK = ~( (-1) << Mask_Core_shift);
            CORE_MASK = COREPlusSMT_MASK ^ SMT_MASK;
            PACKAGE_MASK = (-1) << Mask_Core_shift;
            return 0;
        }
        ECX ++;
        execute cpuid with eax = 11;
unsigned char MaxLPIDsPerPackage(void)
{
    if (!HWMTSupported()) return 1;
    execute cpuid with eax = 1
    store returned value of ebx
    return (unsigned char) ((reg_ebx & NUM_LOGICAL_BITS) >> 16);
}
b. Find the size of address space for processor cores in a physical processor package.
// Returns the max number of addressable IDs for processor cores in a physical processor package;
// Software should not assume cpuid reports this value to be a power of 2.
// Returns the mask bit width of a bit field from the maximum count that bit field can represent.
// This algorithm does not assume ‘address size’ to have a value equal to a power of 2.
Software must not assume local APIC_ID values in an MP system are consecutive. Non-consecutive local APIC_IDs may be the result of hardware configurations or debug features implemented in the BIOS or OS.
An identifier for each hierarchical level can be extracted from an 8-bit APIC_ID using the support routines illustrated in Example 8-20. The bit mask and shift value for each level must be determined dynamically at runtime.
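A C sketch of the two pieces involved: computing a mask width from a maximum ID count without assuming it is a power of 2, and using the SMT-level and core-level shift values to build SMT_MASK, CORE_MASK, and the package field in the same way as the DeriveCore_Mask_Offsets pseudocode. Unlike the routines in this section, which compare masked values without shifting, this version shifts each field down to a small integer for readability. Function and type names are hypothetical.

```c
#include <stdint.h>

/* Smallest w such that 2^w >= max_count; max_count need not be a power of 2. */
unsigned find_mask_width(uint32_t max_count) {
    unsigned w = 0;
    while (w < 32 && (1ull << w) < max_count)
        w++;
    return w;
}

typedef struct { uint32_t smt, core, package; } topo_ids;

/* Splits an APIC ID using the SMT-level and core-level shift values
   (EAX[4:0] of the CPUID leaf 0BH sub-leaves, or widths from find_mask_width).
   Assumes smt_shift <= core_shift < 32. */
topo_ids extract_topology(uint32_t apic_id, unsigned smt_shift, unsigned core_shift) {
    uint32_t smt_mask = ~(UINT32_MAX << smt_shift);             /* SMT_MASK */
    uint32_t core_plus_smt_mask = ~(UINT32_MAX << core_shift);  /* COREPlusSMT_MASK */
    uint32_t core_mask = core_plus_smt_mask ^ smt_mask;         /* CORE_MASK */
    topo_ids t;
    t.smt = apic_id & smt_mask;
    t.core = (apic_id & core_mask) >> smt_shift;
    t.package = apic_id >> core_shift;                          /* bits under PACKAGE_MASK */
    return t;
}
```

With an assumed SMT shift of 1 and core shift of 4 (values consistent with the rows of Table 8-3), x2APIC ID 15H decodes to package 1, core 2, SMT 1; with shifts of 1 and 2, initial APIC ID 6H decodes to package 1, core 1, SMT 0, as in Table 8-2.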
example also depicts a technique to construct a mask to represent the logical processors that reside in the same core. In Example 8-21, the numerical ID value can be obtained from the value extracted with the mask by shifting it right by the shift count. Algorithms below do not shift the value. The assumption is that the SubID values can be compared for equivalence without the need to shift.
Example 8-21.
using OS specific APIs.
// Allocate per processor arrays to store the Package_ID, Core_ID and SMT_ID for every started
// processor.
ThreadAffinityMask = 1;
ProcessorNum = 0;
while (ThreadAffinityMask != 0 && ThreadAffinityMask <= SystemAffinity) {
    // Check to make sure we can utilize this processor first.
PackageProcessorMask[0] = ProcessorMask;
for (ProcessorNum = 1; ProcessorNum < NumStartedLPs; ProcessorNum++) {
    ProcessorMask <<= 1;
    for (i = 0; i < PackageNum; i++) {
        // we may be comparing bit-fields of logical processors residing in different
        // packages; the code below assumes package symmetry
        if (PackageID[ProcessorNum] == PackageIDBucket[i]) {
            PackageProcessorMask[i] |= ProcessorMask;
            break; // found in existing bucket, skip to next iteration
        }
    }
    if (i == PackageNum) {
        //PA
    }
    if (i == CoreNum) {
        //Did not match any bucket, start new bucket
        CoreIDBucket[i] = PackageID[ProcessorNum] | CoreID[ProcessorNum];
        CoreProcessorMask[i] = ProcessorMask;
        CoreNum++;
    }
}
// CoreNum has the number of cores started in the OS
// CoreProcessorMask[] array has the processor set of each core
Other processor relationships such as processor mask of sibling cores can be computed from set operations of the PackageProcessorMask[] and CoreProcessorMask[].
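The package-bucketing loop can be written as a self-contained C function. In this sketch, bit p of each 64-bit mask stands for the p-th started logical processor (so at most 64 processors are handled), the caller supplies output arrays with at least num_lps entries, and the names are hypothetical; no OS affinity API is involved.

```c
#include <stdint.h>

/* Groups logical processors by key (e.g. the masked package ID).
   key[p] is the key of logical processor p; bit p of a mask stands for p.
   Writes one key and one mask per distinct bucket; returns the bucket count. */
int group_by_package(const uint32_t *key, int num_lps,
                     uint32_t *bucket_key, uint64_t *bucket_mask) {
    int num_buckets = 0;
    for (int p = 0; p < num_lps && p < 64; p++) {
        int i;
        for (i = 0; i < num_buckets; i++) {
            if (bucket_key[i] == key[p]) {       /* found in existing bucket */
                bucket_mask[i] |= 1ull << p;
                break;
            }
        }
        if (i == num_buckets) {                  /* no match: start a new bucket */
            bucket_key[num_buckets] = key[p];
            bucket_mask[num_buckets] = 1ull << p;
            num_buckets++;
        }
    }
    return num_buckets;
}
```

Passing PackageID[p] | CoreID[p] as the key instead of the package ID alone produces per-core masks, matching the CoreIDBucket logic above.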
8.10.2 PAUSE Instruction
The PAUSE instruction can improve the performance of processors supporting Intel Hyper-Threading Technology when executing “spin-wait loops” and other routines where one thread is accessing a shared lock or semaphore in a tight polling loop. When executing a spin-wait loop, the processor can suffer a severe performance penalty when exiting the loop because it detects a possible memory order violation and flushes the core processor’s pipeline.
8.10.4 MONITOR/MWAIT Instruction
Operating systems usually implement idle loops to handle thread synchronization. In a typical idle-loop scenario, there could be several “busy loops” and they would use a set of memory locations. An impacted processor waits in a loop and polls a memory location to determine if there is available work to execute. The posting of work is typically a write to memory (the work-queue of the waiting processor).
Power management related events (such as Thermal Monitor 2 or chipset driven STPCLK# assertion) will not cause the monitor event pending flag to be cleared. Faults will not cause the monitor event pending flag to be cleared. Software should not allow for voluntary context switches in between MONITOR/MWAIT in the instruction flow. Note that execution of MWAIT does not rearm the monitor hardware. This means that MONITOR/MWAIT need to be executed in a loop.
the two parameters should default to be the same (the size of the monitor triggering area is the same as the system coherence line size). Based on the monitor line sizes returned by the CPUID, the OS should dynamically allocate structures with appropriate padding. If static data structures must be used by an OS, attempt to adapt the data structure and use a dynamically allocated data buffer for thread synchronization.
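A minimal sketch of the dynamic allocation recommended here, using C11 `aligned_alloc`. It assumes `line_size` is the largest monitor-line size reported by CPUID leaf 05H (EBX[15:0]) and is a power of 2; the function name is hypothetical. The buffer is both aligned to and padded to a whole number of monitor lines, so no unrelated data shares the monitored line and writes to neighboring data do not trigger the monitor.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Returns a zero-filled buffer that starts on a monitor-line boundary and
   occupies whole lines. line_size: assumed largest monitor-line size from
   CPUID leaf 05H, EBX[15:0]; must be a power of 2. Release with free(). */
void *alloc_monitor_buffer(size_t payload, size_t line_size) {
    size_t size = (payload + line_size - 1) / line_size * line_size;
    if (size == 0)
        size = line_size;                      /* always reserve at least one line */
    void *p = aligned_alloc(line_size, size);  /* size is a multiple of line_size */
    if (p != NULL)
        memset(p, 0, size);
    return p;
}
```

An OS would typically place the monitored variable (for example, a per-processor work-queue flag) at the start of such a buffer and pass its address to MONITOR.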
JE Get_Lock
PAUSE ;Short delay
JMP Spin_Lock
Get_Lock:
MOV EAX, 1
XCHG EAX, lockvar ;Try to get lock
CMP EAX, 0 ;Test if successful
JNE Spin_Lock
Critical_Section:
MOV lockvar, 0
...
Continue:
The spin-wait loop above uses a “test, test-and-set” technique for determining the availability of the synchronization variable. This technique is recommended when writing spin-wait loops.
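The same test, test-and-set structure can be expressed with C11 atomics. This is a sketch: the type and function names are hypothetical, `_mm_pause()` (from `<immintrin.h>` on x86 compilers) is used to issue PAUSE, and compilers typically turn `atomic_exchange` into XCHG on x86, although the exact code generated is up to the compiler.

```c
#include <stdatomic.h>
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>               /* _mm_pause() emits the PAUSE instruction */
#define CPU_RELAX() _mm_pause()
#else
#define CPU_RELAX() ((void)0)
#endif

typedef struct { atomic_int lockvar; } tts_lock;  /* 0 = free, 1 = held */

void tts_init(tts_lock *l) { atomic_init(&l->lockvar, 0); }

void tts_acquire(tts_lock *l) {
    for (;;) {
        /* Test: spin on a plain read with PAUSE (CMP lockvar, 0 / PAUSE). */
        while (atomic_load_explicit(&l->lockvar, memory_order_relaxed) != 0)
            CPU_RELAX();
        /* Test-and-set: try to take the lock (XCHG EAX, lockvar). */
        if (atomic_exchange_explicit(&l->lockvar, 1, memory_order_acquire) == 0)
            return;
    }
}

bool tts_try_acquire(tts_lock *l) {
    return atomic_load_explicit(&l->lockvar, memory_order_relaxed) == 0 &&
           atomic_exchange_explicit(&l->lockvar, 1, memory_order_acquire) == 0;
}

void tts_release(tts_lock *l) {
    atomic_store_explicit(&l->lockvar, 0, memory_order_release);  /* MOV lockvar, 0 */
}
```

The relaxed read in the inner loop plays the role of CMP lockvar, 0: the waiting processor keeps reading its cached copy instead of issuing a locked exchange on every iteration, and attempts the exchange only once the lock looks free.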
MULTIPLE-PROCESSOR MANAGEMENT // C1 handler uses a Halt instruction VOID C1Handler() { STI HLT } The MONITOR and MWAIT instructions may be considered for use in the C0 idle state loops, if MONITOR and MWAIT are supported. Example 8-25. An OS Idle Loop with MONITOR/MWAIT in the C0 Idle Loop // WorkQueue is a memory location indicating there is a thread // ready to run. A non-zero value for WorkQueue is assumed to // indicate the presence of work to be scheduled on the processor.
}
8.10.6.3 Halt Idle Logical Processors
If one of two logical processors is idle or in a spin-wait loop of long duration, explicitly halt that processor by means of a HLT instruction.
In an MP system, operating systems can place idle processors into a loop that continuously checks the run queue for runnable software tasks.
{
    MONITOR WorkQueue  // Setup of eax with WorkQueue LinearAddress,
                       // ECX, EDX = 0
    IF (WorkQueue == 0) THEN {
        STI
        MWAIT          // EAX, ECX = 0
    }
}
8.10.6.5 Guidelines for Scheduling Threads on Logical Processors Sharing Execution Resources
Because the logical processors share execution resources, the order in which threads are dispatched to logical processors for execution can affect the overall efficiency of a system. The following guidelines are recommended for scheduling threads for execution.
• A high-resolution timer within the processor (such as the local APIC timer or the time-stamp counter).
For additional information, see the Intel® 64 and IA-32 Architectures Optimization Reference Manual.
CHAPTER 9
PROCESSOR MANAGEMENT AND INITIALIZATION
This chapter describes the facilities provided for managing processor-wide functions and for initializing the processor. The subjects covered include: processor initialization, x87 FPU initialization, processor configuration, feature determination, mode switching, the MSRs (in the Pentium, P6 family, Pentium 4, and Intel Xeon processors), and the MTRRs (in the P6 family, Pentium 4, and Intel Xeon processors).
The software-initialization code performs all system-specific initialization of the BSP or primary processor and the system logic. At this point, for MP (or DP) systems, the BSP (or primary) processor wakes up each AP (or secondary) processor to enable those processors to execute self-configuration code. When all processors are initialized, configured, and synchronized, the BSP or primary processor begins executing an initial operating-system or executive task.
Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT (Contd.)
[Figure: CR0 bit layout after reset. PG (bit 31) = 0, paging disabled; CD (bit 30) = 1, caching disabled; NW (bit 29) = 1, not write-through disabled; AM (bit 18) = 0, alignment check disabled; WP (bit 16) = 0, write-protect disabled; NE (bit 5) = 0, external x87 FPU error reporting; ET (bit 4) = 1 (not used); TS (bit 3) = 0, no task switch; EM (bit 2) = 0, x87 FPU instructions not trapped; MP (bit 1) = 0, WAIT/FWAIT instructions not trapped; PE (bit 0) = 0, real-address mode. All other bits are reserved.]
Figure 9-1. Contents of CR0 Register after Reset
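The reset contents of CR0 shown in Figure 9-1 can be expressed as bit masks, which makes the resulting value (60000010H) easy to check. The macro and function names below are illustrative and do not come from any particular header; only the bit positions come from the figure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CR0 bit positions as labeled in Figure 9-1. */
#define CR0_PE (1u << 0)   /* protection enable */
#define CR0_MP (1u << 1)   /* monitor coprocessor (WAIT/FWAIT trapping) */
#define CR0_EM (1u << 2)   /* x87 FPU emulation */
#define CR0_TS (1u << 3)   /* task switched */
#define CR0_ET (1u << 4)   /* extension type; shown as 1 after reset */
#define CR0_NE (1u << 5)   /* numeric error reporting */
#define CR0_WP (1u << 16)  /* write protect */
#define CR0_AM (1u << 18)  /* alignment mask */
#define CR0_NW (1u << 29)  /* not write-through */
#define CR0_CD (1u << 30)  /* cache disable */
#define CR0_PG (1u << 31)  /* paging */

/* Only CD, NW, and ET are 1 in Figure 9-1. */
#define CR0_AFTER_RESET (CR0_CD | CR0_NW | CR0_ET)

bool cr0_in_real_mode(uint32_t cr0)     { return (cr0 & CR0_PE) == 0; }
bool cr0_paging_enabled(uint32_t cr0)   { return (cr0 & CR0_PG) != 0; }
bool cr0_caching_disabled(uint32_t cr0) { return (cr0 & CR0_CD) != 0; }
```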
9.1.4 First Instruction Executed
The first instruction that is fetched and executed following a hardware reset is located at physical address FFFFFFF0H. This address is 16 bytes below the processor’s uppermost physical address. The EPROM containing the software-initialization code must be located at this address. The address FFFFFFF0H is beyond the 1-MByte addressable range of the processor while in real-address mode.
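The arithmetic behind the first fetch can be checked in a few lines of C. The sketch assumes the reset values shown in Figure 9-3 (CS.BASE = FFFF0000H, EIP = 0000FFF0H) and a reset CS selector of F000H. In real-address mode, loading CS (for example with a far jump) reloads CS.BASE as selector × 16, so execution then moves into the first MByte. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define RESET_CS_SELECTOR 0xF000u      /* visible CS selector after reset */
#define RESET_CS_BASE     0xFFFF0000u  /* hidden CS base after reset */
#define RESET_EIP         0x0000FFF0u

/* Linear address of an instruction fetch: CS.BASE + EIP. */
uint32_t fetch_address(uint32_t cs_base, uint32_t eip)
{
    return cs_base + eip;
}

/* Real-address mode: loading CS sets CS.BASE to selector * 16. */
uint32_t real_mode_cs_base(uint16_t selector)
{
    return (uint32_t)selector << 4;
}
```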
Table 9-2. Recommended Settings of EM and MP Flags on IA-32 Processors
EM | MP | NE | IA-32 processor
1 | 0 | 1 | Intel486™ SX, Intel386™ DX, and Intel386™ SX processors only, without the presence of a math coprocessor.
0 | 1 | 1 or 0* | Pentium 4, Intel Xeon, P6 family, Pentium, Intel486™ DX, and Intel 487 SX processors, and Intel386 DX and Intel386 SX processors when a companion math coprocessor is present.
0 | 1 | 1 or 0* | More recent Intel 64 or IA-32 processors.
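Table 9-2 reduces to a small decision function. In this sketch, `fpu_present` is true when the processor has an x87 FPU or a companion math coprocessor, and `native_ne` chooses the NE setting for the “1 or 0*” rows, which the table leaves open. The function name is hypothetical; other CR0 bits are preserved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_MP (1u << 1)
#define CR0_EM (1u << 2)
#define CR0_NE (1u << 5)

/* Return cr0 with EM, MP, and NE set according to Table 9-2. */
uint32_t apply_table_9_2(uint32_t cr0, bool fpu_present, bool native_ne)
{
    cr0 &= ~(CR0_EM | CR0_MP | CR0_NE);
    if (fpu_present) {
        cr0 |= CR0_MP;                 /* EM = 0, MP = 1 */
        if (native_ne)
            cr0 |= CR0_NE;             /* NE = 1 or 0, chosen by the caller */
    } else {
        cr0 |= CR0_EM | CR0_NE;        /* EM = 1, MP = 0, NE = 1 */
    }
    return cr0;
}
```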
• It allows x87 FPU code to run on an IA-32 processor that has neither an integrated x87 FPU nor is connected to an external math coprocessor, by using a floating-point emulator. • It allows floating-point code to be executed using a special or nonstandard floating-point emulator, selected for a particular application, regardless of whether an x87 FPU or math coprocessor is present.
9.4 MODEL-SPECIFIC REGISTERS (MSRS)
Most IA-32 processors (starting from Pentium processors) and Intel 64 processors contain model-specific registers (MSRs). A given MSR may not be supported across all families and models for Intel 64 and IA-32 processors. Some MSRs are designated as architectural to simplify software programming; a feature introduced by an architectural MSR is expected to be supported in future processors.
all the MTRRs must be cleared to 0, which selects the uncached (UC) memory type. See Section 11.11, “Memory Type Range Registers (MTRRs),” for detailed information on the MTRRs.
9.6 INITIALIZING SSE/SSE2/SSE3/SSSE3 EXTENSIONS
For processors that contain SSE/SSE2/SSE3/SSSE3 extensions, steps must be taken when initializing the processor to allow execution of these instructions.
1.
mode. The protected-mode data structures that must be loaded are described in Section 9.8, “Software Initialization for Protected-Mode Operation.”
9.7.1 Real-Address Mode IDT
In real-address mode, the only system data structure that must be loaded into memory is the IDT (also called the “interrupt vector table”). By default, the address of the base of the IDT is physical address 0H.
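In the real-address-mode IDT each vector occupies 4 bytes: a 16-bit offset followed by a 16-bit segment. The following sketch, using hypothetical helper names, computes an entry's address from the IDT base (0H by default) and decodes the handler's linear address (segment × 16 + offset) from a byte image of low memory.

```c
#include <assert.h>
#include <stdint.h>

#define IVT_ENTRY_SIZE 4u   /* 16-bit offset, then 16-bit segment */

uint32_t ivt_entry_address(uint32_t ivt_base, uint8_t vector)
{
    return ivt_base + (uint32_t)vector * IVT_ENTRY_SIZE;
}

/* Decode one entry from a little-endian byte image of low memory. */
uint32_t ivt_handler_linear(const uint8_t *mem, uint32_t ivt_base, uint8_t vector)
{
    const uint8_t *e = mem + ivt_entry_address(ivt_base, vector);
    uint16_t offset  = (uint16_t)(e[0] | (e[1] << 8));
    uint16_t segment = (uint16_t)(e[2] | (e[3] << 8));
    return ((uint32_t)segment << 4) + offset;
}
```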
modules into memory to support reliable operation of the processor in protected mode. These data structures include the following:
• An IDT.
• A GDT.
• A TSS.
• (Optional) An LDT.
• If paging is to be used, at least one page directory and one page table.
• A code segment that contains the code to be executed when the processor switches to protected mode.
• One or more code modules that contain the necessary interrupt and exception handlers.
descriptors in the GDT. Some operating systems allocate new segments and LDTs as they are needed. This provides maximum flexibility for handling a dynamic programming environment. However, many operating systems use a single LDT for all tasks, allocating GDT entries in advance. An embedded system, such as a process controller, might pre-allocate a fixed number of segments and LDTs for a fixed number of application programs.
9.8.4 Initializing Multitasking
If the multitasking mechanism is not going to be used and changes between privilege levels are not allowed, it is not necessary to load a TSS into memory or to initialize the task register.
If the multitasking mechanism is going to be used and/or changes between privilege levels are allowed, software initialization code must load at least one TSS and an accompanying TSS descriptor.
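For reference, the minimum 32-bit TSS that such initialization code loads is 104 bytes. The C layout below is a sketch with hypothetical names; it mirrors the TASK_STATE structure used in STARTUP.ASM (a 16-bit selector followed by a reserved high word, as in SS_reg/SS_h) and checks field offsets at compile time. The full 32-bit TSS format is not reproduced in this excerpt. Setting the I/O map base to the TSS size, as `tss_init_ring0` does, is one common way to indicate that no I/O permission bit map is present; that choice is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 32-bit TSS: each 16-bit selector is followed by a reserved 16-bit word. */
typedef struct {
    uint16_t prev_task_link, rsvd0;
    uint32_t esp0; uint16_t ss0, rsvd1;
    uint32_t esp1; uint16_t ss1, rsvd2;
    uint32_t esp2; uint16_t ss2, rsvd3;
    uint32_t cr3, eip, eflags;
    uint32_t eax, ecx, edx, ebx, esp, ebp, esi, edi;
    uint16_t es, rsvd4, cs, rsvd5, ss, rsvd6;
    uint16_t ds, rsvd7, fs, rsvd8, gs, rsvd9;
    uint16_t ldt_selector, rsvd10;
    uint16_t trap;          /* bit 0 is the T (debug trap) flag */
    uint16_t io_map_base;   /* offset of the I/O permission bit map */
} tss32;

_Static_assert(sizeof(tss32) == 104, "32-bit TSS is 104 bytes");
_Static_assert(offsetof(tss32, cr3) == 0x1C, "CR3 field at offset 1CH");
_Static_assert(offsetof(tss32, io_map_base) == 0x66, "I/O map base at offset 66H");

/* Fill in only what a privilege-level change needs: the ring-0 stack. */
void tss_init_ring0(tss32 *t, uint16_t ss0, uint32_t esp0)
{
    memset(t, 0, sizeof *t);
    t->ss0 = ss0;
    t->esp0 = esp0;
    t->io_map_base = (uint16_t)sizeof *t;   /* points past the TSS: no bit map */
}
```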
following instructions must be located in an identity-mapped page (until such time that a branch to non-identity mapped pages can be effected). 64-bit mode paging tables must be located in the first 4 GBytes of physical-address space prior to activating IA-32e mode. This is necessary because the MOV CR3 instruction used to initialize the page-directory base must be executed in legacy mode prior to activating IA-32e mode (setting CR0.PG = 1 to enable paging).
9.8.5.3 64-bit Mode and Compatibility Mode Operation
IA-32e mode uses two code segment-descriptor bits (CS.L and CS.D, see Figure 3-8) to control the operating modes after IA-32e mode is initialized. If CS.L = 1 and CS.D = 0, the processor is running in 64-bit mode. With this encoding, the default operand size is 32 bits and default address size is 64 bits.
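The CS.L/CS.D encoding can be captured in a small decoder. The sketch assumes IA-32e mode is active; the enum and function names are hypothetical, and CS.L = 1 with CS.D = 1 is treated as the reserved encoding.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MODE_64BIT, MODE_COMPAT_32, MODE_COMPAT_16, MODE_RESERVED } cs_mode;

/* Meaningful only while IA-32e mode is active. */
cs_mode decode_cs(bool cs_l, bool cs_d)
{
    if (cs_l)
        return cs_d ? MODE_RESERVED : MODE_64BIT;
    return cs_d ? MODE_COMPAT_32 : MODE_COMPAT_16;  /* CS.D picks 32 or 16 bits */
}

/* Defaults only; instruction prefixes (e.g. REX.W) can override them. */
int default_operand_bits(cs_mode m)
{
    switch (m) {
    case MODE_64BIT:     return 32;
    case MODE_COMPAT_32: return 32;
    case MODE_COMPAT_16: return 16;
    default:             return -1;
    }
}

int default_address_bits(cs_mode m)
{
    switch (m) {
    case MODE_64BIT:     return 64;
    case MODE_COMPAT_32: return 32;
    case MODE_COMPAT_16: return 16;
    default:             return -1;
    }
}
```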
from 64-bit mode through compatibility mode to legacy or real mode and then back through compatibility mode to 64-bit mode.
9.9 MODE SWITCHING
To use the processor in protected mode after hardware or software reset, a mode switch must be performed from real-address mode. Once in protected mode, software generally does not need to return to real-address mode.
7. If a local descriptor table is going to be used, execute the LLDT instruction to load the segment selector for the LDT in the LDTR register.
8. Execute the LTR instruction to load the task register with a segment selector to the initial protected-mode task or to a writable area of memory that can be used to store TSS information on a task switch.
9. After entering protected mode, the segment registers continue to hold the contents they had in real-address mode.
4. Load segment registers SS, DS, ES, FS, and GS with a selector for a descriptor containing the following values, which are appropriate for real-address mode:
— Limit = 64 KBytes (0FFFFH)
— Byte granular (G = 0)
— Expand up (E = 0)
— Writable (W = 1)
— Present (P = 1)
— Base = any value
5. The segment registers must be loaded with non-null segment selectors or the segment registers will be unusable in real-address mode.
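A descriptor with the real-address-mode values listed in step 4 can be built as follows. The packing follows the field order of the DESC structure in STARTUP.ASM (lim_0_15, bas_0_15, bas_16_23, access, gran, bas_24_31). The helper name is hypothetical, and access byte 92H is assumed to encode P = 1, DPL = 0, a data segment, writable, and expand-up.

```c
#include <assert.h>
#include <stdint.h>

/* Access byte 92H: P=1, DPL=0, S=1 (code/data), type 0010B (read/write, expand-up). */
#define REAL_MODE_DATA_ACCESS 0x92u
/* Upper nibble of the gran byte: G=0 (byte granular), D/B=0. */
#define REAL_MODE_DATA_FLAGS  0x00u

/* Pack a descriptor in DESC field order: lim_0_15, bas_0_15, bas_16_23,
   access, gran (flags in bits 7:4, limit 19:16 in bits 3:0), bas_24_31. */
uint64_t make_descriptor(uint32_t base, uint32_t limit, uint8_t access, uint8_t flags)
{
    assert(limit <= 0xFFFFFu);                 /* the limit field is 20 bits */
    uint64_t gran = (uint64_t)((flags & 0xF0u) | ((limit >> 16) & 0x0Fu));
    return  (uint64_t)(limit & 0xFFFFu)
          | (uint64_t)(base & 0xFFFFu)         << 16
          | (uint64_t)((base >> 16) & 0xFFu)   << 32
          | (uint64_t)access                   << 40
          | gran                               << 48
          | (uint64_t)((base >> 24) & 0xFFu)   << 56;
}
```

With base 0 and limit 0FFFFH this yields 000092000000FFFFH; setting G = 1 and a 20-bit limit of FFFFFH would instead describe a 4-GByte segment such as the TEMP_GDT entry used later in STARTUP.ASM.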
• Load the system registers with the necessary pointers to the data structures and the appropriate flag settings for protected-mode operation. • Switch the processor to protected mode. Figure 9-3 shows the physical memory layout for the processor following a hardware reset and the starting point of this example.
[Figure: processor state after reset. The 64K EPROM occupies FFFF 0000H to FFFF FFFFH; EIP = 0000 FFF0H and CS.BASE = FFFF 0000H, so [CS.BASE+EIP] = FFFF FFF0H; DS.BASE = ES.BASE = SS.BASE = 0H; ESP = 0H.]
Figure 9-3. Processor State After Reset
Table 9-4. Main Initialization Steps in STARTUP.ASM Source Listing
STARTUP.
Table 9-4. Main Initialization Steps in STARTUP.ASM Source Listing (Contd.)
STARTUP.
9.10.2 STARTUP.ASM Listing
Example 9-1 provides high-level sample code designed to move the processor into protected mode. This listing does not include any opcode and offset information.
Example 9-1. STARTUP.ASM
MS-DOS* 5.0(045-N) 386(TM) MACRO ASSEMBLER STARTUP PAGE 1 09:44:51 08/19/92
MS-DOS 5.0(045-N) 386(TM) MACRO ASSEMBLER V4.0, ASSEMBLY OF MODULE STARTUP
OBJECT MODULE PLACED IN startup.obj
ASSEMBLER INVOKED BY: f:\386tools\ASM386.EXE startup.
; RAM_START will contain the linear address of the first
; free byte above the copied tables - this may be useful if
; a memory manager is used.
SS_reg      DW ?
SS_h        DW ?
DS_reg      DW ?
DS_h        DW ?
FS_reg      DW ?
FS_h        DW ?
GS_reg      DW ?
GS_h        DW ?
LDT_reg     DW ?
LDT_h       DW ?
TRAP_reg    DW ?
IO_map_base DW ?
TASK_STATE ENDS

; basic structure of a descriptor
DESC STRUC
lim_0_15    DW ?
bas_0_15    DW ?
bas_16_23   DB ?
access      DB ?
gran        DB ?
bas_24_31   DB ?
DESC ENDS

; structure for use with LGDT and
; ------------------------- DATA SEGMENT---------------------
; Initially, this data segment starts at linear 0, according
; to the processor’s power-up state.
; DS,ES address the bottom 64K of flat linear memory
ASSUME DS:STARTUP_DATA, ES:STARTUP_DATA
; See Figure 9-4
; load GDTR with temporary GDT
LEA EBX,TEMP_GDT                          ; build the TEMP_GDT in low ram,
MOV DWORD PTR [EBX],0                     ; where we can address
MOV DWORD PTR [EBX]+4,0
MOV DWORD PTR [EBX]+8, LINEAR_PROTO_LO
MOV DWORD PTR [EBX]+12, LINEAR_PROTO_HI
MOV TEMP_GDT_scratch.
MOV ADD MOV MOV MOVZX MOV INC MOV MOV ADD REP MOVS ; fixup MOV MOV ROR MOV MOV ECX, CS_BASE ECX, OFFSET (GDT_EPROM) ESI, [ECX].table_linear EDI,EAX ECX, [ECX].table_lim APP_GDT_ram[EBX].table_lim,CX ECX EDX,EAX APP_GDT_ram[EBX].
MOV MOV MOV MOV MOV MOV ROL MOV MOV LSL INC MOV ADD REP MOVS ; move the TSS EDI,EAX EBX,TSS_INDEX*SIZE(DESC) ECX,GDT_DESC_OFF ;build linear address for TSS GS,CX DH,GS:[EBX].bas_24_31 DL,GS:[EBX].bas_16_23 EDX,16 DX,GS:[EBX].
289        PUSH DWORD PTR [EDX].EIP_reg
290        MOV AX,[EDX].DS_reg
291        MOV BX,[EDX].ES_reg
292        MOV DS,AX      ; DS and ES no longer linear memory
293        MOV ES,BX
294
295        ; simulate far jump to initial task
296        IRETD
297
298 STARTUP_CODE ENDS
*** WARNING #377 IN 298, (PASS 2) SEGMENT CONTAINS PRIVILEGED INSTRUCTION(S)
299
300 END STARTUP, DS:STARTUP_DATA, SS:STARTUP_DATA
301
302
ASSEMBLY COMPLETE, 1 WARNING, NO ERRORS.
[Figure: starting at [CS.BASE+EIP] near FFFF 0000H, the code jumps near to START, constructs TEMP_GDT, executes LGDT, and moves to protected mode with DS, ES = GDT[1]. TEMP_GDT (GDT[0] and GDT[1], Base = 0, Limit = 4G) and GDT_SCRATCH are built in low memory; the address space spans 0 to FFFF FFFFH (4 GB).]
Figure 9-4. Constructing Temporary GDT and Switching to Protected Mode (Lines 162-172 of List File)
[Figure: the GDT, IDT, and TSS are moved from ROM (near FFFF FFFFH) to GDT RAM, IDT RAM, and TSS RAM starting at RAM_START; aliases are then fixed and LTR is executed.]
Figure 9-5. Moving the GDT, IDT, and TSS from ROM to RAM (Lines 196-261 of List File)
[Figure: SS = TSS.SS; ESP = TSS.ESP; PUSH TSS.EFLAG; PUSH TSS.CS; PUSH TSS.EIP; ES = TSS.ES; DS = TSS.DS; IRET. GDT RAM, IDT RAM, and TSS RAM lie above RAM_START, with GDT and IDT aliases.]
Figure 9-6. Task Switching (Lines 282-296 of List File)
9.10.3 MAIN.ASM Source Code
The file MAIN.
CODE SEGMENT ER use32 PUBLIC
main_start:
        nop
        nop
        nop
CODE ENDS
END main_start, ds:data, ss:stack
9.10.4 Supporting Files
The batch file shown in Example 9-3 can be used to assemble the source code files STARTUP.ASM and MAIN.ASM and build the final application.
Example 9-3. Batch File to Assemble and Build the Application
ASM386 STARTUP.ASM
ASM386 MAIN.ASM
BLD386 STARTUP.OBJ, MAIN.OBJ buildfile(EPROM.
TABLE GDT ( LOCATION = GDT_EPROM , ENTRY = ( 10: PROTECTED_MODE_TASK , startup.startup_code , startup.startup_data , main_module.data , main_module.code , main_module.stack ) ), IDT ( LOCATION = IDT_EPROM ); MEMORY ( , , , ); RESERVE = (0..3FFFH -- Area for the GDT, IDT, TSS copied from ROM 60000H..0FFFEFFFFH) RANGE = (ROM_AREA = ROM (0FFFF0000H..0FFFFFFFFH)) -- Eprom size 64K RANGE = (RAM_AREA = RAM (4000H..
Table 9-5. Relationship Between BLD Item and ASM Source File (Contd.)
Item | ASM386 and Startup.A58 | BLD386 Controls and BLD file | Effect
RAM start | RAM_START equ 400H | memory (reserve = (0..3FFFH)) | RAM_START is used as the ram destination for moving the tables. It must be excluded from the application's segment area.
[Figure: diagram with the BIOS, the update loader, a new update, the update blocks, and the CPU.]
Figure 9-7. Applying Microcode Updates
9.11.1 Microcode Update
A microcode update consists of an Intel-supplied binary that contains a descriptive header and data. No executable code resides within the update. Each microcode update is tailored for a specific list of processor signatures. A mismatch of the processor’s signature with the signature contained in the update will result in a failure to load.
NOTE
The optional extended signature table is supported starting with processor family 0FH, model 03H.
Table 9-6. Microcode Update Field Definitions
Field Name | Offset (bytes) | Length (bytes) | Description
Header Version | 0 | 4 | Version number of the update header.
Update Revision | 4 | 4 | Unique version number for the update, the basis for the update signature provided by the processor to indicate the current update functioning within the processor.
Table 9-6. Microcode Update Field Definitions (Contd.)
Field Name | Offset (bytes) | Length (bytes) | Description
Reserved | 36 | 12 | Reserved fields for future expansion.
Update Data | 48 | Data Size or 2000 | Update data.
Extended Signature Count | Data Size + 48 | 4 | Specifies the number of extended signature structures (Processor Signature[n], processor flags[n] and checksum[n]) that exist in this microcode update.
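The fixed 48-byte header can be described with a C structure. Only the Header Version, Update Revision, Reserved, Update Data, and Extended Signature Count rows appear in this excerpt; the order of the fields at offsets 8 through 32 (Date, Processor Signature, Checksum, Loader Revision, Processor Flags, Data Size, Total Size) follows the full Table 9-6 and should be checked against it. The size helpers assume that a Data Size of 0 means 2000 bytes of data and a 2048-byte update, and that Total Size is meaningful only when Data Size is nonzero. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UCODE_HEADER_SIZE 48u

typedef struct {
    uint32_t header_version;       /* offset 0 */
    uint32_t update_revision;      /* offset 4 */
    uint32_t date;                 /* offset 8 */
    uint32_t processor_signature;  /* offset 12 */
    uint32_t checksum;             /* offset 16 */
    uint32_t loader_revision;      /* offset 20 */
    uint32_t processor_flags;      /* offset 24 */
    uint32_t data_size;            /* offset 28; 0 means 2000 bytes */
    uint32_t total_size;           /* offset 32; used only if data_size != 0 */
    uint8_t  reserved[12];         /* offset 36 */
} ucode_header;

_Static_assert(sizeof(ucode_header) == UCODE_HEADER_SIZE, "header is 48 bytes");
_Static_assert(offsetof(ucode_header, reserved) == 36, "reserved field at offset 36");

uint32_t ucode_data_size(const ucode_header *h)
{
    return h->data_size ? h->data_size : 2000u;
}

uint32_t ucode_total_size(const ucode_header *h)
{
    return h->data_size ? h->total_size : 2048u;   /* 48 + 2000 */
}

/* Extended Signature Count sits at Data Size + 48, right after the data. */
uint32_t ext_sig_table_offset(const ucode_header *h)
{
    return UCODE_HEADER_SIZE + ucode_data_size(h);
}

bool has_ext_sig_table(const ucode_header *h)
{
    return ucode_total_size(h) > ext_sig_table_offset(h);
}
```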
Table 9-7.
9.11.2 Optional Extended Signature Table
The extended signature table is a structure that may be appended to the end of the encrypted data when the encrypted data only supports a single processor signature (optional case). The extended signature table will always be present when the encrypted data supports multiple processor steppings and/or models (required case).
a processor signature embedded in the microcode update with the processor signature returned by CPUID will cause the BIOS to reject the update. Example 9-5 shows how to check for a valid processor signature match between the processor and microcode update.
Example 9-5. Pseudo Code to Validate the Processor Signature
ProcessorSignature ← CPUID(1):EAX
If (Update.HeaderVersion == 00000001h)
{
    // first check the ProcessorSignature field
    If (ProcessorSignature == Update.
The three platform ID bits, when read as a binary coded decimal (BCD) number, indicate the bit position in the microcode update header’s processor flags field associated with the installed processor. The processor flags in the 48-byte header and the processor flags field associated with the extended processor signature structures may have multiple bits set. Each set bit represents a different platform ID that the update supports.
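The signature and platform check can be sketched as a pure function over values the BIOS has already read. The assumptions, none of which are shown in this excerpt, are that the platform ID occupies bits 52:50 of IA32_PLATFORM_ID (MSR 17H) and that the header's Processor Flags field has one bit per platform ID, as described above. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed: IA32_PLATFORM_ID (MSR 17H) bits 52:50 hold the platform ID. */
unsigned platform_id_from_msr(uint64_t msr17)
{
    return (unsigned)((msr17 >> 50) & 0x7u);
}

/* Translate the 3-bit ID into the header's one-bit-per-platform form. */
uint32_t platform_flag(unsigned platform_id)
{
    assert(platform_id < 8);
    return 1u << platform_id;
}

/* An update applies if the signature matches and the platform's bit is set. */
bool update_applies(uint32_t cpu_signature, unsigned platform_id,
                    uint32_t update_signature, uint32_t update_flags)
{
    return cpu_signature == update_signature &&
           (update_flags & platform_flag(platform_id)) != 0;
}
```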
    }
    Else
    {
        //
        // Assume the Data Size has been used to calculate the
        // location of Update.ProcessorSignature[N] and a match
        // on Update.ProcessorSignature[N] has already succeeded
        //
        If (Update.ProcessorFlags[n] & Flag)
        {
            Load Update
        }
    }
}
9.11.5 Microcode Update Checksum
Each microcode update contains a DWORD checksum located in the update header. It is software’s responsibility to ensure that a microcode update is not corrupt.
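A minimal sketch of the checksum rule: treat the update as an array of unsigned DWORDs, add them with 32-bit wraparound, and accept the update only if the sum is 0. The sketch assumes the caller passes the whole update as the DWORD count (Total Size / 4, or 512 DWORDs for a 2048-byte update). The helper names are hypothetical; `ucode_fix_checksum` shows how a builder would choose the checksum DWORD (byte offset 16 in the header) so that the total becomes 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Unsigned 32-bit sum of every DWORD in the update; overflow wraps mod 2^32. */
uint32_t ucode_dword_sum(const uint32_t *dwords, size_t count)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < count; i++)
        sum += dwords[i];
    return sum;
}

/* A valid update sums to 0; any other value means it is corrupt. */
bool ucode_checksum_ok(const uint32_t *dwords, size_t count)
{
    return ucode_dword_sum(dwords, count) == 0;
}

/* Choose the checksum DWORD (index 4, byte offset 16) so the sum becomes 0. */
uint32_t ucode_fix_checksum(uint32_t *dwords, size_t count)
{
    assert(count > 4);
    dwords[4] = 0;
    dwords[4] = 0u - ucode_dword_sum(dwords, count);
    return dwords[4];
}
```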
If (ChkSum == 00000000H) Success
Else Fail
9.11.6 Microcode Update Loader
This section describes an update loader used to load an update into a Pentium 4, Intel Xeon, or P6 family processor. It also discusses the requirements placed on the BIOS to ensure proper loading. The update loader described contains the minimal instructions needed to load an update.
• ECX contains 79H (address of IA32_BIOS_UPDT_TRIG).
Other requirements are:
• If the update is loaded while the processor is in real mode, then the update data may not cross a segment boundary.
• If the update is loaded while the processor is in real mode, then the update data may not exceed a segment limit.
• If paging is enabled, pages that are currently present must map the update data.
• The microcode update data requires a 16-byte boundary alignment.
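The register and alignment requirements can be checked before issuing WRMSR. This sketch assumes that EAX receives the linear address of the update data, which begins 48 bytes after the start of the update, and that EDX is 0; only the ECX requirement is visible in this excerpt. The names are hypothetical, and the WRMSR itself is omitted because it is privileged.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IA32_BIOS_UPDT_TRIG 0x79u   /* loaded into ECX for WRMSR */
#define UCODE_HEADER_SIZE   48u

/* Assumed: EAX gets the linear address of the update data (after the header). */
uint32_t trigger_eax(uint32_t update_linear)
{
    return update_linear + UCODE_HEADER_SIZE;
}

bool is_16_byte_aligned(uint32_t linear)
{
    return (linear & 0xFu) == 0;
}

/* Real mode: the data must not cross a segment boundary or exceed the
   64-KByte limit, i.e. offset + length must stay within one segment. */
bool fits_real_mode_segment(uint16_t offset, uint32_t length)
{
    return (uint32_t)offset + length <= 0x10000u;
}
```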
If the processor core supports Intel Hyper-Threading Technology, the guideline described in Section 9.11.6.3 also applies.
9.11.6.5 Update Loader Enhancements
The update loader presented in Section 9.11.6, “Microcode Update Loader,” is a minimal implementation that can be enhanced to provide additional functionality.
9.11.7.1 Determining the Signature
An update that is successfully loaded into the processor provides a signature that matches the update revision of the currently functioning revision. This signature is available any time after the actual update has been loaded. Requesting the signature does not have a negative impact upon a loaded update. The procedure for determining this signature is shown in Example 9-9.
Example 9-9.
Example 9-10.
There are no optional functions. BIOS must load the appropriate update for each processor during system initialization. A Header Version of an update block containing the value 0FFFFFFFFH indicates that the update block is unused and available for storing a new update. The BIOS is responsible for providing a region of non-volatile storage (NVRAM) for each potential processor stepping within a system. This storage unit consists of one or more update blocks.
These requirements are checked by the BIOS during the execution of the write update function of this interface. The BIOS sequentially scans through all of the update blocks in NVRAM starting with index 0. The BIOS scans until it finds an update where the processor fields in the header match the processor signature (extended family, extended model, type, family, model, and stepping) as well as the platform bits of the current processor. Example 9-11.
    }
}
NOTES
The platform ID bits in IA32_PLATFORM_ID are encoded as a three-bit binary coded decimal field. The platform bits in the microcode update header are individually bit encoded. The algorithm must translate from one format to the other before performing the check.
When performing the INT 15H, 0D042H functions, the BIOS must assume that the caller has no knowledge of platform-specific requirements.
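The NVRAM scan described above can be sketched as follows, including the translation from the three-bit platform ID to the header's one-bit-per-platform mask that the notes call for. The `nvram_block` structure is a hypothetical in-memory view of each block's header fields, and the sketch ignores updates that span more than one 2048-byte block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_UNUSED 0xFFFFFFFFu   /* Header Version of an unused block */

/* Hypothetical view of the header fields of one NVRAM update block. */
typedef struct {
    uint32_t header_version;
    uint32_t processor_signature;
    uint32_t processor_flags;      /* one bit per platform ID */
} nvram_block;

/* Scan from index 0 for a used block whose signature and platform bit match.
   Returns the block index, or -1 if there is none. */
long find_update_block(const nvram_block *blocks, size_t n,
                       uint32_t cpu_signature, unsigned platform_id)
{
    assert(platform_id < 8);
    uint32_t flag = 1u << platform_id;   /* 3-bit ID -> bit mask */
    for (size_t i = 0; i < n; i++) {
        if (blocks[i].header_version == BLOCK_UNUSED)
            continue;
        if (blocks[i].processor_signature == cpu_signature &&
            (blocks[i].processor_flags & flag) != 0)
            return (long)i;
    }
    return -1;
}

/* First unused block, available for storing a new update, or -1. */
long find_free_block(const nvram_block *blocks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (blocks[i].header_version == BLOCK_UNUSED)
            return (long)i;
    return -1;
}
```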
Example 9-12.
// Do we have enough update slots for all CPUs?
//
If there are more blocks required to support the unique processor steppings than update blocks provided by the BIOS exit
//
// Do we need any update blocks at all? If not, we are done
//
If (NumBlocks == 0)
    exit
//
// Record updates for processors in NVRAM.
}
//
// Compare the Update read to that written
//
If (Update read != Update written)
{
    Display Diagnostic
    exit
}
I ← I + (size of microcode update / 2048)
}
//
// Enable Update Loading, and inform user
//
Issue the Update Control function with Task = Enable.
9.11.8.3 Microcode Update Functions
Table 9-12 defines current Pentium 4, Intel Xeon, and P6 family processor microcode update functions.
Table 9-12.
In general, each function returns with CF cleared and AH containing the returned status. The general return codes and other constant definitions are listed in Section 9.11.8.9, “Return Codes.” The OEM error field (AL) is provided for the OEM to return additional error information specific to the platform. If the BIOS provides no additional information about the error, OEM error must be set to SUCCESS.
9.11.8.6 Function 01H—Write Microcode Update Data
This function integrates a new microcode update into the BIOS storage device. Table 9-14 lists the parameters and return codes for the function.
Table 9-14. Parameters for the Write Update Data Function
Input:
AX | Function Code | 0D042H
BL | Sub-function | 01H - Write update
ES:DI | Update Address | Real Mode pointer to the Intel Update structure.
Table 9-14. Parameters for the Write Update Data Function (Contd.)
CPU_NOT_PRESENT | The processor stepping does not currently exist in the system.
INVALID_HEADER | The update header contains a header or loader version that is not recognized by the BIOS.
INVALID_HEADER_CS | The update does not checksum correctly.
SECURITY_FAILURE | The processor rejected the update.
INVALID_REVISION | The same or more recent revision of the update exists in the storage device.
Finally, before storing the proposed update in NVRAM, the BIOS must verify the authenticity of the update via the mechanism described in Section 9.11.6, “Microcode Update Loader.” This includes loading the update into the current processor, executing the CPUID instruction, reading MSR 08Bh, and comparing a calculated value with the update revision in the proposed update header for equality.
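The final comparison step can be written as a pure function over the value read back from MSR 8BH. The sketch assumes the update signature is returned in EDX, that is, bits 63:32 of the 64-bit MSR value. The helper names are hypothetical, and the WRMSR, CPUID, and RDMSR sequence that produces the value is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IA32_BIOS_SIGN_ID 0x8Bu   /* MSR 08BH */

/* Assumed: the signature of the loaded update is in EDX (bits 63:32). */
uint32_t signature_from_msr8b(uint64_t msr8b)
{
    return (uint32_t)(msr8b >> 32);
}

/* The update passes if the processor reports the revision from its header. */
bool update_authentic(uint64_t msr8b_after_load, uint32_t header_update_revision)
{
    return signature_from_msr8b(msr8b_after_load) == header_update_revision;
}
```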
[Flowchart: Write Microcode Update. Does the update match a CPU in the system? (No: return CPU_NOT_PRESENT.) Valid update header version? (No: return INVALID_HEADER.) Loader revision matches the BIOS’s loader? (No: return INVALID_HEADER.) Does the update checksum correctly? (No: return INVALID_HEADER_CS.) Yes: continue at connector 1.]
Figure 9-8. Microcode Update Write Operation Flow [1]
[Flowchart, continuing from connector 1: Is an update for the matching CPU already in NVRAM? If yes, is the update revision newer than the NVRAM update? (No: return INVALID_REVISION.) If no, is space available in NVRAM? If not, is a replacement policy implemented? (No: return STORAGE_FULL.) Then: does the update pass the authenticity test? (No: return SECURITY_FAILURE.) Yes: update the NVRAM record and return SUCCESS.]
Figure 9-9. Microcode Update Write Operation Flow [2]
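The decision sequence in Figures 9-8 and 9-9 maps onto a straight-line C function. This sketch assumes the individual checks (signature match, header and loader version, checksum, NVRAM state, authenticity) have already been evaluated into the hypothetical `write_checks` structure, and it follows the branch order as read from the two flowcharts. The enum reuses the return-code names from Table 9-14 without assigning their numeric values.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    SUCCESS, CPU_NOT_PRESENT, INVALID_HEADER, INVALID_HEADER_CS,
    INVALID_REVISION, STORAGE_FULL, SECURITY_FAILURE
} write_status;   /* symbolic only; numeric codes are defined in Table 9-18 */

/* Hypothetical summary of checks the BIOS has already performed. */
typedef struct {
    bool matches_cpu_in_system;
    bool header_version_valid;
    bool loader_revision_matches;
    bool checksum_ok;
    bool matching_update_in_nvram;
    bool newer_than_nvram_update;
    bool space_available;
    bool replacement_policy;
    bool passes_authenticity;
} write_checks;

write_status write_update_decision(const write_checks *c)
{
    /* Figure 9-8 */
    if (!c->matches_cpu_in_system)   return CPU_NOT_PRESENT;
    if (!c->header_version_valid)    return INVALID_HEADER;
    if (!c->loader_revision_matches) return INVALID_HEADER;
    if (!c->checksum_ok)             return INVALID_HEADER_CS;
    /* Figure 9-9 */
    if (c->matching_update_in_nvram) {
        if (!c->newer_than_nvram_update) return INVALID_REVISION;
    } else if (!c->space_available && !c->replacement_policy) {
        return STORAGE_FULL;
    }
    if (!c->passes_authenticity)     return SECURITY_FAILURE;
    return SUCCESS;   /* the BIOS then updates the NVRAM record */
}
```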
9.11.8.7 Function 02H—Microcode Update Control
This function enables loading of binary updates into the processor. Table 9-15 lists the parameters and return codes for the function.
Table 9-15. Parameters for the Control Update Sub-function
Input:
AX | Function Code | 0D042H
BL | Sub-function | 02H - Control update
BH | Task | See the description below.
Table 9-16. Mnemonic Values
Mnemonic | Value | Meaning
Enable | 1 | Enable the Update loading at initialization time.
Query | 2 | Determine the current state of the update control without changing its status.
The READ_FAILURE error code returned by this function has meaning only if the control function is implemented in the BIOS NVRAM. The state of this feature (enabled/disabled) can also be implemented using CMOS RAM bits, where READ failure errors cannot occur.
Table 9-17. Parameters for the Read Microcode Update Data Function (Contd.)
AL | OEM Error | Additional OEM Information
Return Codes (see Table 9-18 for code definitions):
SUCCESS | The function completed successfully.
READ_FAILURE | There was a failure because of the inability to read the storage device.
UPDATE_NUM_INVALID | Update number exceeds the maximum number of update blocks implemented by the BIOS.
Table 9-18. Return Code Definitions
Return Code | Value | Description
SUCCESS | 00H | The function completed successfully.
NOT_IMPLEMENTED | 86H | The function is not implemented.
ERASE_FAILURE | 90H | A failure because of the inability to erase the storage device.
WRITE_FAILURE | 91H | A failure because of the inability to write the storage device.
READ_FAILURE | 92H | A failure because of the inability to read the storage device.
CHAPTER 10
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC)
The Advanced Programmable Interrupt Controller (APIC), referred to in the following sections as the local APIC, was introduced into the IA-32 processors with the Pentium processor (see Section 19.27, “Advanced Programmable Interrupt Controller (APIC)”) and is included in the P6 family, Pentium 4, Intel Xeon processors, and other more recent Intel 64 and IA-32 processor families (see Section 10.4.2, “Presence of the Local APIC”).
interrupt pins (LINT0 and LINT1). The I/O devices may also be connected to an 8259-type interrupt controller that is in turn connected to the processor through one of the local interrupt pins. • Externally connected I/O devices — These interrupts originate as an edge or level asserted by an I/O device that is connected to the interrupt input pins of an I/O APIC.
IPIs can be sent to other processors in the system or to the originating processor (self-interrupts). When the target processor receives an IPI message, its local APIC handles the message automatically (using information included in the message such as vector number and trigger mode). See Section 10.7, “Issuing Interprocessor Interrupts,” for a detailed explanation of the local APIC’s IPI message delivery and acceptance mechanism.
also be delivered to the individual processors through the local interrupt pins; however, this mechanism is commonly not used in MP systems.
The IPI mechanism is typically used in MP systems to send fixed interrupts (interrupts for a specific vector number) and special-purpose interrupts to processors on the system bus. For example, a local APIC can use an IPI to forward a fixed interrupt to another processor for servicing. Special-purpose IPIs (including NMI, INIT, SMI and SIPI IPIs) allow one or more processors on the system bus to perform systemwide boot-up and control functions.
forward extendability for future Intel platform innovations. These extensions and modifications are noted in the following sections.
10.4 LOCAL APIC
The following sections describe the architecture of the local APIC and how to detect it, identify it, and determine its status. Descriptions of how to program the local APIC are given in Section 10.6.1, “Local Vector Table,” and Section 10.7.1, “Interrupt Command Register (ICR).”
[Figure: local APIC structure (block diagram). Shows the Version Register, EOI Register, Task Priority Register, Processor Priority Register, the timer (Current Count, Initial Count, and Divide Configuration Registers), the prioritizer, and the Local Vector Table entries (Timer, LINT0/1, Perf. Mon.), with the INTR (to CPU core), INTA (from CPU core), and EXTINT signals.]
Table 10-1 shows how the APIC registers are mapped into the 4-KByte APIC register space. Registers are 32 bits, 64 bits, or 256 bits in width; all are aligned on 128-bit boundaries. All 32-bit registers should be accessed using 128-bit aligned 32-bit loads or stores. Some processors may support loads and stores of less than 32 bits to some of the APIC registers. This is model-specific behavior and is not guaranteed to work on all processors.
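The access rule above (one aligned 32-bit load or store per register, each register on a 16-byte boundary) can be wrapped in two helpers. This is a sketch: `base` stands for wherever the 4-KByte APIC page is mapped (physical FEE00000H by default), some of the offsets come from rows of Table 10-1 not shown in this excerpt, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Register offsets from the APIC base. */
enum {
    APIC_ID        = 0x020,
    APIC_VERSION   = 0x030,
    APIC_TPR       = 0x080,
    APIC_EOI       = 0x0B0,
    APIC_SVR       = 0x0F0,   /* Spurious Interrupt Vector Register */
    APIC_ISR0      = 0x100,   /* ISR bits 0:31 */
    APIC_ICR_LOW   = 0x300,
    APIC_ICR_HIGH  = 0x310,
    APIC_LVT_TIMER = 0x320
};

/* One aligned 32-bit load from the mapped APIC page. */
static inline uint32_t apic_read(volatile uint8_t *base, uint32_t offset)
{
    assert(offset % 16 == 0 && offset < 0x1000);
    return *(volatile uint32_t *)(base + offset);
}

/* One aligned 32-bit store to the mapped APIC page. */
static inline void apic_write(volatile uint8_t *base, uint32_t offset, uint32_t value)
{
    assert(offset % 16 == 0 && offset < 0x1000);
    *(volatile uint32_t *)(base + offset) = value;
}
```

On real hardware `base` would be an uncached mapping established by the operating system; exercising the helpers against an ordinary array only checks the offset arithmetic.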
Table 10-1 Local APIC Register Address Map (Contd.)
Address | Register Name | Software Read/Write
FEE0 00F0H | Spurious Interrupt Vector Register | Bits 0-8 Read/Write; bits 9-31 Read Only.
FEE0 0100H | In-Service Register (ISR); bits 0:31 | Read Only.
FEE0 0110H | In-Service Register (ISR); bits 32:63 | Read Only.
FEE0 0120H | In-Service Register (ISR); bits 64:95 | Read Only.
FEE0 0130H | In-Service Register (ISR); bits 96:127 | Read Only.
Table 10-1 Local APIC Register Address Map (Contd.)
Address | Register Name | Software Read/Write
FEE0 0310H | Interrupt Command Register (ICR); bits 32-63 | Read/Write.
FEE0 0320H | LVT Timer Register | Read/Write.
FEE0 0330H | LVT Thermal Sensor Register (footnote 2) | Read/Write.
FEE0 0340H | LVT Performance Monitoring Counters Register (footnote 3) | Read/Write.
FEE0 0350H | LVT LINT0 Register | Read/Write.
FEE0 0360H | LVT LINT1 Register | Read/Write.
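Because the ISR is split into 32-bit registers spaced 10H apart (bits 0:31 at offset 100H, bits 32:63 at 110H, and so on), finding the bit for a given vector is simple arithmetic. The sketch below uses a hypothetical helper that takes the first register's offset, so the same calculation can be reused for the other 256-bit registers once their first offsets are known.

```c
#include <assert.h>
#include <stdint.h>

#define APIC_ISR_FIRST 0x100u   /* ISR bits 0:31, per Table 10-1 */

typedef struct {
    uint32_t offset;   /* register offset from the APIC base */
    uint32_t bit;      /* bit within that 32-bit register */
} apic_bit_loc;

/* Locate a vector's bit in a 256-bit register split into 32-bit pieces 10H apart. */
apic_bit_loc vector_bit(uint32_t first_offset, uint8_t vector)
{
    apic_bit_loc loc;
    loc.offset = first_offset + (uint32_t)(vector / 32u) * 0x10u;
    loc.bit = vector % 32u;
    return loc;
}
```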
1. Using the APIC global enable/disable flag in the IA32_APIC_BASE MSR (MSR address 1BH; see Figure 10-5): — When IA32_APIC_BASE[11] is 0, the processor is functionally equivalent to an IA-32 processor without an on-chip APIC. The CPUID feature flag for the APIC (see Section 10.4.2, “Presence of the Local APIC”) is also set to 0.
• APIC Global Enable flag, bit 11: Enables or disables the local APIC (see Section 10.4.3, “Enabling or Disabling the Local APIC”). This flag is available in the Pentium 4, Intel Xeon, and P6 family processors. It is not guaranteed to be available, or to be at the same location, in future Intel 64 or IA-32 processors.
• APIC Base field, bits 12 through 35: Specifies the base address of the APIC registers.
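The following C helpers sketch how a value read from IA32_APIC_BASE breaks into the fields just listed: the enable flag in bit 11 and the base address in bits 12 through 35. The BSP flag (bit 8) comes from the IA32_APIC_BASE layout in Figure 10-8. The macro and function names are hypothetical. Executing RDMSR itself is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

#define IA32_APIC_BASE_MSR   0x1Bu
#define APIC_BASE_BSP        (1ull << 8)            /* processor is BSP */
#define APIC_BASE_ENABLE     (1ull << 11)           /* APIC global enable */
#define APIC_BASE_ADDR_MASK  0x0000000FFFFFF000ull  /* bits 35:12 */

static inline bool apic_globally_enabled(uint64_t msr)
{
    return (msr & APIC_BASE_ENABLE) != 0;
}

static inline bool apic_is_bsp(uint64_t msr)
{
    return (msr & APIC_BASE_BSP) != 0;
}

/* Physical address of the 4-KByte APIC register page. */
static inline uint64_t apic_base_address(uint64_t msr)
{
    return msr & APIC_BASE_ADDR_MASK;
}
```

For example, a BSP with its APIC at the default location and enabled would read back as FEE0 0900H.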
this, operating system software should avoid writing to the local APIC ID register. The value returned by bits 31-24 of the EBX register (when the CPUID instruction is executed with a source operand value of 1 in the EAX register) is always the Initial APIC ID (determined by the platform initialization). This is true even if software has changed the value in the Local APIC ID register.
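Extracting the initial APIC ID from the CPUID result described above takes one shift. This sketch takes the EBX value as a plain input so the logic stays independent of how CPUID is invoked. The function name is hypothetical.

```c
#include <stdint.h>

/* Initial APIC ID: bits 31-24 of EBX after CPUID with EAX = 1.
 * Unaffected by software writes to the local APIC ID register. */
static inline uint8_t initial_apic_id(uint32_t cpuid1_ebx)
{
    return (uint8_t)(cpuid1_ebx >> 24);
}
```

With GCC or Clang, the EBX value can be obtained with `__get_cpuid(1, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`.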
The x2APIC architecture introduces a 32-bit ID; see Section 10.5.

10.4.7.1 Local APIC State After Power-Up or Reset

Following a power-up or RESET of the processor, the state of the local APIC and its registers is as follows:
• The following registers are reset to all 0s:
  - IRR, ISR, TMR, ICR, LDR, and TPR
  - Timer initial count and timer current count registers
  - Divide configuration register
• The DFR register is reset to all 1s.
• The mask bits for all the LVT entries are set. Attempts to reset these bits will be ignored.
• (For Pentium and P6 family processors) The local APIC continues to listen to all bus messages in order to keep its arbitration ID synchronized with the rest of the system.

10.4.7.3 Local APIC State After an INIT Reset (“Wait-for-SIPI” State)

An INIT reset of the processor can be initiated in either of two ways:
• By asserting the processor’s INIT# pin.
Figure 10-7. Local APIC Version Register (address FEE0 0030H)
Bits 31:24  Reserved
Bits 23:16  Max. LVT Entry
Bits 15:8   Reserved
Bits 7:0    Version
Value after reset: 000N 00VVH (V = version, N = number of LVT entries minus 1)

10.5 EXTENDED XAPIC (X2APIC)

The x2APIC architecture extends the xAPIC architecture (described in Section 10.4) in a backward-compatible manner and provides forward extensibility for future Intel platform innovations.
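Software usually needs two facts from the Version Register: the version number and the LVT size. The encoding stores the number of LVT entries minus 1, so the decoder has to add 1 back. This is a minimal sketch with hypothetical helper names.

```c
#include <stdint.h>

/* Version field: bits 7:0. */
static inline uint8_t apic_version(uint32_t ver_reg)
{
    return (uint8_t)(ver_reg & 0xFF);
}

/* Max. LVT Entry (bits 23:16) holds the entry count minus 1. */
static inline unsigned apic_lvt_entries(uint32_t ver_reg)
{
    return ((ver_reg >> 16) & 0xFF) + 1;
}
```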
Figure 10-8. IA32_APIC_BASE MSR Supporting x2APIC
Bits 63:36  Reserved
Bits 35:12  APIC Base: base physical address
Bit 11      EN: xAPIC global enable/disable
Bit 10      EXTD: enable x2APIC mode
Bit 9       Reserved
Bit 8       BSP: processor is BSP
Bits 7:0    Reserved

Table 10-2, “x2APIC operating mode configurations,” describes the possible combinations of the enable bit (EN, bit 11) and the extended mode bit (EXTD, bit 10) in the IA32_APIC_BASE MSR.

Table 10-2.
32-bit register. Similarly, executing the WRMSR instruction with the APIC register address in ECX writes bits 0 to 31 of register EAX to bits 0 to 31 of the specified APIC register. If the register is a 64-bit register, then bits 0 to 31 of register EDX are written to bits 32 to 63 of the APIC register. The Interrupt Command Register is the only APIC register that is implemented as a 64-bit MSR.
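The EDX:EAX convention above can be shown with plain arithmetic. This sketch splits a 64-bit value (such as an x2APIC ICR value) into the EAX and EDX halves used by WRMSR, and recombines the halves that RDMSR returns. The struct and function names are hypothetical. The privileged instructions themselves are omitted.

```c
#include <stdint.h>

/* EAX carries bits 31:0 and EDX carries bits 63:32. */
struct msr_halves { uint32_t eax, edx; };

static inline struct msr_halves msr_split(uint64_t value)
{
    struct msr_halves h = { (uint32_t)value, (uint32_t)(value >> 32) };
    return h;
}

/* Rebuild the 64-bit value from a RDMSR result in EDX:EAX. */
static inline uint64_t msr_join(uint32_t eax, uint32_t edx)
{
    return ((uint64_t)edx << 32) | eax;
}
```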
Table 10-3. Local APIC Register Address Map Supported by x2APIC (Contd.)

MMIO Offset    MSR Offset
(xAPIC mode)   (x2APIC mode)   Register Name                       R/W Semantics
0080H          008H            Task Priority Register (TPR)        Read/Write. Bits 7:0 are RW. Bits 31:8 are Reserved.
0090H          009H            Reserved
00A0H          00AH            Processor Priority Register (PPR)   Read only.
00B0H          00BH            EOI Register                        Write only. 0 is the only valid value to write.
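The table follows a regular pattern: each MSR offset equals the MMIO offset divided by 10H, and the MSR address is 800H plus that offset. For example, the SVR at MMIO FEE0 00F0H is MSR 080FH. The sketch below encodes that pattern. It assumes the MMIO offset names a register that actually exists in x2APIC mode, because some xAPIC registers, such as the DFR, have no MSR counterpart. The name is hypothetical.

```c
#include <stdint.h>

/* x2APIC registers occupy MSRs 0800H-08FFH. */
#define X2APIC_MSR_BASE 0x800u

/* Map an xAPIC MMIO offset (e.g. 0x0B0 for EOI) to its x2APIC MSR. */
static inline uint32_t x2apic_msr_for_mmio_offset(uint32_t mmio_offset)
{
    return X2APIC_MSR_BASE + (mmio_offset >> 4);
}
```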
Table 10-3. Local APIC Register Address Map Supported by x2APIC (Contd.)

MMIO Offset    MSR Offset
(xAPIC mode)   (x2APIC mode)   Register Name                                   R/W Semantics
01F0H          01FH            TMR bits 224:255                                Read Only
0200H          020H            Interrupt Request Register (IRR); bits 0:31     Read Only
0210H          021H            IRR bits 32:63                                  Read Only
0220H          022H            IRR bits 64:95                                  Read Only
0230H          023H            IRR bits 96:127                                 Read Only
0240H          024H            IRR bits 128:159                                Read Only
Table 10-3. Local APIC Register Address Map Supported by x2APIC (Contd.)

MMIO Offset     MSR Offset
(xAPIC mode)    (x2APIC mode)   Register Name                               R/W Semantics   Comments
03E0H           03EH            Divide Configuration Register (for Timer)   Read/Write
Not supported   03FH            SELF IPI(4)                                 Write only      Only in x2APIC mode
                040H-3FFH                                                                   Reserved

NOTES:
1. Destination format register (DFR) is supported in xAPIC mode at MMIO offset 00E0H.
2.
to enable BIOS and/or platform firmware to re-configure the x2APIC IDs in some clusters to provide for unique and non-overlapping system-wide IDs before configuring the disconnected components into a single system.

10.5.2 x2APIC Register Availability

The local APIC registers can be accessed via the MSR interface only when the local APIC has been switched to the x2APIC mode as described in Section 10.5.1.
field, VM-exit MSR-load address field, and VM-entry MSR-load address field in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B). The x2APIC MSRs cannot be loaded and stored on VMX transitions. A VMX transition fails if the VMM has specified that the transition should access any MSRs in the address range from 0000_0800H to 0000_08FFH (the range used for accessing the x2APIC registers).
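Because a VMX transition fails when its MSR lists touch the x2APIC range, a VMM can screen those lists before using them. The sketch below is a hypothetical VMM-side check on an array of MSR indices. It does not model the real MSR-load/store area format, which pairs each index with data.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if an MSR index lies in the x2APIC range 0800H-08FFH. */
static inline bool is_x2apic_msr(uint32_t msr)
{
    return msr >= 0x800u && msr <= 0x8FFu;
}

/* Reject a list of MSR indices destined for a VM-entry or VM-exit
 * MSR-load/store area if any of them is an x2APIC MSR. */
static bool msr_list_is_vmx_safe(const uint32_t *msrs, size_t count)
{
    for (size_t i = 0; i < count; i++)
        if (is_x2apic_msr(msrs[i]))
            return false;
    return true;
}
```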
Figure 10-9. Spurious Interrupt Vector Register (SVR) of x2APIC (MMIO address FEE0 00F0H; MSR address 080FH)
Bits 31:13  Reserved
Bit 12      EOI Broadcast Disable
Bits 11:9   Reserved
Bit 8       APIC Software Enable/Disable (0: APIC Disabled; 1: APIC Enabled)
Bits 7:0    Spurious Vector

The default value for SVR[bit 12] is clear, indicating that an EOI broadcast will be performed. Support for the Directed EOI capability can be detected by means of bit 24 in the Local APIC Version Register.
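The sketch below composes an SVR value from the layout above. It software-enables the APIC with a chosen spurious vector, and it sets EOI Broadcast Disable only when bit 24 of the Version Register reports Directed EOI support. Macro and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define SVR_APIC_ENABLE        (1u << 8)
#define SVR_EOI_BCAST_DISABLE  (1u << 12)
#define APICVER_DIRECTED_EOI   (1u << 24)  /* support bit, Version Register */

/* SVR value to write (MSR 080FH in x2APIC mode). */
static inline uint32_t make_svr(uint8_t spurious_vector, uint32_t version_reg,
                                bool want_directed_eoi)
{
    uint32_t svr = spurious_vector | SVR_APIC_ENABLE;
    /* Only suppress EOI broadcast if the capability is reported. */
    if (want_directed_eoi && (version_reg & APICVER_DIRECTED_EOI))
        svr |= SVR_EOI_BCAST_DISABLE;
    return svr;
}
```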
• xAPIC mode: IA32_APIC_BASE[EN]=1 and IA32_APIC_BASE[EXTD]=0
• x2APIC mode: IA32_APIC_BASE[EN]=1 and IA32_APIC_BASE[EXTD]=1
• Invalid: IA32_APIC_BASE[EN]=0 and IA32_APIC_BASE[EXTD]=1

The state corresponding to EXTD=1 and EN=0 is not valid, and it is not possible to get into this state. Values written to the IA32_APIC_BASE MSR that attempt a transition from a valid state to this invalid state will cause a GP fault.
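Classifying an IA32_APIC_BASE value by its EN and EXTD bits gives the modes listed above. The sketch adds the disabled state (EN=0, EXTD=0) as the fourth combination, as shown in the x2APIC state diagram. The enum and function names are hypothetical.

```c
#include <stdint.h>

enum apic_mode { APIC_DISABLED, APIC_XAPIC, APIC_X2APIC, APIC_INVALID };

/* Classify by EN (bit 11) and EXTD (bit 10). */
static inline enum apic_mode apic_mode_of(uint64_t apic_base_msr)
{
    int en   = (int)((apic_base_msr >> 11) & 1);
    int extd = (int)((apic_base_msr >> 10) & 1);
    if (en)
        return extd ? APIC_X2APIC : APIC_XAPIC;
    return extd ? APIC_INVALID : APIC_DISABLED;
}
```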
[Figure 10-11: x2APIC state-transition diagram. States: Disabled (EN = 0, Extd = 0), xAPIC Mode (EN = 1, Extd = 0), Extended Mode (EN = 1, Extd = 1), and Invalid State (EN = 0, Extd = 1). Edges are labeled with Reset, Init, and writes to EN and Extd; several are marked Illegal Transition.]
x2APIC Transitions From x2APIC Mode

From the x2APIC mode, the only valid x2APIC transition using IA32_APIC_BASE is to the state where the x2APIC is disabled by setting EN to 0 and EXTD to 0. The x2APIC ID (32 bits) and the legacy local xAPIC ID (8 bits) are preserved across this transition. A transition from the x2APIC mode to xAPIC mode is not valid, and the corresponding WRMSR to the IA32_APIC_BASE MSR will raise a GP fault.
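The following C function checks whether a WRMSR that changes only the EN/EXTD pair would be accepted. It is a sketch with a hypothetical name. It applies two rules stated in this section: no write may enter the invalid EN=0, EXTD=1 state, and x2APIC mode may only go to EN=0, EXTD=0. It also applies a third rule that comes from the manual's companion rule for transitions out of the disabled state, not from the text above. Under that rule, the only valid move from disabled is to xAPIC mode. Leaving x2APIC for xAPIC therefore takes two writes, via the disabled state.

```c
#include <stdbool.h>
#include <stdint.h>

#define APIC_EN   (1ull << 11)
#define APIC_EXTD (1ull << 10)

/* Would changing IA32_APIC_BASE[EN,EXTD] from cur to next be legal? */
static bool apic_base_write_ok(uint64_t cur, uint64_t next)
{
    const uint64_t mask = APIC_EN | APIC_EXTD;
    uint64_t c = cur & mask, n = next & mask;

    if (n == APIC_EXTD)            /* EN=0, EXTD=1: invalid state */
        return false;
    if (c == mask)                 /* from x2APIC mode: only to disabled */
        return n == mask || n == 0;
    if (c == 0)                    /* from disabled: only to xAPIC mode */
        return n == 0 || n == APIC_EN;
    return true;                   /* from xAPIC mode: disable or x2APIC */
}
```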
Support for the x2APIC architecture can be implemented in the local APIC unit. All existing PCI/MSI-capable devices and IOxAPIC units should work with the x2APIC extensions defined in this document. The x2APIC architecture also provides flexibility to cope with the underlying fabrics that connect the PCI devices, IOxAPICs, and local APIC units.
The extended topology enumeration leaf is intended to assist software with enumerating processor topology on systems that require 32-bit x2APIC IDs to address individual logical processors. For example, a system with more than 256 logical processors or more than 64 processor cores will require the OS to use 32-bit x2APIC IDs.
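A common use of the extended topology leaf (CPUID leaf 0BH) is to split a 32-bit x2APIC ID into SMT, core, and package IDs. Each subleaf reports, in EAX[4:0], how far to shift the x2APIC ID right to reach the next level. EDX holds the x2APIC ID. The sketch takes those values as inputs. It assumes subleaf 0 is the SMT level and subleaf 1 is the core level. Robust code should check the level type in ECX[15:8] instead. The struct and function names are hypothetical.

```c
#include <stdint.h>

struct topo_ids { uint32_t smt, core, pkg; };

/* smt_shift: EAX[4:0] of the SMT-level subleaf;
 * core_shift: EAX[4:0] of the core-level subleaf (core_shift < 32). */
static struct topo_ids split_x2apic_id(uint32_t x2apic_id,
                                       unsigned smt_shift, unsigned core_shift)
{
    struct topo_ids t;
    t.smt  = x2apic_id & ((1u << smt_shift) - 1);
    t.core = (x2apic_id >> smt_shift) &
             ((1u << (core_shift - smt_shift)) - 1);
    t.pkg  = x2apic_id >> core_shift;
    return t;
}
```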
10.6 HANDLING LOCAL INTERRUPTS

The following sections describe facilities that are provided in the local APIC for handling local interrupts. These include: the processor’s LINT0 and LINT1 pins, the APIC timer, the performance-monitoring counters, the thermal sensor, and the internal APIC error detector.
[Figure: Local Vector Table (LVT). LVT Timer Register: address FEE0 0320H, value after reset 0001 0000H. Fields: Vector (bits 7:0); Delivery Status (bit 12; 0: Idle, 1: Send Pending); Mask (bit 16; 0: Not Masked, 1: Masked); Timer Mode (bit 17; 0: One-shot, 1: Periodic). The LINT0 and LINT1 entries add Delivery Mode (bits 10:8; 000: Fixed, 010: SMI, 100: NMI, 101: INIT, 111: ExtINT; all other combinations are Reserved), Interrupt Input Pin Polarity (bit 13), Remote IRR (bit 14), and Trigger Mode (bit 15; 0: Edge, 1: Level). The Error and Performance entries are also shown.]
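Building an LVT Timer Register value from the fields in the figure is a matter of OR-ing bits together. The reset value 0001 0000H is simply "masked, one-shot, vector 0" in this encoding. This is a sketch with hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define LVT_MASKED          (1u << 16)
#define LVT_TIMER_PERIODIC  (1u << 17)

/* LVT Timer value: vector in bits 7:0, mask in bit 16,
 * timer mode in bit 17 (0 = one-shot, 1 = periodic). */
static inline uint32_t lvt_timer(uint8_t vector, bool periodic, bool masked)
{
    uint32_t v = vector;
    if (periodic) v |= LVT_TIMER_PERIODIC;
    if (masked)   v |= LVT_MASKED;
    return v;
}
```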
The setup information that can be specified in the registers of the LVT table is as follows:

Vector: Interrupt vector number.

Delivery Mode: Specifies the type of interrupt to be sent to the processor. Some delivery modes will only operate as intended when used in conjunction with a specific trigger mode. The allowable delivery modes are as follows:
  000 (Fixed): Delivers the interrupt specified in the vector field.
Interrupt Input Pin Polarity: Specifies the polarity of the corresponding interrupt pin: (0) active high or (1) active low.

Remote IRR Flag (Read Only): For fixed-mode, level-triggered interrupts, this flag is set when the local APIC accepts the interrupt for servicing and is reset when an EOI command is received from the processor. The meaning of this flag is undefined for edge-triggered interrupts and other delivery modes.
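For the LINT0/LINT1 entries, polarity and trigger mode sit alongside the delivery mode. The sketch composes such an entry. It leaves the read-only Remote IRR and Delivery Status bits clear. The names are hypothetical. The usage values assume the frequently seen (but platform-dependent) arrangement of LINT0 as ExtINT and LINT1 as NMI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Delivery-mode encodings for bits 10:8. */
enum lvt_delivery { LVT_FIXED = 0, LVT_SMI = 2, LVT_NMI = 4,
                    LVT_INIT = 5, LVT_EXTINT = 7 };

/* LINT entry: polarity bit 13 (1 = active low), trigger mode
 * bit 15 (1 = level), mask bit 16. */
static inline uint32_t lvt_lint(enum lvt_delivery mode, uint8_t vector,
                                bool active_low, bool level, bool masked)
{
    uint32_t v = vector | ((uint32_t)mode << 8);
    if (active_low) v |= 1u << 13;
    if (level)      v |= 1u << 15;
    if (masked)     v |= 1u << 16;
    return v;
}
```

With these encodings, LINT0 as unmasked, edge-triggered ExtINT is 0000 0700H, and LINT1 as NMI is 0000 0400H.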
10.6.3 Error Handling

The local APIC provides an error status register (ESR) that it uses to record errors that it detects when handling interrupts (see Figure 10-13). An APIC error interrupt is generated when the local APIC sets one of the error bits in the ESR. The LVT error register allows selection of the interrupt vector to be delivered to the processor core when an APIC error is detected.
Table 10-5. ESR Flags

FLAG                     Function
Send Checksum Error      (P6 family and Pentium processors only) Set when the local APIC detects a checksum error for a message that it sent on the APIC bus.
Receive Checksum Error   (P6 family and Pentium processors only) Set when the local APIC detects a checksum error for a message that it received on the APIC bus.
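An error handler typically decodes the ESR bit by bit. The bit positions below are not in this excerpt. They are taken, as an assumption, from the ESR figure (Figure 10-13). The function name is hypothetical. The manual also directs software to write the ESR before reading it, since the write is what updates the readable contents. The sketch therefore expects an already refreshed value.

```c
#include <stdint.h>
#include <stdio.h>

/* ESR bits 0-7 (positions assumed from the ESR figure). */
static const char *const esr_names[8] = {
    "Send Checksum Error",      /* bit 0, P6/Pentium only */
    "Receive Checksum Error",   /* bit 1, P6/Pentium only */
    "Send Accept Error",        /* bit 2 */
    "Receive Accept Error",     /* bit 3 */
    "Redirectable IPI",         /* bit 4 */
    "Send Illegal Vector",      /* bit 5 */
    "Received Illegal Vector",  /* bit 6 */
    "Illegal Register Address", /* bit 7 */
};

/* Print each flag set in a refreshed ESR value; return the count. */
static int esr_report(uint32_t esr)
{
    int n = 0;
    for (int bit = 0; bit < 8; bit++) {
        if (esr & (1u << bit)) {
            printf("ESR: %s\n", esr_names[bit]);
            n++;
        }
    }
    return n;
}
```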
If the ICR is programmed with the lowest priority delivery mode, then the "Redirectable IPI" error bit will be set in x2APIC mode (the same as legacy xAPIC behavior), and the interrupt will not be processed. A write to the ICR with both the lowest priority delivery mode and an illegal vector will set the "Redirectable IPI" error bit. The interrupt will not be processed, and hence the "Send Illegal Vector" error bit will not be set.
Figure 10-15. Divide Configuration Register (address FEE0 03E0H; value after reset 0H)
Bits 31:4   Reserved
Bit 2       0
Divide Value (bits 0, 1, and 3):
  000: Divide by 2     001: Divide by 4     010: Divide by 8     011: Divide by 16
  100: Divide by 32    101: Divide by 64    110: Divide by 128   111: Divide by 1

Figure 10-16 (Initial Count and Current Count registers): bits 31:0 hold the count. Initial Count is at address FEE0 0380H and Current Count at FEE0 0390H. Value after reset: 0H.
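The 3-bit divide value is not stored contiguously. Its low two bits go in DCR bits 1:0 and its high bit goes in DCR bit 3, skipping bit 2. The following sketch maps a desired divisor to the register value and returns -1 for divisors the hardware cannot express. The function name is hypothetical.

```c
/* Encode a timer divisor into Divide Configuration Register bits 0, 1, 3. */
static int dcr_for_divisor(unsigned divisor)
{
    unsigned code;                      /* 3-bit divide value from the table */
    switch (divisor) {
    case 2:   code = 0; break;
    case 4:   code = 1; break;
    case 8:   code = 2; break;
    case 16:  code = 3; break;
    case 32:  code = 4; break;
    case 64:  code = 5; break;
    case 128: code = 6; break;
    case 1:   code = 7; break;
    default:  return -1;
    }
    /* Low two bits stay in place; bit 2 of the code moves to DCR bit 3. */
    return (int)((code & 3u) | ((code & 4u) << 1));
}
```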
10.6.5 Local Interrupt Acceptance

When a local interrupt is sent to the processor core, it is subject to the acceptance criteria specified in the interrupt acceptance flow chart in Figure 10-25. If the interrupt is accepted, it is logged into the IRR register and handled by the processor according to its priority (see Section 10.9.4, “Interrupt Acceptance for Fixed Interrupts”).
Interrupt Command Register (ICR) (address FEE0 0300H for bits 0-31; FEE0 0310H for bits 32-63)
Bits 63:56  Destination Field
Bits 55:32  Reserved
Bits 31:20  Reserved
Bits 19:18  Destination Shorthand (00: No Shorthand; 01: Self; 10: All Including Self; 11: All Excluding Self)
Bits 17:16  Reserved
Bit 15      Trigger Mode (0: Edge; 1: Level)
Bit 14      Level (0: De-assert; 1: Assert)
Bit 13      Reserved
Bit 12      Delivery Status (0: Idle; 1: Send Pending)
Bit 11      Destination Mode (0: Physical; 1: Logical)
Bits 10:8   Delivery Mode (000: Fixed; 001: Lowest Priority(1); 010: SMI; 011: Reserved; 100: NMI; 101: INIT; 110: Start Up; 111: Reserved)
Bits 7:0    Vector
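The ICR layout above can be turned into a small encoder for the two 32-bit halves written in xAPIC mode. Writing the low doubleword is what triggers the IPI, so software writes the high half (the destination) first. The enum and function names are hypothetical. The usage values show the well-known broadcast INIT encoding.

```c
#include <stdint.h>

enum icr_mode { ICR_FIXED = 0, ICR_LOWEST = 1, ICR_SMI = 2, ICR_NMI = 4,
                ICR_INIT = 5, ICR_STARTUP = 6 };
enum icr_shorthand { ICR_NO_SH = 0, ICR_SELF = 1, ICR_ALL_INCL = 2,
                     ICR_ALL_EXCL = 3 };

#define ICR_LOGICAL       (1u << 11)   /* destination mode */
#define ICR_LEVEL_ASSERT  (1u << 14)   /* level */
#define ICR_TRIG_LEVEL    (1u << 15)   /* trigger mode */

/* Low doubleword (FEE0 0300H); flags is an OR of the macros above. */
static inline uint32_t icr_low(uint8_t vector, enum icr_mode mode,
                               enum icr_shorthand sh, uint32_t flags)
{
    return vector | ((uint32_t)mode << 8) | flags | ((uint32_t)sh << 18);
}

/* High doubleword (FEE0 0310H): 8-bit destination in bits 63:56. */
static inline uint32_t icr_high(uint8_t dest)
{
    return (uint32_t)dest << 24;
}
```

For example, INIT with level assert to all excluding self encodes as 000C 4500H, and a Start Up IPI with vector 08H and no shorthand encodes as 0000 0608H.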
ability for a processor to send a lowest priority IPI is model-specific and should be avoided by BIOS and operating system software.
  010 (SMI): Delivers an SMI interrupt to the target processor or processors. The vector field must be programmed to 00H for future compatibility.
  011 (Reserved)
  100 (NMI): Delivers an NMI interrupt to the target processor or processors. The vector information is ignored.
Destination Mode: Selects either physical (0) or logical (1) destination mode (see Section 10.7.2, “Determining IPI Destination”).

Delivery Status (Read Only): Indicates the IPI delivery status, as follows:
  0 (Idle): There is currently no IPI activity for this local APIC, or the previous IPI sent from this local APIC was delivered and accepted by the target processor or processors.
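In xAPIC mode, software commonly polls Delivery Status (bit 12) until it reads Idle before issuing another IPI. The sketch below polls a caller-supplied pointer to the ICR low doubleword with a bounded spin count. The function name and spin budget are hypothetical. This applies to xAPIC mode only, because the x2APIC ICR has no Delivery Status bit.

```c
#include <stdbool.h>
#include <stdint.h>

#define ICR_DELIVERY_PENDING (1u << 12)

/* Spin until Delivery Status reads 0 (Idle) or max_spins runs out.
 * icr_low would be the mapped FEE0 0300H register on real hardware. */
static bool icr_wait_idle(const volatile uint32_t *icr_low, unsigned max_spins)
{
    for (unsigned i = 0; i < max_spins; i++) {
        if ((*icr_low & ICR_DELIVERY_PENDING) == 0)
            return true;
        /* A PAUSE hint would normally go here. */
    }
    return false;
}
```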
destination field set to FH for Pentium and P6 family processors and to FFH for Pentium 4 and Intel Xeon processors.
  11 (All Excluding Self): The IPI is sent to all processors in a system with the exception of the processor sending the IPI. The APIC broadcasts a message with the physical destination mode and destination field set to FH for Pentium and P6 family processors and to FFH for Pentium 4 and Intel Xeon processors.
Table 10-6 Valid Combinations for the Pentium 4 and Intel Xeon Processors’ Local xAPIC Interrupt Command Register (Contd.)
Table 10-7 Valid Combinations for the P6 Family Processors’ Local APIC Interrupt Command Register (Contd.)

Destination Shorthand   Valid/Invalid   Trigger Mode   Delivery Mode                  Destination Mode
All excluding Self      Valid           Edge           All Modes(1)                   X
All excluding Self      Valid(2)        Level          Fixed, Lowest Priority, NMI    X
All excluding Self      Invalid         Level          SMI, Start-Up                  X
All excluding Self      Valid(3)        Level          INIT                           X
X                       Invalid(5)      Level          SMI, Start-Up                  X

NOTES:
1.
ICR in xAPIC mode, except the Delivery Status bit is removed since it is not needed in x2APIC mode. The destination ID field is expanded to 32 bits in x2APIC mode.
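In x2APIC mode, the whole ICR is one 64-bit value written with a single WRMSR to MSR 830H. The destination occupies bits 63:32, and the low fields keep their xAPIC positions. This sketch covers vector, delivery mode, and shorthand only. The function name is hypothetical.

```c
#include <stdint.h>

/* 64-bit x2APIC ICR: 32-bit destination in bits 63:32; shorthand in
 * bits 19:18; delivery mode in bits 10:8; vector in bits 7:0. */
static inline uint64_t x2apic_icr(uint32_t dest, uint8_t vector,
                                  unsigned delivery_mode, unsigned shorthand)
{
    return ((uint64_t)dest << 32)
         | ((uint64_t)(shorthand & 3u) << 18)
         | ((uint64_t)(delivery_mode & 7u) << 8)
         | vector;
}
```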
10.7.2 Determining IPI Destination

The destination of an IPI can be one, all, or a subset (group) of the processors on the system bus. The sender of the IPI specifies the destination of an IPI with the following APIC registers and fields within the registers:
• ICR Register: The following fields in the ICR register are used to specify the destination of an IPI:
  - Destination Mode: Selects one of two destination modes (physical or logical).
APICs to be addressed on the APIC bus. A broadcast to all local APICs is specified with 0FH.

NOTE
The number of local APICs that can be addressed on the system bus may be restricted by hardware.

10.7.2.2 Logical Destination Mode

In logical destination mode, IPI destination is specified using an 8-bit message destination address (MDA), which is entered in the destination field of the ICR.
Figure 10-20. Destination Format Register (DFR) (address FEE0 00E0H; value after reset FFFF FFFFH)
Bits 31:28  Model (Flat model: 1111B; Cluster model: 0000B)
Bits 27:0   Reserved (All 1s)

The interpretation of MDA for the two models is described in the following paragraphs.

1. Flat Model: This model is selected by programming DFR bits 28 through 31 to 1111.
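In the flat model, each local APIC compares the MDA with its 8-bit logical APIC ID, held in LDR bits 31:24. It accepts the message if the bitwise AND is nonzero. That rule is part of the full flat-model description, and the sketch below encodes it under that assumption. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Flat-model acceptance: (MDA AND logical APIC ID) != 0, where the
 * logical APIC ID is LDR bits 31:24. An MDA of FFH reaches every
 * APIC whose logical ID has at least one bit set. */
static inline bool flat_model_accepts(uint8_t mda, uint32_t ldr)
{
    uint8_t logical_id = (uint8_t)(ldr >> 24);
    return (mda & logical_id) != 0;
}
```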
lowest priority delivery mode is not supported in cluster mode and must not be configured by software. The hierarchical cluster destination model can be used with Pentium 4, Intel Xeon, P6 family, or Pentium processors. With this model, a hierarchical network can be created by connecting different flat clusters via independent system or APIC buses.
mode is not supported in the x2APIC mode. Hence the Destination Format Register (DFR) is eliminated in x2APIC mode.

The 32-bit logical x2APIC ID field of LDR is partitioned into two sub-fields:
• Cluster ID (LDR[31:16]): is the address of the destination cluster.
• Logical ID (LDR[15:0]): defines a logical ID of the individual local x2APIC within the cluster specified by LDR[31:16].
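In x2APIC mode the LDR is derived from the x2APIC ID rather than programmed. The cluster ID comes from x2APIC ID[19:4], and a single bit, selected by x2APIC ID[3:0], is set in the logical ID. That derivation comes from elsewhere in the manual and is included here as a stated assumption. The sketch also shows the matching rule that follows from the two sub-fields: the clusters must be equal and the logical-ID bits must overlap. Broadcast handling is omitted, and the names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* LDR = (x2APIC ID[19:4] << 16) | (1 << x2APIC ID[3:0]). */
static inline uint32_t x2apic_logical_id(uint32_t x2apic_id)
{
    return (((x2apic_id >> 4) & 0xFFFFu) << 16) | (1u << (x2apic_id & 0xFu));
}

/* Logical-mode match: same cluster (bits 31:16) and overlapping
 * logical-ID bits (bits 15:0). */
static inline bool x2apic_logical_match(uint32_t dest, uint32_t ldr)
{
    return (dest >> 16) == (ldr >> 16) && (dest & ldr & 0xFFFFu) != 0;
}
```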
10.7.2.5 Broadcast/Self Delivery Mode

The destination shorthand field of the ICR allows the delivery mode to be bypassed in favor of broadcasting the IPI to all the processors on the system bus and/or back to itself (see Section 10.7.1, “Interrupt Command Register (ICR)”). Three destination shorthands are supported: self, all excluding self, and all including self. The destination mode is ignored when a destination shorthand is used.

10.7.2.
Here, the TPR value is the task priority value in the TPR (see Figure 10-26), the IRRV value is the vector number for the highest priority bit that is set in the IRR (see Figure 10-28) or 00H (if no IRR bit is set), and the ISRV value is the vector number for the highest priority bit that is set in the ISR (see Figure 10-28).
Figure 10-23. SELF IPI register (MSR address 083FH)
Bits 31:8  Reserved
Bits 7:0   Vector

The SELF IPI register is a write-only register. A RDMSR instruction with the address of the SELF IPI register will raise a GP fault. The handling and prioritization of a self-IPI sent via the SELF IPI register is architecturally identical to that for an IPI sent via the ICR from a legacy xAPIC unit.
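To show the equivalence just stated, the sketch computes the value written to the SELF IPI register alongside an ICR value expressing the same request. It assumes the ICR form uses fixed delivery and the Self shorthand. Function names are hypothetical.

```c
#include <stdint.h>

/* WRMSR value for the SELF IPI register (MSR 083FH): the vector only. */
static inline uint32_t self_ipi_value(uint8_t vector)
{
    return vector;
}

/* Same request through the x2APIC ICR (MSR 830H): fixed delivery
 * (000) with the Self destination shorthand (01) in bits 19:18. */
static inline uint64_t self_ipi_via_icr(uint8_t vector)
{
    return (uint64_t)vector | (1ull << 18);
}
```

The dedicated register lets software avoid composing a full ICR value for the common self-IPI case.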
priorities of the local APICs by resetting Arb ID register of each agent to its current APIC ID value. (The Pentium 4 and Intel Xeon processors do not implement the Arb ID register.) Section 10.11, “APIC Bus Message Passing Mechanism and Protocol (P6 Family, Pentium Processors),” describes the APIC bus arbitration protocols and bus message formats, while Section 10.7.1, “Interrupt Command Register (ICR),” describes the INIT level de-assert IPI message.
3. If the local APIC determines that it is the designated destination for the interrupt but the interrupt request is not one of the interrupts given in step 2, the local APIC sets the appropriate bit in the IRR.
4. When interrupts are pending in the IRR and ISR register, the local APIC dispatches them to the processor one at a time, based on their priority and the current task and processor priorities in the TPR and PPR (see Section 10.9.3.
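Step 4 selects the highest-priority pending vector and dispatches it only if its priority class beats the processor priority. The sketch scans a 256-bit IRR image held as eight 32-bit words and applies the class test. It assumes the rule that a vector is dispatched only when its priority class (bits 7:4) exceeds PPR[7:4]. Names are hypothetical.

```c
#include <stdint.h>

/* Highest set vector in a 256-bit IRR/ISR image; word i covers
 * vectors 32*i .. 32*i+31. Returns -1 if no bit is set. */
static int highest_vector(const uint32_t reg[8])
{
    for (int word = 7; word >= 0; word--) {
        uint32_t bits = reg[word];
        if (bits == 0)
            continue;
        int bit = 31;
        while (!(bits & (1u << bit)))
            bit--;
        return word * 32 + bit;
    }
    return -1;
}

/* Dispatch only if the vector's priority class exceeds PPR[7:4]. */
static int dispatchable(int vector, uint8_t ppr)
{
    return vector >= 0 && ((unsigned)vector >> 4) > ((unsigned)ppr >> 4);
}
```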
[Figure: Interrupt acceptance flow chart, including a P6 family processor specific path. Steps shown: wait to receive a bus message; discard the message if this APIC does not belong to the destination; accept NMI, SMI, INIT, and ExtINT messages directly; for fixed and lowest priority delivery, check whether an interrupt slot is available, whether this APIC is the focus or another focus exists, and whether it wins arbitration, then accept the message, set the status to retry, or discard the message.]
interrupt, or one of the MP protocol IPI messages (BIPI, FIPI, and SIPI), the interrupt is sent directly to the processor core for handling.
3. If the local APIC determines that it is the designated destination for the interrupt but the interrupt request is not one of the interrupts given in step 2, the local APIC looks for an open slot in one of its two pending interrupt queues contained in the IRR and ISR registers (see Figure 10-28).
of vectors within a priority group, the vector number is often divided into two parts, with the high 4 bits of the vector indicating its priority and the low 4 bits indicating its ranking within the priority group.

10.9.3.1 Task and Processor Priorities

The local APIC also defines a task priority and a processor priority that it uses in determining the order in which interrupts should be handled.
Figure 10-27. Processor Priority Register (PPR) (address FEE0 00A0H; value after reset 0H)
Bits 31:8  Reserved
Bits 7:4   Processor Priority
Bits 3:0   Processor Priority Sub-Class

The value in the PPR is computed as follows:

    IF TPR[7:4] ≥ ISRV[7:4]
        THEN
            PPR[7:0] ← TPR[7:0]
        ELSE
            PPR[7:4] ← ISRV[7:4]
            PPR[3:0] ← 0

Here, the ISRV value is the vector number of the highest priority ISR bit that is set, or 00H if no ISR bit is set.
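The PPR rule above translates directly into C. In this sketch (hypothetical function name), TPR and ISRV are passed as 8-bit values, and ISRV is 00H when no ISR bit is set.

```c
#include <stdint.h>

/* PPR from TPR and ISRV, following the pseudocode above. */
static inline uint8_t compute_ppr(uint8_t tpr, uint8_t isrv)
{
    if ((tpr >> 4) >= (isrv >> 4))
        return tpr;                 /* PPR[7:0] <- TPR[7:0] */
    return (uint8_t)(isrv & 0xF0);  /* PPR[7:4] <- ISRV[7:4]; PPR[3:0] <- 0 */
}
```

When an interrupt of class 3 is in service and the TPR is 20H, the PPR rises to 30H. A TPR of 45H, by contrast, dominates and is copied through unchanged.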
Figure 10-28. IRR, ISR and TMR Registers (value after reset 0H)
Each register is 256 bits wide. Bits 255:16 correspond to vectors 16 through 255; bits 15:0 are Reserved.
Addresses: IRR FEE0 0200H - FEE0 0270H; ISR FEE0 0100H - FEE0 0170H; TMR FEE0 0180H - FEE0 01F0H.

The IRR contains the active interrupt requests that have been accepted, but not yet dispatched to the processor for servicing.
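Given the address ranges above, each vector's bit can be located directly. Each 32-bit register covers 32 vectors, and consecutive registers are 10H apart. The sketch returns the MMIO offset and bit index for a vector. The base is 200H for the IRR, 100H for the ISR, or 180H for the TMR. Names are hypothetical.

```c
#include <stdint.h>

struct vec_loc { uint32_t mmio_offset; uint32_t bit; };

/* Locate a vector's bit in the IRR, ISR, or TMR. */
static inline struct vec_loc locate_vector(uint32_t base, uint8_t vector)
{
    struct vec_loc loc = { base + 0x10u * (vector / 32u), vector % 32u };
    return loc;
}
```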
bit is cleared for edge-triggered interrupts and set for level-triggered interrupts. If a TMR bit is set when an EOI cycle for its corresponding interrupt vector is generated, an EOI message is sent to all I/O APICs. 10.9.
• Loading the TPR with a value of 8 (01000B) blocks all interrupts with a priority of 8 or less while allowing all interrupts with a priority of 9 or more to be recognized.
• Loading the TPR with zero enables all external interrupts.
• Loading the TPR with 0FH (01111B) disables all external interrupts.

The TPR (shown in Figure 10-26) is cleared to 0 on reset.
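The examples above are priority-class values (0 to 15), as held in CR8. This sketch assumes the CR8/TPR relationship from the 64-bit mode description: CR8[3:0] mirrors APIC TPR[7:4], and a MOV to CR8 clears TPR[3:0]. It also shows the blocking test the bullets describe. Function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* CR8[3:0] <-> TPR[7:4]; writes through CR8 leave TPR[3:0] = 0. */
static inline uint8_t tpr_from_cr8(uint64_t cr8) { return (uint8_t)((cr8 & 0xF) << 4); }
static inline uint64_t cr8_from_tpr(uint8_t tpr) { return tpr >> 4; }

/* Blocked when the vector's priority class (bits 7:4) is less than
 * or equal to the task priority class in CR8[3:0]. */
static inline bool blocked_by_cr8(uint8_t vector, uint64_t cr8)
{
    return (uint64_t)(vector >> 4) <= (cr8 & 0xF);
}
```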
There are no ordering mechanisms between direct updates of the APIC.TPR and CR8. Operating software should implement either direct APIC TPR updates or CR8-style TPR updates, but not mix them. Software can use a serializing instruction (for example, CPUID) to serialize updates between MOV CR8 and stores to the APIC.

10.
Figure 10-31. Spurious-Interrupt Vector Register (SVR) (address FEE0 00F0H; value after reset 0000 00FFH)
Bits 31:10  Reserved
Bit 9       Focus Processor Checking(1) (0: Enabled; 1: Disabled)
Bit 8       APIC Software Enable/Disable (0: APIC Disabled; 1: APIC Enabled)
Bits 7:0    Spurious Vector(2)

NOTES:
1. Not supported in Pentium 4 and Intel Xeon processors.
2. For the P6 family and Pentium processors, bits 0 through 3 of the spurious vector are hardwired to 1.

10.
the bus regardless of its sender’s arbitration priority, unless more than one APIC issues an EOI message simultaneously. In the latter case, the APICs sending the EOI messages arbitrate using their arbitration priorities. If the APICs are set up to use “lowest priority” arbitration (see Section 10.7.2.
10.12.1 Message Address Register Format

The format of the Message Address Register (lower 32 bits) is shown in Figure 10-32.

Figure 10-32. Layout of the MSI Message Address Register
Bits 31:20  0FEEH
Bits 19:12  Destination ID
Bits 11:4   Reserved
Bit 3       RH
Bit 2       DM
Bits 1:0    XX

Fields in the Message Address Register are as follows:
1. Bits 31-20: These bits contain a fixed value for interrupt messages (0FEEH).
destination mode and only the processor in the system that has the matching APIC ID is considered for delivery of that interrupt (this means no re-direction). If RH is 1 and DM is 1, the Destination ID Field is interpreted as in logical destination mode and the redirection is limited to only those processors that are part of the logical group of processors based on the processor’s logical APIC ID and the Destination ID field in the message.
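Following Figure 10-32, the sketch below composes the lower 32 bits of an MSI message address. With RH = 0, the Destination ID names one physical APIC and no redirection occurs. Setting RH and DM selects redirection within a logical group. Reserved bits and bits 1:0 are left 0 in this sketch. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* MSI address: 0FEEH in bits 31:20, Destination ID in bits 19:12,
 * RH in bit 3, DM in bit 2. */
static inline uint32_t msi_address(uint8_t dest_id, bool rh, bool dm)
{
    return 0xFEE00000u | ((uint32_t)dest_id << 12)
         | ((uint32_t)rh << 3) | ((uint32_t)dm << 2);
}
```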
Reserved fields are not assumed to be any value. Software must preserve their contents on writes. Other fields in the Message Data Register are described below.
1. Vector: This 8-bit field contains the interrupt vector associated with the message. Values range from 010H to 0FEH. Software must guarantee that the field is not programmed with vector 00H to 0FH.
2. Delivery Mode: This 3-bit field specifies how the interrupt receipt is handled.
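The sketch below updates an MSI message data value while respecting the two constraints above. First, it rejects vectors outside 010H to 0FEH. Second, it preserves every bit above bit 10 by read-modify-write, so reserved contents survive. It assumes the vector occupies bits 7:0 and the delivery mode bits 10:8, as in the Message Data figure of the full manual. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Replace vector (7:0) and delivery mode (10:8) in old; keep the rest.
 * Returns false, leaving *out untouched, for an out-of-range vector. */
static bool msi_data(uint32_t old, uint8_t vector, unsigned delivery_mode,
                     uint32_t *out)
{
    if (vector < 0x10 || vector > 0xFE)
        return false;
    *out = (old & ~0x7FFu) | vector | ((delivery_mode & 7u) << 8);
    return true;
}
```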
CHAPTER 11
MEMORY CACHE CONTROL

This chapter describes the memory cache and cache control mechanisms, the TLBs, and the store buffer in Intel 64 and IA-32 processors. It also describes the memory type range registers (MTRRs) introduced in the P6 family processors and how they are used to control caching of physical memory locations.

11.
[Figure 11-2. Cache Structure of the Intel Core i7 Processors. Blocks shown: instruction decoder and front end, ITLB, instruction cache, out-of-order engine, data TLB, data cache unit (L1), STLB, L2 cache, L3 cache, IMC, QPI, and chipset.]

Figure 11-2 shows the cache arrangement of the Intel Core i7 processor.

Table 11-1.
Table 11-1. Characteristics of the Caches, TLBs, Store Buffer, and Write Combining Buffer in Intel 64 and IA-32 Processors (Contd.)

L1 Data Cache
• Pentium 4 and Intel Xeon processors (Based on Intel NetBurst microarchitecture): 8-KByte, 4-way set associative, 64-byte cache line size.
• Pentium 4 and Intel Xeon processors (Based on Intel NetBurst microarchitecture): 16-KByte, 8-way set associative, 64-byte cache line size.
Table 11-1. Characteristics of the Caches, TLBs, Store Buffer, and Write Combining Buffer in Intel 64 and IA-32 Processors (Contd.)

Instruction TLB (4-KByte Pages)
• Pentium 4 and Intel Xeon processors (Based on Intel NetBurst microarchitecture): 128 entries, 4-way set associative.
• Intel Atom processors: 32 entries, fully associative.
• Intel Core i7 processor: 64 entries per thread (128 entries per core), 4-way set associative.
Table 11-1. Characteristics of the Caches, TLBs, Store Buffer, and Write Combining Buffer in Intel 64 and IA-32 Processors (Contd.)

Store Buffer
• Intel Core i7 processors: 32 entries.
• Intel Core 2 Duo processors: 20 entries.
• Intel Atom processors: 8 entries, used for both WC and store buffers.
• Pentium 4 and Intel Xeon processors: 24 entries.
• Pentium M processor: 16 entries.
• P6 family processors: 12 entries.
• Pentium 4 and Intel Xeon processors based on Intel NetBurst microarchitecture: The trace cache caches decoded instructions (μops) from the instruction decoder, and the L1 cache contains data. The L2 and L3 caches are unified data and instruction caches located on the processor chip. Dual-core processors have two L2 caches, one in each processor core. Note that the L3 cache is only implemented on some Intel Xeon processors.
Processors based on Intel Core microarchitectures implement one level of instruction TLB and two levels of data TLB. The Intel Core i7 processor provides a second-level unified TLB.

The store buffer is associated with the processor’s instruction execution units. It allows writes to system memory and/or the internal caches to be saved and in some cases combined to optimize the processor’s bus accesses. The store buffer is always enabled in all execution modes.
(depending on the write policy currently in force) can also write it out to memory. If the operand is to be written out to memory, it is written first into the store buffer, and then written from the store buffer to memory when the system bus is available. (Note that for the Pentium processor, write misses do not result in a cache line fill; they always result in a write to memory. For this processor, only read misses result in cache line fills.
registers to access UC memory that may have read or write side effects.

Table 11-2. Memory Types and Their Properties

Memory Type and Mnemonic   Cacheable   Writeback Cacheable   Allows Speculative Reads   Memory Ordering Model
Strong Uncacheable (UC)    No          No                    No                         Strong Ordering
Uncacheable (UC-)          No          No                    No                         Strong Ordering. Can only be selected through the PAT. Can be overridden by WC in MTRRs.
Write Combining (WC)       No          No                    Yes                        Weak Ordering.
possible) and through to system memory. When writing through to memory, invalid cache lines are never filled, and valid cache lines are either filled or invalidated. Write combining is allowed. This type of cache control is appropriate for frame buffers or when there are devices on the system bus that access system memory but do not perform snooping of memory accesses. It enforces coherency between caches in the processors and system memory.
Table 11-3. Methods of Caching Available in Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4, Intel Xeon, P6 Family, and Pentium Processors (Contd.)

Columns: Memory Type; Intel Core 2 Duo, Intel Atom, Intel Core Duo, Pentium M, Pentium 4 and Intel Xeon Processors; P6 Family Processors; Pentium Processor.

NOTE:
* Introduced in the Pentium III processor; not available in the Pentium Pro or Pentium II processors.

11.3.
The WC memory type is weakly ordered by definition. Once the eviction of a WC buffer has started, the data is subject to the weak ordering semantics of its definition. Ordering is not maintained between the successive allocation/deallocation of WC buffers (for example, writes to WC buffer 1 followed by writes to WC buffer 2 may appear as buffer 2 followed by buffer 1 on the system bus).
large data structure should be marked as uncacheable, or reading it will evict cached lines that the processor will be referencing again. A similar example would be a write-only data structure that is written to (to export the data to another agent), but never read by software.
Table 11-4. MESI Cache Line States

Cache Line State                              M (Modified)   E (Exclusive)   S (Shared)   I (Invalid)
This cache line is valid?                     Yes            Yes             Yes          No
The memory copy is…                           Out of date    Valid           Valid        -
Copies exist in caches of other processors?   No             No              Maybe        Maybe
A write to this line…
  M: Does not go to the system bus.
  E: Does not go to the system bus.
  S: Causes the processor to gain exclusive ownership of the line.
  I: Goes directly to the system bus.
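The "A write to this line…" row can be read as a small state function. The sketch below encodes it, with two assumptions stated plainly. First, a line that gains exclusive ownership after a write ends in the Modified state, as in standard MESI. Second, an Invalid line is not allocated, which mirrors the table literally. Processors that fill lines on write misses behave differently. Names are hypothetical.

```c
#include <stdbool.h>

enum mesi { MESI_M, MESI_E, MESI_S, MESI_I };

struct write_effect {
    bool uses_bus;   /* does the write produce a system-bus transaction? */
    enum mesi next;  /* line state after the write */
};

/* Effect of a processor write on a line in state s. */
static struct write_effect mesi_write(enum mesi s)
{
    struct write_effect w;
    switch (s) {
    case MESI_M: w.uses_bus = false; w.next = MESI_M; break;
    case MESI_E: w.uses_bus = false; w.next = MESI_M; break;
    case MESI_S: w.uses_bus = true;  w.next = MESI_M; break; /* gain ownership */
    default:     w.uses_bus = true;  w.next = MESI_I; break; /* straight to bus */
    }
    return w;
}
```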
11.5.1 Cache Control Registers and Bits

Figure 11-3 depicts cache-control mechanisms in IA-32 processors. Except for differences in memory address space, these work the same way in Intel 64 processors. The Intel 64 and IA-32 architectures provide the following cache-control registers and bits for use in enabling or restricting caching to various pages or regions in memory:
• CD flag, bit 30 of control register CR0: Controls caching of system memory locations (see Section 2.
[Figure 11-3: Cache-control registers and bits. CR4: the PGE flag enables global pages designated with the G flag. CR3: the PCD and PWT flags control caching of the page directory. CR0: the CD and NW flags control overall caching of system memory. Page-directory or page-table entry: the PCD and PWT flags control page-level caching, the PAT flag together with the PAT controls caching of virtual memory pages, and the G flag controls page-level flushing of TLBs. MTRRs control caching of selected regions of physical memory (up to FFFFFFFFH). The store buffer is also shown.]
Table 11-5. Cache Operating Modes

CD = 0, NW = 0 (L1: Yes; L2/L3(1): Yes)
• Read hits access the cache; read misses may cause replacement.
• Write hits update the cache.
• Only writes to shared lines and write misses update system memory.
• Write misses cause cache line fills.
• Write hits can change shared lines to modified under control of the MTRRs and with associated read invalidation cycle.
• (Pentium processor only.) Write misses do not cause cache line fills.
Table 11-5. Cache Operating Modes (Contd.)
CD = 1, NW = 1 (L1: Yes; L2/L3(1): Yes):
• (P6 family and Pentium processors.) State of the processor after a power up or reset.
• Read hits access the cache; read misses do not cause replacement.
• Write hits update the cache and change exclusive lines to modified.
• Shared lines remain shared after write hit.
• Write misses access memory.
• Invalidation is inhibited when snooping, but is allowed with the INVD and WBINVD instructions.
corrupt addresses.
• PCD flag in the page-directory and page-table entries — Controls caching for individual page tables and pages, respectively (see Section 4.9, “Paging and Memory Typing”). This flag only has effect when paging is enabled and the CD flag in control register CR0 is clear. The PCD flag enables caching of the page table or page when clear and prevents caching when set.
page-table entries) permit caching in an external L2 cache to be controlled on a page-by-page basis, consistent with the control exercised on the L1 cache of these processors. The P6 and more recent processor families do not provide these pins because the L2 cache is internal to the chip package.
11.5.2 Precedence of Cache Controls
The cache control flags and MTRRs operate hierarchically for restricting caching.
Table 11-6. Effective Page-Level Memory Type for Pentium Pro and Pentium II Processors
MTRR Memory Type(1) | PCD Value | PWT Value | Effective Memory Type
UC | X | X | UC
WC | 0 | 0 | WC
WC | 0 | 1 | WC
WC | 1 | 0 | WC
WC | 1 | 1 | UC
WT | 0 | X | WT
WT | 1 | X | UC
WP | 0 | 0 | WP
WP | 0 | 1 | WP
WP | 1 | 0 | WC
WP | 1 | 1 | UC
WB | 0 | 0 | WB
WB | 0 | 1 | WT
WB | 1 | X | UC
NOTE:
1.
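Table 11-6 is effectively a lookup function from (MTRR type, PCD, PWT) to an effective type. A minimal C sketch that transcribes the table row by row; the enum and function names are hypothetical and the enum values are not the MTRR encodings.

```c
/* Memory types used in Table 11-6 (names only; not MTRR encodings). */
typedef enum { MT_UC, MT_WC, MT_WT, MT_WP, MT_WB } memtype_t;

/* Effective page-level memory type for Pentium Pro and Pentium II
 * processors, transcribed from Table 11-6. pcd and pwt are 0 or 1. */
static memtype_t p6_effective_type(memtype_t mtrr, int pcd, int pwt)
{
    switch (mtrr) {
    case MT_UC:
        return MT_UC;                      /* PCD, PWT: don't care */
    case MT_WC:
        return (pcd && pwt) ? MT_UC : MT_WC;
    case MT_WT:
        return pcd ? MT_UC : MT_WT;        /* PWT: don't care */
    case MT_WP:
        if (!pcd)
            return MT_WP;                  /* PWT 0 or 1 */
        return pwt ? MT_UC : MT_WC;
    case MT_WB:
        if (pcd)
            return MT_UC;                  /* PWT: don't care */
        return pwt ? MT_WT : MT_WB;
    }
    return MT_UC;                          /* not reached */
}
```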
11.5.2.2 Selecting Memory Types for Pentium III and More Recent Processor Families
The Intel Core 2 Duo, Intel Atom, Intel Core Duo, Intel Core Solo, Pentium M, Pentium 4, Intel Xeon, and Pentium III processors use the PAT to select effective page-level memory types. Here, a memory type for a page is selected by the MTRRs and the value in a PAT entry that is selected with the PAT, PCD and PWT bits in a page-table or page-directory entry (see Section 11.12.
Table 11-7. Effective Page-Level Memory Types for Pentium III and More Recent Processor Families (Contd.)
MTRR Memory Type | PAT Entry Value | Effective Memory Type
WB | UC | UC(2)
WB | UC- | UC(2)
WB | WC | WC
WB | WT | WT
WB | WB | WB
WB | WP | WP
WP | UC | UC(2)
WP | UC- | WC(3)
WP | WC | WC
WP | WT | WT(3)
WP | WB | WP
WP | WP | WP
NOTES:
1. The UC attribute comes from the MTRRs and the processors are not required to snoop their caches since the data could never have been cached. This attribute is preferred for performance reasons.
2.
11.5.3 Preventing Caching
To disable the L1, L2, and L3 caches after they have been enabled and have received cache fills, perform the following steps:
1. Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1 and the NW flag to 0.)
2. Flush all caches using the WBINVD instruction.
3.
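Step 1 is a read-modify-write of CR0. The sketch below only computes the new CR0 value; it assumes NW is bit 29 (adjacent to CD at bit 30). The MOV to CR0 and the WBINVD of step 2 are privileged (CPL 0), so they appear only as comments. The function name is hypothetical.

```c
#include <stdint.h>

#define CR0_NW (UINT64_C(1) << 29)  /* Not Write-through (assumed bit 29) */
#define CR0_CD (UINT64_C(1) << 30)  /* Cache Disable, bit 30 */

/* Step 1: compute the CR0 value for no-fill cache mode (CD = 1, NW = 0).
 * Kernel code would then do: MOV CR0, value; WBINVD (step 2). */
static uint64_t cr0_enter_no_fill(uint64_t cr0)
{
    cr0 |= CR0_CD;
    cr0 &= ~CR0_NW;
    return cr0;
}
```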
11.5.4 Disabling and Enabling the L3 Cache
On processors based on Intel NetBurst microarchitecture, the third-level cache disable flag (bit 6 of the IA32_MISC_ENABLE MSR) allows the L3 cache to be disabled and enabled independently of the L1 and L2 caches.
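Toggling the flag is a read-modify-write of IA32_MISC_ENABLE that touches only bit 6. A minimal sketch, assuming the MSR address 1A0H and that a set bit means "L3 disabled" (as the flag's name suggests); the helper is hypothetical and the RDMSR/WRMSR themselves must run at CPL 0.

```c
#include <stdint.h>

#define IA32_MISC_ENABLE        0x1A0u              /* assumed MSR address */
#define MISC_ENABLE_L3_DISABLE  (UINT64_C(1) << 6)  /* third-level cache disable */

/* Return the value to WRMSR back, with only bit 6 changed. */
static uint64_t misc_enable_set_l3(uint64_t misc_enable, int disable_l3)
{
    return disable_l3 ? (misc_enable | MISC_ENABLE_L3_DISABLE)
                      : (misc_enable & ~MISC_ENABLE_L3_DISABLE);
}
```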
The CLFLUSH instruction allows selected cache lines to be flushed from the caches (and written back to memory if modified). This instruction gives a program the ability to explicitly free up cache space when it is known that a cached section of system memory will not be accessed in the near future. The non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD) allow data to be moved from the processor’s registers directly into system memory without also being written into the L1, L2, and/or L3 caches.
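Both instruction classes are reachable from C through SSE2 compiler intrinsics (`_mm_clflush` for CLFLUSH, `_mm_stream_si32` for MOVNTI). A minimal sketch for an x86-64 compiler, assuming a 64-byte cache line; the real line size is reported by CPUID, and the helper names are hypothetical.

```c
#include <emmintrin.h>  /* _mm_clflush, _mm_stream_si32, _mm_mfence; pulls in _mm_sfence */
#include <stddef.h>

/* Write a buffer with non-temporal stores (MOVNTI), then SFENCE so the
 * weakly ordered stores are globally visible before returning. */
static void fill_nontemporal(int *dst, size_t n, int value)
{
    for (size_t i = 0; i < n; i++)
        _mm_stream_si32(&dst[i], value);
    _mm_sfence();
}

/* Flush every cache line overlapping [p, p + len) with CLFLUSH,
 * assuming 64-byte lines. */
static void flush_range(const void *p, size_t len)
{
    const char *c = (const char *)p;
    if (len == 0)
        return;
    for (size_t off = 0; off < len; off += 64)
        _mm_clflush(c + off);
    _mm_clflush(c + len - 1);  /* covers a last line the stride may skip */
    _mm_mfence();              /* order the flushes with later accesses */
}
```

Flushing does not change the data as seen by software; it only evicts the lines, so a later read simply misses in the cache.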
on the Intel NetBurst microarchitecture that support Intel Hyper-Threading Technology.
11.6 SELF-MODIFYING CODE
A write to a memory location in a code segment that is currently cached in the processor causes the associated cache line (or lines) to be invalidated. This check is based on the physical address of the instruction. In addition, the P6 family and Pentium processors check whether a write to a code segment may modify an instruction that has been prefetched for execution.
To avoid problems related to implicit caching, the operating system must explicitly invalidate the cache when changes are made to cacheable data that the cache coherency mechanism does not automatically handle. This includes writes to dual-ported or physically aliased memory boards that are not detected by the snooping mechanisms of the processor, and changes to page-table entries in memory. The code in Example 11-1 shows the effect of implicit caching on page-table entries.
11.9 INVALIDATING THE TRANSLATION LOOKASIDE BUFFERS (TLBS)
The processor updates its address translation caches (TLBs) transparently to software. Several mechanisms are available, however, that allow software and hardware to invalidate the TLBs either explicitly or as a side effect of another operation. Most details are given in Section 4.10.3, “Invalidation of TLBs and Paging-Structure Caches.”
The discussion of write ordering in Section 8.2, “Memory Ordering,” gives a detailed description of the operation of the store buffer.
11.11 MEMORY TYPE RANGE REGISTERS (MTRRS)
The following section pertains only to the P6 and more recent processor families. The memory type range registers (MTRRs) provide a mechanism for associating the memory types (see Section 11.3, “Methods of Caching Available”) with physical-address ranges in system memory.
Table 11-8. Memory Types That Can Be Encoded in MTRRs (Contd.)
Memory Type | Encoding in MTRR
Reserved* | 03H
Write-through (WT) | 04H
Write-protected (WP) | 05H
Writeback (WB) | 06H
Reserved* | 7H through FFH
NOTE:
* Use of these encodings results in a general-protection exception (#GP).
11.11.1 MTRR Feature Identification
The availability of the MTRR feature is model-specific. Software can determine if MTRRs are supported on a processor by executing the CPUID instruction and reading the state of the MTRR flag (bit 12) in the feature information register (EDX). If the MTRR flag is set (indicating that the processor implements MTRRs), additional information about MTRRs can be obtained from the 64-bit IA32_MTRRCAP MSR (named MTRRcap MSR for the P6 family processors).
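The CPUID check above can be written in C with the `__get_cpuid` helper from GCC/Clang's `<cpuid.h>`. A minimal sketch, assuming an x86 target; the function name is hypothetical.

```c
#include <cpuid.h>  /* GCC/Clang: __get_cpuid */

/* Returns 1 if CPUID.1:EDX.MTRR[bit 12] is set, 0 otherwise. */
static int cpu_has_mtrr(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                 /* leaf 1 not available */
    return (edx >> 12) & 1;
}
```

Only when this returns 1 should software go on to read IA32_MTRRCAP with RDMSR.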
11.11.2 Setting Memory Ranges with MTRRs
The memory ranges and the types of memory specified in each range are set by three groups of registers: the IA32_MTRR_DEF_TYPE MSR, the fixed-range MTRRs, and the variable range MTRRs. These registers can be read and written to using the RDMSR and WRMSR instructions, respectively. The IA32_MTRRCAP MSR indicates the availability of these registers on the processor (see Section 11.11.1, “MTRR Feature Identification”).
11.11.2.
memory. When this flag is set, the FE flag can disable the fixed-range MTRRs; when the flag is clear, the FE flag has no effect. When the E flag is set, the type specified in the default memory type field is used for areas of memory not already mapped by either a fixed or variable MTRR. Bits 8 and 9, and bits 12 through 63, in the IA32_MTRR_DEF_TYPE MSR are reserved; the processor generates a general-protection exception (#GP) if software attempts to write nonzero values to them.
11.
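A decoder for IA32_MTRR_DEF_TYPE makes the E/FE interaction and the reserved-bit rule concrete. This sketch assumes the layout used elsewhere in this chapter: default type in bits 7:0, FE in bit 10, E in bit 11. The struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    unsigned type;  /* bits 7:0, default memory type */
    int fe;         /* bit 10, fixed-range MTRR enable */
    int e;          /* bit 11, MTRR enable */
} mtrr_def_type_t;

/* Bits 0-7, 10 and 11 are defined; bits 8-9 and 12-63 are reserved. */
#define DEF_TYPE_RESERVED (~(uint64_t)0xCFF)

static mtrr_def_type_t decode_def_type(uint64_t v)
{
    mtrr_def_type_t d = { (unsigned)(v & 0xFF),
                          (int)((v >> 10) & 1),
                          (int)((v >> 11) & 1) };
    return d;
}

/* WRMSR of a value with any reserved bit set would raise #GP. */
static int def_type_write_ok(uint64_t v) { return (v & DEF_TYPE_RESERVED) == 0; }

/* FE only matters while E is set. */
static int fixed_ranges_active(mtrr_def_type_t d) { return d.e && d.fe; }
```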
Table 11-9.
— The width of the PhysMask field depends on the maximum physical address size supported by the processor. CPUID.80000008H reports the maximum physical address size supported by the processor. If CPUID.80000008H is not available, software may assume that the processor supports a 36-bit physical address size (then PhysMask is 24 bits wide and the upper 28 bits of IA32_MTRR_PHYSMASKn are reserved). See the Note below.
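The arithmetic above (36 − 12 = 24 PhysMask bits, 64 − 36 = 28 reserved bits) generalizes to any maximum physical address size (MAXPHYADDR). A sketch, assuming MAXPHYADDR is reported in CPUID.80000008H:EAX[7:0]; it falls back to 36 bits as the text allows. Function names are hypothetical.

```c
#include <cpuid.h>
#include <stdint.h>

static unsigned max_phys_addr_bits(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))  /* 0 if leaf absent */
        return eax & 0xFF;
    return 36;  /* fallback permitted by the text */
}

/* PhysMask occupies bits maxphyaddr-1 : 12 of IA32_MTRR_PHYSMASKn. */
static unsigned physmask_width(unsigned maxphyaddr) { return maxphyaddr - 12; }

static uint64_t physmask_field_mask(unsigned maxphyaddr)
{
    return ((UINT64_C(1) << maxphyaddr) - 1) & ~UINT64_C(0xFFF);
}

static unsigned physmask_reserved_high_bits(unsigned maxphyaddr) { return 64 - maxphyaddr; }
```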
NOTE
It is possible for software to parse the memory descriptions that BIOS provides by using the ACPI/INT15 e820 interface mechanism. This information then can be used to determine how MTRRs are initialized (for example: allowing the BIOS to define valid memory ranges and the maximum memory range supported by the platform, including the processor). See Section 11.11.4.1, “MTRR Precedences,” for information on overlapping variable MTRR ranges.
11.11.2.
Before attempting to access these SMRR registers, software must test bit 11 in the IA32_MTRRCAP register. If SMRR is not supported, reads from or writes to these registers cause general-protection exceptions. When the valid flag in the IA32_SMRR_PHYSMASK MSR is 1, accesses to the specified address range are treated as follows:
• If the logical processor is in SMM, accesses use the memory type in the IA32_SMRR_PHYSBASE MSR.
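The SMRR test is one field of IA32_MTRRCAP. The decoder below also extracts VCNT and FIX (named in the pseudocode of this chapter) and the WC flag. It assumes VCNT in bits 7:0, FIX in bit 8, WC in bit 10 and SMRR in bit 11; only the SMRR position is stated in the paragraph above. Names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    unsigned vcnt;  /* bits 7:0, number of variable-range MTRR pairs */
    int fix;        /* bit 8, fixed-range MTRRs supported */
    int wc;         /* bit 10, WC memory type supported */
    int smrr;       /* bit 11, SMRR registers supported */
} mtrrcap_t;

static mtrrcap_t decode_mtrrcap(uint64_t v)
{
    mtrrcap_t c;
    c.vcnt = (unsigned)(v & 0xFF);
    c.fix  = (int)((v >> 8) & 1);
    c.wc   = (int)((v >> 10) & 1);
    c.smrr = (int)((v >> 11) & 1);
    return c;
}
```

Software would read IA32_SMRR_PHYSBASE/IA32_SMRR_PHYSMASK only if `smrr` is 1.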
3FFFFFH (2 MBytes to 4 MBytes), a mask value of FFFE00000H is required. Again, the 12 least-significant bits of this mask value are truncated, so that the value entered in the PhysMask field of IA32_MTRR_PHYSMASK3 is FFFE00H. This mask is chosen so that when any address in the 200000H to 3FFFFFH range is AND’d with the mask value, it will return the same value as when the base address is AND’d with the mask value (which is 200000H).
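The worked example follows from one rule: for a power-of-2 range size, the mask is every address bit above the size, limited to the physical address width. A sketch that reproduces FFFE00000H and FFFE00H for a 2-MByte range on a 36-bit system; the function names are hypothetical.

```c
#include <stdint.h>

/* Full mask for a power-of-2 range size on a maxphyaddr-bit system. */
static uint64_t range_mask(uint64_t size, unsigned maxphyaddr)
{
    uint64_t addr_bits = (UINT64_C(1) << maxphyaddr) - 1;
    return addr_bits & ~(size - 1);
}

/* Value placed in the PhysMask field: the mask with its 12 low bits dropped. */
static uint64_t physmask_field(uint64_t size, unsigned maxphyaddr)
{
    return range_mask(size, maxphyaddr) >> 12;
}

/* The membership test described in the text. */
static int in_range(uint64_t addr, uint64_t base, uint64_t mask)
{
    return (addr & mask) == (base & mask);
}
```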
IA32_MTRR_PHYSBASE5 = 0000 0000 A000 0001H
IA32_MTRR_PHYSMASK5 = 0000 000F FF80 0800H
Caches A0000000-A07FFFFF as WC type.
This MTRR setup uses the ability to overlap any two memory ranges (as long as the ranges are mapped to WB and UC memory types) to minimize the number of MTRR registers that are required to configure the memory environment. This setup also fulfills the requirement that two register pairs are left for operating system usage.
11.11.3.
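A register pair like the one above can be decoded back into base, size and type. This sketch assumes the type field in bits 7:0 of IA32_MTRR_PHYSBASEn, the valid flag in bit 11 of IA32_MTRR_PHYSMASKn, and a contiguous power-of-2 mask; applied to pair 5 it gives an 8-MByte WC range at A0000000H. Names are hypothetical.

```c
#include <stdint.h>

typedef struct { uint64_t base, size; unsigned type; int valid; } mtrr_range_t;

static mtrr_range_t decode_var_mtrr(uint64_t physbase, uint64_t physmask,
                                    unsigned maxphyaddr)
{
    uint64_t addr_bits = (UINT64_C(1) << maxphyaddr) - 1;
    uint64_t mask = physmask & addr_bits & ~UINT64_C(0xFFF);
    mtrr_range_t r;
    r.type  = (unsigned)(physbase & 0xFF);   /* memory type encoding */
    r.valid = (int)((physmask >> 11) & 1);   /* V flag */
    r.base  = physbase & mask;
    r.size  = (~mask & addr_bits) + 1;       /* range covers [base, base + size) */
    return r;
}
```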
11.11.4 Range Size and Alignment Requirement
A range that is to be mapped to a variable-range MTRR must meet the following “power of 2” size and alignment rules:
1. The minimum range size is 4 KBytes and the base address of the range must be on at least a 4-KByte boundary.
2. For ranges greater than 4 KBytes, each range must be of length 2^n and its base address must be aligned on a 2^n boundary, where n is a value equal to or greater than 12.
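The two rules reduce to three bit tests. A minimal sketch; the function name is hypothetical.

```c
#include <stdint.h>

/* 1 if [base, base + size) can be mapped by one variable-range MTRR. */
static int var_mtrr_range_ok(uint64_t base, uint64_t size)
{
    if (size < 0x1000)               /* rule 1: at least 4 KBytes */
        return 0;
    if (size & (size - 1))           /* rule 2: size must be 2^n (n >= 12) */
        return 0;
    return (base & (size - 1)) == 0; /* base aligned on a 2^n boundary */
}
```

A range that fails (for example 3 MBytes) has to be split into several power-of-2 ranges, each needing its own register pair.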
the MTRRs according to known types of memory, including memory on devices that it auto-configures. Initialization is expected to occur prior to booting the operating system. See Section 11.11.8, “MTRR Considerations in MP Systems,” for information on initializing MTRRs in MP (multiple-processor) systems.
11.11.
automatically aligns the base address and size to 4-KByte boundaries. Pseudocode for the MemTypeGet() function is given in Example 11-4.
Example 11-4. MemTypeGet() Pseudocode
#define MIXED_TYPES -1 /* MIXED_TYPES < 0 || MIXED_TYPES > 255: not a valid memory type encoding */
IF CPU_FEATURES.MTRR /* processor supports MTRRs */
THEN
    Align BASE and SIZE to 4-KByte boundary;
    IF (BASE + SIZE) wrap 4-GByte address space
        THEN return INVALID;
    FI;
    IF MTRRdefType.
Example 11-5. Get4KMemType() Pseudocode
IF IA32_MTRRCAP.FIX AND MTRRdefType.FE /* fixed registers enabled */
THEN
    IF PHY_ADDRESS is within a fixed range
        return IA32_MTRR_FIX.Type;
    FI;
FI;
FOR each variable-range MTRR in IA32_MTRRCAP.VCNT
    IF IA32_MTRR_PHYSMASK.V = 0
        THEN continue;
    FI;
    IF (PHY_ADDRESS AND IA32_MTRR_PHYSMASK.Mask) = (IA32_MTRR_PHYSBASE.Base AND IA32_MTRR_PHYSMASK.Mask)
        THEN return IA32_MTRR_PHYSBASE.Type;
    FI;
ROF;
return MTRRdefType.Type;
11.11.7.
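Get4KMemType() returns the first matching variable range, which is only correct when ranges do not overlap. The C sketch below covers the variable-range part and the default type, and resolves overlaps using the rules summarized in Section 11.11.4.1: UC wins, WT and WB combine to WT, and other mixes are undefined. It also assumes that with the E flag clear all memory is UC. Fixed-range MTRRs are not modeled, and all names are hypothetical.

```c
#include <stdint.h>

enum { UC = 0, WC = 1, WT = 4, WP = 5, WB = 6, UNDEFINED = -1 };

typedef struct { uint64_t physbase, physmask; } var_mtrr_t;

static int get_4k_mem_type_var(uint64_t addr, const var_mtrr_t *m,
                               unsigned vcnt, int mtrr_enabled, int default_type)
{
    int found = 0, type = UNDEFINED;
    if (!mtrr_enabled)
        return UC;                                /* E flag clear */
    for (unsigned i = 0; i < vcnt; i++) {
        if (!((m[i].physmask >> 11) & 1))         /* V flag clear: skip pair */
            continue;
        uint64_t mask = m[i].physmask & ~UINT64_C(0xFFF);
        if ((addr & mask) != (m[i].physbase & mask))
            continue;
        int t = (int)(m[i].physbase & 0xFF);
        if (!found) {
            type = t;
            found = 1;
        } else if (t == UC || type == UC) {
            type = UC;                            /* UC takes precedence */
        } else if ((t == WT && type == WB) || (t == WB && type == WT)) {
            type = WT;                            /* WT + WB -> WT */
        } else if (t != type) {
            type = UNDEFINED;                     /* other overlaps undefined */
        }
    }
    return found ? type : default_type;
}
```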
    THEN
        pre_mtrr_change();
        update affected MTRR;
        post_mtrr_change();
    FI;
ELSE (* try to map using a variable MTRR pair *)
    IF IA32_MTRRCAP.
END
The physical address to variable range mapping algorithm in the MemTypeSet function detects conflicts with current variable range registers by cycling through them and determining whether the physical address in question matches any of the current ranges. During this scan, the algorithm can detect whether any current variable ranges overlap and can be concatenated into a single range.
4. Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1 and the NW flag to 0.)
5. Flush all caches using the WBINVD instruction. Note that on a processor that supports self-snooping (CPUID.1:EDX.SS[bit 27] = 1), this step is unnecessary.
6. If the PGE flag is set in control register CR4, flush all TLBs by clearing that flag.
7.
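Whether step 5 can be skipped depends on one CPUID bit. A minimal sketch using GCC/Clang `<cpuid.h>`, assuming the self-snoop flag is CPUID.1:EDX bit 27; the function names are hypothetical.

```c
#include <cpuid.h>

/* CPUID.1:EDX.SS[bit 27]: processor supports self-snooping. */
static int cpu_has_self_snoop(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx >> 27) & 1;
}

/* Step 5 (WBINVD) is needed unless the processor self-snoops. */
static int need_wbinvd_for_mtrr_change(void) { return !cpu_has_self_snoop(); }
```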
The requirement that all 4-KByte ranges in a large page are of the same memory type implies that large pages with different memory types may suffer a performance penalty, since they must be marked with the lowest common denominator memory type. The Pentium 4, Intel Xeon, and P6 family processors provide special support for the physical memory range from 0 to 4 MBytes, which is potentially mapped by both the fixed and variable MTRRs.
11.12.2 IA32_PAT MSR
The IA32_PAT MSR is located at MSR address 277H (see Appendix B, “Model-Specific Registers (MSRs)”), and this address will remain the same on future IA-32 processors that support the PAT feature. Figure 11-9 shows the format of the 64-bit IA32_PAT MSR. The IA32_PAT MSR contains eight page attribute fields: PA0 through PA7. The three low-order bits of each field are used to specify a memory type.
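Each field PAi is one byte of the 64-bit value, with the type in its low three bits. The sketch below extracts and replaces fields. In the test, it uses 0007040600070406H, which to our understanding is the power-up default (PA0 = WB, PA1 = WT, PA2 = UC-, PA3 = UC, repeated). Function names are hypothetical.

```c
#include <stdint.h>

/* Memory type field of entry PAi (0 <= i <= 7). */
static unsigned pat_entry(uint64_t pat, unsigned i)
{
    return (unsigned)((pat >> (8 * i)) & 0x7);
}

/* Replace entry PAi with a new 3-bit type, leaving other entries alone. */
static uint64_t pat_set_entry(uint64_t pat, unsigned i, unsigned type)
{
    unsigned shift = 8 * i;
    pat &= ~(UINT64_C(0xFF) << shift);
    return pat | ((uint64_t)(type & 0x7) << shift);
}
```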
11.12.3 Selecting a Memory Type from the PAT
To select a memory type for a page from the PAT, a 3-bit index made up of the PAT, PCD, and PWT bits must be encoded in the page-table or page-directory entry for the page. Table 11-11 shows the possible encodings of the PAT, PCD, and PWT bits and the PAT entry selected with each encoding. The PAT bit is bit 7 in page-table entries that point to 4-KByte pages and bit 12 in paging-structure entries that point to larger pages.
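The 3-bit index is PAT (most significant), then PCD, then PWT. This sketch assumes PWT is bit 3 and PCD is bit 4 of the entry, with the PAT bit at 7 or 12 as stated above. The function name is hypothetical.

```c
#include <stdint.h>

/* Index (0-7) of the PAT entry selected by a paging-structure entry. */
static unsigned pat_index(uint64_t entry, int maps_large_page)
{
    unsigned pwt = (unsigned)((entry >> 3) & 1);
    unsigned pcd = (unsigned)((entry >> 4) & 1);
    unsigned pat = (unsigned)((entry >> (maps_large_page ? 12 : 7)) & 1);
    return (pat << 2) | (pcd << 1) | pwt;
}
```

For a large-page entry, bit 7 is the page-size bit, which is why the PAT bit moves to bit 12 there.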
The values in all the entries of the PAT can be changed by writing to the IA32_PAT MSR using the WRMSR instruction. The IA32_PAT MSR can be read and written (with the RDMSR and WRMSR instructions, respectively) by software operating at a CPL of 0. Table 11-10 shows the allowable encoding of the entries in the PAT. Attempting to write an undefined memory type encoding into the PAT causes a general-protection exception (#GP).
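Because an undefined encoding faults, system software may want to validate a PAT value before the WRMSR. This sketch assumes the defined encodings are UC 00H, WC 01H, WT 04H, WP 05H, WB 06H and UC- 07H, so 02H, 03H and 08H-FFH are rejected. The function name is hypothetical.

```c
#include <stdint.h>

/* 1 if every byte of the proposed IA32_PAT value is a defined encoding. */
static int pat_value_valid(uint64_t pat)
{
    for (unsigned i = 0; i < 8; i++) {
        unsigned e = (unsigned)((pat >> (8 * i)) & 0xFF);
        if (e == 2 || e == 3 || e > 7)
            return 0;
    }
    return 1;
}
```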
11.12.5 PAT Compatibility with Earlier IA-32 Processors
For IA-32 processors that support the PAT, the IA32_PAT MSR is always active. That is, the PCD and PWT bits in page-table entries and in page-directory entries (that point to pages) always select a memory type for a page indirectly by selecting an entry in the PAT. They never select the memory type for a page directly as they do in earlier IA-32 processors that do not implement the PAT (see Table 11-6).
CHAPTER 12 INTEL® MMX™ TECHNOLOGY SYSTEM PROGRAMMING
This chapter describes those features of the Intel® MMX™ technology that must be considered when designing or enhancing an operating system to support MMX technology. It covers MMX instruction set emulation, the MMX state, aliasing of MMX registers, saving MMX state, task and context switching considerations, exception handling, and debugging.
12.
result, the MMX register mapping is fixed and is not affected by the value in the Top Of Stack (TOS) field in the floating-point status word (bits 11 through 13).
Figure 12-1 (figure not reproduced). Recoverable content: MMX registers MM0 through MM7 occupy bits 63:0 of the 80-bit x87 FPU floating-point registers R0 through R7; the x87 FPU tag register shows a tag of 00B for each of R0 through R7; the TOS field (bits 13:11 of the x87 FPU status register) is 000B, that is, TOS = 0.
• When the EMMS instruction is executed, each tag field in the x87 FPU tag word is set to 11B (empty).
• Each time an MMX instruction is executed, the TOS value is set to 000B.
Table 12-3. Effect of the MMX, x87 FPU, and FXSAVE/FXRSTOR Instructions on the x87 FPU Tag Word
Instruction Type | Instruction | x87 FPU Tag Word | Image of x87 FPU Tag Word Stored in Memory
MMX | All (except EMMS) | All tags are set to 00B (valid). | Not affected.
MMX | EMMS | All tags are set to 11B (empty). | Not affected.
x87 FPU | All (except FSAVE, FSTENV, FRSTOR, FLDENV) | Tag for modified floating-point register is set to 00B or 11B. | Not affected.
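The MMX rows of the table, plus the TOS rule stated just before it, can be modeled with a few lines of C. This is a simulation, not processor code. It assumes the 16-bit tag word holds two bits per physical register Ri at bits 2i+1:2i, and it does not model how EMMS affects TOS. All names are hypothetical.

```c
#include <stdint.h>

#define TAG_ALL_VALID 0x0000u  /* every tag = 00B */
#define TAG_ALL_EMPTY 0xFFFFu  /* every tag = 11B */

typedef struct { uint16_t tag_word; unsigned tos; } x87_state_t;

/* Any MMX instruction other than EMMS: all tags valid, TOS = 000B. */
static void exec_mmx(x87_state_t *s)  { s->tag_word = TAG_ALL_VALID; s->tos = 0; }

/* EMMS: all tags empty (TOS handling not modeled here). */
static void exec_emms(x87_state_t *s) { s->tag_word = TAG_ALL_EMPTY; }

static unsigned tag_of(const x87_state_t *s, unsigned reg)
{
    return (s->tag_word >> (2 * reg)) & 3u;
}
```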
• Execute eight MOVQ instructions to save the contents of the MM0 through MM7 registers to memory. An EMMS instruction may then (optionally) be executed to clear the MMX state in the x87 FPU.
• Execute eight MOVQ instructions to read the saved contents of the MMX registers from memory into the MM0 through MM7 registers.
NOTE
The IA-32 architecture does not support scanning the x87 FPU tag word and then only saving valid entries.
12.
• System exceptions:
— Invalid Opcode (#UD), if the EM flag in control register CR0 is set when an MMX instruction is executed (see Section 12.1, “Emulation of the MMX Instruction Set”).
— Device not available (#NM), if an MMX instruction is executed when the TS flag in control register CR0 is set. (See Section 13.5.1, “Using the TS Flag to Control the Saving of the x87 FPU, MMX, SSE, SSE2, SSE3 SSSE3 and SSE4 State.”)
• Floating-point error (#MF).
When the TOS equals 2 (case B in Figure 12-2), ST0 points to the physical location R2. MM0 maps to ST6, MM1 maps to ST7, MM2 maps to ST0, and so on.
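The mapping follows from two facts: MMi always aliases physical register Ri, and STj names physical register R((TOS + j) mod 8). Solving for j gives the stack register that currently aliases MMi. A sketch; the function name is hypothetical.

```c
/* Stack register index j such that STj aliases MMi for a given TOS:
 * j = (i - TOS) mod 8. */
static unsigned mm_to_st(unsigned mm, unsigned tos)
{
    return (mm + 8u - (tos & 7u)) & 7u;
}
```

With TOS = 2 this reproduces the example: MM0 is ST6, MM1 is ST7, MM2 is ST0.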
CHAPTER 13 SYSTEM PROGRAMMING FOR INSTRUCTION SET EXTENSIONS AND PROCESSOR EXTENDED STATES This chapter describes system programming features for instruction set extensions operating on the processor state extension known as the SSE state (XMM registers, MXCSR) and for processor extended states. Instruction set extensions operating on the SSE state include the streaming SIMD extensions (SSE), streaming SIMD extensions 2 (SSE2), streaming SIMD extensions 3 (SSE3), Supplemental SSE3 (SSSE3), and SSE4.
guidelines for this support. Because the SSE/SSE2/SSE3/SSSE3/SSE4 extensions share the same state and exhibit the same non-numerical and numerical exception behavior, the guidelines that apply to SSE also apply to other sets of SIMD extensions that operate on the same processor state and are subject to the same non-numerical and numerical exception behavior.
To use the POPCNT instruction, software must check that CPUID.1:ECX.POPCNT[bit 23] = 1.
13.1.3 Checking for Support for the FXSAVE and FXRSTOR Instructions
A separate check must be made to ensure that the processor supports FXSAVE and FXRSTOR. Make sure:
• CPUID.1:EDX.FXSR[bit 24] = 1
13.1.
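Both checks read CPUID leaf 1. A minimal sketch using GCC/Clang `<cpuid.h>`; the function name is hypothetical.

```c
#include <cpuid.h>

/* Reports CPUID.1:ECX.POPCNT[bit 23] and CPUID.1:EDX.FXSR[bit 24]. */
static void check_popcnt_fxsr(int *popcnt, int *fxsr)
{
    unsigned int eax, ebx, ecx, edx;
    *popcnt = *fxsr = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return;
    *popcnt = (ecx >> 23) & 1;
    *fxsr   = (edx >> 24) & 1;
}
```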
• OSFXSR and OSXMMEXCPT flags in control register CR4
• SSE/SSE2/SSE3/SSSE3/SSE4 feature flags returned by CPUID
• EM, MP, and TS flags in control register CR0
Table 13-1. Action Taken for Combinations of OSFXSR, OSXMMEXCPT, SSE, SSE2, SSE3, EM, MP, and TS(1)
CR4 OSFXSR | CR4 OSXMMEXCPT | CPUID SSE, SSE2, SSE3(2), SSE4_1(3) | CR0 EM | CR0 MP(4) | CR0 TS | Action
0 | X(5) | X | X | 1 | X | #UD exception.
1 | X | 0 | X | 1 | X | #UD exception.
1 | X | 1 | 1 | 1 | X | #UD exception.
Table 13-2. Action Taken for Combinations of OSFXSR, SSSE3, SSE4, EM, and TS
CR4 OSFXSR | CPUID SSSE3, SSE4_1*, SSE4_2** | CR0 EM | CR0 TS | Action
0 | X*** | X | X | #UD exception.
1 | 0 | X | X | #UD exception.
1 | 1 | 1 | X | #UD exception.
1 | 1 | 0 | 1 | #NM exception.
NOTES:
* Applies to SSE4_1 instructions except DPPS, DPPD, ROUNDPS, ROUNDPD, ROUNDSS, ROUNDSD.
** Applies to SSE4_2 instructions except CRC32 and POPCNT.
*** X — Don’t care.
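The table rows are checked in order, so they translate directly into a chain of tests. A sketch covering only the four rows shown. When none applies, it reports that these flags cause no fault, without claiming anything about other exception sources. Names are hypothetical.

```c
typedef enum { ACT_UD, ACT_NM, ACT_NO_FAULT_FROM_FLAGS } sse_action_t;

/* feature: the applicable CPUID flag (SSSE3, SSE4_1 or SSE4_2). */
static sse_action_t ssse3_sse4_action(int osfxsr, int feature, int em, int ts)
{
    if (!osfxsr)  return ACT_UD;        /* row 1 */
    if (!feature) return ACT_UD;        /* row 2 */
    if (em)       return ACT_UD;        /* row 3 */
    if (ts)       return ACT_NM;        /* row 4 */
    return ACT_NO_FAULT_FROM_FLAGS;     /* none of the rows shown applies */
}
```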
to a 16-byte boundary will also generate a general-protection exception, instead of a stack-segment fault exception (#SS).
— Page fault (#PF).
— Alignment check (#AC). When enabled, this type of alignment check operates on operands that are less than 128 bits in size: 16-bit, 32-bit, and 64-bit.
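Software avoids the misalignment #GP by giving 128-bit operands 16-byte-aligned storage. A minimal C11 sketch with `alignas` and an alignment test; the names are hypothetical.

```c
#include <stdalign.h>
#include <stdint.h>

/* 1 if p is suitable for a 128-bit operand that requires 16-byte alignment. */
static int is_16_byte_aligned(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Storage for four floats that is always 16-byte aligned. */
struct vec4 { alignas(16) float v[4]; };
```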
— Device not available (#NM). This exception is generated by executing an SSE/SSE2/SSE3/SSSE3/SSE4 instruction when the TS flag (bit 3) of CR0 is set to 1.
Other exceptions can occur indirectly as a result of faulty execution of the handlers for the above exceptions.
13.1.6 Providing a Handler for the SIMD Floating-Point Exception (#XM)
SSE/SSE2/SSE3/SSSE3/SSE4 instructions do not generate numeric exceptions on packed integer operations.
13.1.6.1 Numeric Error flag and IGNNE#
SSE/SSE2/SSE3/SSE4 extensions ignore the NE flag in control register CR0 (that is, they treat it as if it were always set) and the IGNNE# pin. When an unmasked SIMD floating-point exception is detected, it is always reported by generating a SIMD floating-point exception (#XM).
13.
• Execute an LDMXCSR instruction to restore the state of the MXCSR register from memory.
13.4 SAVING THE SSE/SSE2/SSE3/SSSE3/SSE4 STATE ON TASK OR CONTEXT SWITCHES
When switching from one task or context to another, it is often necessary to save the SSE/SSE2/SSE3/SSSE3/SSE4 state. The FXSAVE and FXRSTOR instructions provide a simple method for saving and restoring this state. See Section 13.3, “Saving and Restoring the SSE/SSE2/SSE3/SSSE3/SSE4 State.”
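STMXCSR and LDMXCSR are exposed in C as the `_mm_getcsr` and `_mm_setcsr` intrinsics. This sketch saves MXCSR, changes the rounding-control field (assumed to be bits 14:13, with 11B = round toward zero), and restores the saved value, as a context switch would. The function name is hypothetical.

```c
#include <xmmintrin.h>  /* _mm_getcsr (STMXCSR), _mm_setcsr (LDMXCSR) */

/* Save MXCSR, run with RC = 11B, then restore; returns the saved value
 * and stores the value observed while the change was in effect. */
static unsigned int mxcsr_roundtrip(unsigned int *during)
{
    unsigned int saved = _mm_getcsr();   /* STMXCSR */
    _mm_setcsr(saved | 0x6000u);         /* RC = 11B: round toward zero */
    *during = _mm_getcsr();
    _mm_setcsr(saved);                   /* LDMXCSR */
    return saved;
}
```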
when a suspended task is resumed (using an FXRSTOR instruction). Here, the x87 FPU/MMX/SSE/SSE2/SSE3/SSE4 state must be saved as part of the task state. This approach is appropriate for preemptive multitasking operating systems, where the application cannot know when it is going to be preempted and cannot prepare in advance for task switching.
The TS flag can be set either explicitly (by executing a MOV instruction to control register CR0) or implicitly (using the IA-32 architecture’s native task switching mechanism). When the native task switching mechanism is used, the processor automatically sets the TS flag on a task switch. After the device-not-available handler has saved the x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3/SSE4 state, it should execute the CLTS instruction to clear the TS flag.
If a new task attempts to access an x87 FPU, MMX, XMM, or MXCSR register while the TS flag is set to 1, a device-not-available exception (#NM) is generated. The device-not-available exception handler executes the following pseudo-code.
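The bookkeeping behind lazy state switching can be simulated in ordinary C. This is a user-space model, not the handler itself: CLTS, FXSAVE and FXRSTOR are stood in for by an assignment and `memcpy` on a 512-byte area (the FXSAVE image size). The owner-tracking variables and all names are hypothetical.

```c
#include <string.h>

#define STATE_BYTES 512  /* size of an FXSAVE area */

typedef struct { unsigned char fxarea[STATE_BYTES]; } task_t;

static unsigned char live_state[STATE_BYTES];  /* simulated registers */
static int cr0_ts = 1;                         /* simulated CR0.TS */
static task_t *fpu_owner = NULL;               /* task whose state is live */

/* Model of the #NM handler: clear TS, then swap state only if the
 * registers belong to another task. */
static void nm_handler(task_t *current)
{
    cr0_ts = 0;                                               /* CLTS */
    if (fpu_owner == current)
        return;
    if (fpu_owner != NULL)
        memcpy(fpu_owner->fxarea, live_state, STATE_BYTES);  /* FXSAVE */
    memcpy(live_state, current->fxarea, STATE_BYTES);        /* FXRSTOR */
    fpu_owner = current;
}
```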
— CPUID leaf function 0DH enumerates the list of processor states (including legacy x87 FPU, SSE states and processor extended states), and the offset and size of the individual save area for each processor extended state.
• Control register enhancement and dedicated register for enabling each processor extended state: CR4.OSXSAVE[bit 18] and the XFEATURE_ENABLED_MASK register (XCR0) are described in Chapter 2, “System Architecture Overview”.
Figure 13-2 (figure not reproduced). Recoverable content: the save area begins with the FXSAVE/FXRSTOR-compatible region holding x87 FPU state and SSE state, followed by a header containing XSTATE_BV, followed by the extended save areas Ext_SaveArea2, Ext_SaveArea3, Ext_SaveArea4, and so on. In the example shown, XSTATE_BV bits 4 through 0 are 1, 0, 1, 1, 1, so Ext_SaveArea2 and Ext_SaveArea4 are updated and Ext_SaveArea3 is not updated.
enabled), a value of "1" in the corresponding bit of HEADER.XSTATE_BV causes the processor state to be updated with the contents of the save area read from the memory image. A value of "0" in HEADER.XSTATE_BV causes the processor state to be initialized with hardware-supplied values instead of from memory (see the operation detail of XRSTOR in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B).
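The per-component rule can be written as a small decision function. This model assumes `rfbm` (a hypothetical name) is the set of components that are both requested by the instruction mask and enabled in XCR0, and that components outside it are left unchanged. The test replays the XSTATE_BV pattern of Figure 13-2 (bits 4:0 = 10111B).

```c
#include <stdint.h>

typedef enum { KEEP, LOAD_FROM_MEMORY, INIT_BY_HW } xrstor_action_t;

/* Action XRSTOR takes for state component i. */
static xrstor_action_t xrstor_component(unsigned i, uint64_t rfbm, uint64_t xstate_bv)
{
    if (!((rfbm >> i) & 1))
        return KEEP;                           /* not requested/enabled */
    return ((xstate_bv >> i) & 1) ? LOAD_FROM_MEMORY : INIT_BY_HW;
}
```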
Table 13-4.
13.8 DETECTION, ENUMERATION, ENABLING PROCESSOR EXTENDED STATE SUPPORT
An OS can determine whether the XSAVE/XRSTOR/XGETBV/XSETBV instructions and the XFEATURE_ENABLED_MASK register (XCR0) are available in the processor by checking that CPUID.1:ECX.XSAVE[bit 26] = 1. The OS must set CR4.OSXSAVE to 1 to enable the new instructions.
instructions, and provides a more constrained list of features than using all 1's in the save mask. The advantage of using a mask value of all-bits-set-to-1 for XSAVE/XRSTOR is that it can simplify system software’s support for processor extended state management when multiple generations of hardware may support a different number of processor extended states, as reported by CPUID.
Figure 13-4 (figure not reproduced). The detection flow shown is: (1) check the feature flag CPUID.1H:ECX.OSXSAVE = 1; if set, the OS provides processor extended state management, which implies hardware support for XSAVE, XRSTOR, XGETBV, and XCR0; (2) check the enabled state in XCR0 using XGETBV; (3) if the state is enabled, check the feature flag for the instruction set; (4) if that flag is set, the instructions are OK to use.
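Steps 1 and 2 of the Figure 13-4 flow can be sketched in C for GCC or Clang on x86-64. The sketch uses inline assembly for XGETBV with ECX = 0 (read XCR0) and assumes OSXSAVE is CPUID.1:ECX bit 27. XGETBV is reached only after that bit is confirmed, because it faults otherwise. Step 3, the per-instruction-set feature flag, is left to the caller. Names are hypothetical.

```c
#include <cpuid.h>
#include <stdint.h>

/* XGETBV with ECX = 0 reads XCR0. */
static uint64_t read_xcr0(void)
{
    uint32_t lo, hi;
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0));
    return ((uint64_t)hi << 32) | lo;
}

/* 1 if the OS has enabled every state component in state_mask
 * (e.g. bit 1, SSE state) for XSAVE-based management. */
static int os_enabled_state(uint64_t state_mask)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    if (!((ecx >> 27) & 1))   /* OSXSAVE clear: OS does not manage state */
        return 0;
    return (read_xcr0() & state_mask) == state_mask;
}
```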
CHAPTER 14 POWER AND THERMAL MANAGEMENT This chapter describes facilities of Intel 64 and IA-32 architecture used for power management and thermal monitoring. 14.1 ENHANCED INTEL SPEEDSTEP® TECHNOLOGY Enhanced Intel SpeedStep® Technology was introduced in the Pentium M processor; it is available in Pentium 4, Intel Xeon, Intel® Core™ Solo, Intel® Core™ Duo, Intel® Atom™ and Intel® Core™2 Duo processors. The technology manages processor power consumption using performance state transitions.
tools can access model-specific events and report the occurrences of state transitions.
14.2 P-STATE HARDWARE COORDINATION
The Advanced Configuration and Power Interface (ACPI) defines performance states (P-states) that are used to facilitate system software’s ability to manage processor power consumption. Different P-states correspond to different performance levels that are applied while the processor is actively executing instructions.
• IA32_APERF MSR (0xE8) increments in proportion to actual performance, while accounting for hardware coordination of P-state and TM1/TM2, or software-initiated throttling.
• The MSRs are per logical processor; they measure performance only when the targeted processor is in the C0 state.
• Only the IA32_APERF/IA32_MPERF ratio is architecturally defined; software should not attach meaning to the content of the individual IA32_APERF or IA32_MPERF MSRs.
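Because only the ratio is defined, software works with deltas of both counters over the same interval. This sketch computes a percentage that can play the role of `PercentPerformance` in the example that follows. Taking deltas of two readings is an assumption (the example instead zeroes both MSRs after each sample), and the counter reads themselves are not shown. The function name is hypothetical.

```c
#include <stdint.h>

/* 100 * delta(APERF) / delta(MPERF) over one sampling interval. */
static double percent_performance(uint64_t aperf0, uint64_t aperf1,
                                  uint64_t mperf0, uint64_t mperf1)
{
    uint64_t da = aperf1 - aperf0;   /* unsigned subtraction tolerates wrap */
    uint64_t dm = mperf1 - mperf0;
    if (dm == 0)
        return 0.0;                  /* no C0 time measured in the interval */
    return 100.0 * (double)da / (double)dm;
}
```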
// This example does not cover the additional logic or algorithms
// necessary to coordinate multiple logical processors to a target P-state.
TargetPstate = FindPstate(PercentPerformance);
if (TargetPstate != currentPstate) {
    SetPState(TargetPstate);
}
// The WRMSRs that reset IA32_MPERF and IA32_APERF should be performed
// without delay. Software needs to exercise care to avoid delays between
// the two WRMSRs (for example, interrupts).
WRMSR(IA32_MPERF, 0);
WRMSR(IA32_APERF, 0);
14.
corresponding enable mechanism is activated, the headroom is available and certain criteria are met.
• The opportunistic processor performance operation is generally transparent to most application software.
• System software (BIOS and operating system) must be aware of hardware support for opportunistic processor performance operation and may need to temporarily disengage opportunistic processor performance operation when it requires more predictable processor operation.
to the OS, it may be undesirable to allow the possibility of the processor delivering increased performance that cannot be sustained after the calibration phase. System software can temporarily disengage opportunistic processor performance operation by setting bit 32 of the IA32_PERF_CTL MSR (0199H), using a read-modify-write sequence on the MSR.
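The read-modify-write changes bit 32 and preserves every other bit of IA32_PERF_CTL, including the current P-state request. A sketch of the modify step. The RDMSR/WRMSR of MSR 0199H happen at CPL 0 and are not shown, and the macro and function names are hypothetical.

```c
#include <stdint.h>

#define IA32_PERF_CTL          0x199u               /* MSR address */
#define PERF_CTL_OPP_DISENGAGE (UINT64_C(1) << 32)  /* bit 32 */

/* Given the value read with RDMSR, return the value to write back. */
static uint64_t perf_ctl_disengage(uint64_t perf_ctl, int disengage)
{
    return disengage ? (perf_ctl | PERF_CTL_OPP_DISENGAGE)
                     : (perf_ctl & ~PERF_CTL_OPP_DISENGAGE);
}
```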
14.3.2.4 Application Awareness of Opportunistic Processor Operation (Optional)
There may be situations in which an end user or application software wishes to be aware of turbo mode activity. It is possible for an application-level utility to periodically check the occurrences of opportunistic processor operation. The basic elements of an algorithm are described below, using the characteristics of Intel Turbo Boost Technology as an example.
• When the OS timer service transfers control, the application can use RDPMC (with ECX = 4000_0001H) to read IA32_PERF_FIXED_CTR1 (MSR address 30AH) to record the unhalted core clocktick (UCC) value; followed by RDPMC (ECX=4000_0002H) to read IA32_PERF_FIXED_CTR2 (MSR address 30BH) to record the unhalted reference clocktick (URC) value. This pair of values is needed for each logical processor for each sampling period.
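Given two consecutive (UCC, URC) samples for one logical processor, the ratio of their deltas compares delivered clocks to reference clocks. A sketch of that arithmetic only; the RDPMC reads are not shown. Reading a ratio above 1.0 as a sign of opportunistic operation is our interpretation, not text from this section. Names are hypothetical.

```c
#include <stdint.h>

/* delta(UCC) / delta(URC) for one sampling period on one logical processor. */
static double ucc_urc_ratio(uint64_t ucc0, uint64_t ucc1,
                            uint64_t urc0, uint64_t urc1)
{
    uint64_t ducc = ucc1 - ucc0, durc = urc1 - urc0;
    return durc ? (double)ducc / (double)durc : 0.0;
}

/* Core clock ran faster than the reference clock during the period. */
static int turbo_observed(double ratio) { return ratio > 1.0; }
```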
Software can program the lowest four bits of the IA32_ENERGY_PERF_BIAS MSR with a value from 0 to 15. The values represent a sliding scale, where a value of 0 (the default reset value) corresponds to a hint preference for highest performance and a value of 15 corresponds to the maximum energy savings. A value of 7 roughly translates into a hint to balance performance with energy consumption.
Figure 14-4 (figure not reproduced). Recoverable content: bits 3:0 hold the Energy Policy Preference Hint; bits 63:4 are reserved.
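Setting the hint touches only bits 3:0. A sketch of the modify step of a read-modify-write. Preserving bits 63:4 from the value read back is a conventional choice assumed here, and the function name is hypothetical.

```c
#include <stdint.h>

/* Place a 0-15 hint in bits 3:0, keeping bits 63:4 as read. */
static uint64_t epb_set_hint(uint64_t msr, unsigned hint)
{
    return (msr & ~UINT64_C(0xF)) | (hint & 0xFu);
}
```

For example, `epb_set_hint(old, 7)` requests the balanced setting described above.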
Reference, A-M,” of Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A). If CPUID.05H.ECX[Bit 1] = 1, the target processor supports using interrupts as break-events for MWAIT, even when interrupts are disabled. Use this feature to measure C-state residency as follows:
• Software can write to bit 0 in the MWAIT Extensions register (ECX) when issuing an MWAIT to enter into a processor-specific C-state or sub C-state.
consumption; this is in addition to the reduction offered by automatic thermal monitoring mechanisms.
4. On-die digital thermal sensor and interrupt mechanisms permit the OS to manage thermal conditions natively without relying on BIOS or other system board components.
The first mechanism is not visible to software. The other three mechanisms are visible to software using processor feature information returned by executing CPUID with EAX = 1.
14.5.1 Catastrophic Shutdown Detector
P6 family processors introduced a thermal sensor that acts as a catastrophic shutdown detector. This catastrophic shutdown detector was also implemented in Pentium 4, Intel Xeon and Pentium M processors. It is always enabled. When processor core temperature reaches a factory preset level, the sensor trips and processor execution is halted until after the next reset cycle.
14.5.
Support for TM2 is indicated by CPUID.1:ECX.TM2[bit 8] = 1.
14.5.2.3 Two Methods for Enabling TM2
On processors with CPUID family/model/stepping signature encoded as 0x69n or 0x6Dn (early Pentium M processors), TM2 is enabled if the TM_SELECT flag (bit 16) of the MSR_THERM2_CTL register is set to 1 (Figure 14-6) and bit 3 of the IA32_MISC_ENABLE register is set to 1. Following a power-up or reset, the TM_SELECT flag may be cleared.
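Both tests work on CPUID leaf 1 outputs. This sketch assumes the usual EAX signature layout (stepping 3:0, model 7:4, family 11:8, extended model 19:16, extended family 27:20). It masks off the stepping and processor-type bits so that only 0x69n and 0x6Dn match, not signatures with a nonzero extended model. Names are hypothetical.

```c
#include <stdint.h>

/* CPUID.1:ECX.TM2[bit 8]. */
static int has_tm2(uint32_t cpuid1_ecx) { return (cpuid1_ecx >> 8) & 1; }

/* 1 for the early Pentium M signatures 0x69n and 0x6Dn, which use the
 * MSR_THERM2_CTL.TM_SELECT method; cpuid1_eax is EAX from CPUID leaf 1. */
static int uses_tm_select(uint32_t cpuid1_eax)
{
    uint32_t sig = cpuid1_eax & 0x0FFF0FF0u;  /* drop stepping and type bits */
    return sig == 0x690u || sig == 0x6D0u;
}
```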
Figure 14-7. MSR_THERM2_CTL Register for Supporting TM2 (figure not reproduced). Recoverable content: bits 15:0 hold the TM2 Transition Target; bits 63:16 are reserved.
14.5.2.4 Performance State Transitions and Thermal Monitoring
If the thermal control circuitry (TCC) for thermal monitor (TM1/TM2) is active, writes to the IA32_PERF_CTL will effect a new target operating point as follows:
• If TM1 is enabled and the TCC is engaged, the performance state transition can commence before the TCC is disengaged.
Figure 14-8. IA32_THERM_STATUS MSR (figure not reproduced). Recoverable content: bit 0 is Thermal Status, bit 1 is Thermal Status Log, and bits 63:2 are reserved.
After the second temperature sensor has been tripped, the thermal monitor (TM1/TM2) will remain engaged for a minimum time period (on the order of 1 ms). The thermal monitor will remain engaged until the processor core temperature drops below the preset trip temperature of the temperature sensor, taking hysteresis into account.
interrupt enable flags in the IA32_THERM_INTERRUPT MSR are cleared (interrupts are disabled) and the thermal LVT entry is set to mask interrupts. This interrupt should be handled either by the operating system or by system management mode (SMM) code. Note that the operation of the thermal monitoring mechanism has no effect upon the clock rate of the processor's internal high-resolution timer (time-stamp counter).
Figure 14-10. IA32_CLOCK_MODULATION MSR (bit 0: Reserved; bits 3:1: On-Demand Clock Modulation Duty Cycle; bit 4: On-Demand Clock Modulation Enable; bits 63:5: Reserved)
The IA32_CLOCK_MODULATION MSR contains the following flag and field used to enable software-controlled clock modulation and to select the clock modulation duty cycle:
• On-Demand Clock Modulation Enable, bit 4 — Enables on-demand software-controlled clock modulation when set; disables software-controlled clock modulation when clear.
clock modulation at the duty cycle specified by TM1 takes precedence, regardless of the setting of the on-demand clock modulation duty cycle. For processors with Hyper-Threading Technology enabled, the IA32_CLOCK_MODULATION register is duplicated for each logical processor. For the on-demand clock modulation feature to work properly, the feature must be enabled on all the logical processors within a physical processor.
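The field layout in Figure 14-10 can be expressed as a pair of helpers that compute new IA32_CLOCK_MODULATION values. This is a sketch that only manipulates bits: how the value reaches the MSR on each logical processor is left to the caller, the helper names are hypothetical, and the mapping from 3-bit duty-cycle encodings to percentages is not covered here.

```c
#include <stdint.h>

#define CLKMOD_ENABLE      (1ull << 4)                   /* bit 4 */
#define CLKMOD_DUTY_SHIFT  1                             /* bits 3:1 */
#define CLKMOD_DUTY_MASK   (0x7ull << CLKMOD_DUTY_SHIFT)

/* Returns `current` with on-demand modulation enabled at the given 3-bit
 * duty-cycle encoding; all other bits are preserved as read. */
uint64_t clkmod_enable(uint64_t current, unsigned duty_encoding) {
    uint64_t v = current & ~(CLKMOD_ENABLE | CLKMOD_DUTY_MASK);
    v |= (uint64_t)(duty_encoding & 0x7u) << CLKMOD_DUTY_SHIFT;
    return v | CLKMOD_ENABLE;
}

/* Returns `current` with software-controlled clock modulation disabled. */
uint64_t clkmod_disable(uint64_t current) {
    return current & ~CLKMOD_ENABLE;
}
```

On a processor with Hyper-Threading Technology enabled, the same value would have to be written to the IA32_CLOCK_MODULATION instance of every logical processor in the physical package, as required above.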
14.5.5.2 Reading the Digital Sensor
Unlike traditional analog thermal devices, the output of the digital thermal sensor is a temperature relative to the maximum supported operating temperature of the processor. Temperature measurements returned by digital thermal sensors are always at or below TCC activation temperature. Critical temperature conditions are detected using the “Critical Temperature Status” bit.
• PROCHOT# or FORCEPR# Log (bit 3, R/WC0) — Sticky bit that indicates whether PROCHOT# or FORCEPR# has been asserted by another agent on the platform since the last clearing of this bit or a reset. If bit 3 = 1, PROCHOT# or FORCEPR# has been externally asserted. Software may clear this bit by writing a zero. External PROCHOT# assertions are only acknowledged if the Bidirectional Prochot feature is enabled.
• Reading Valid (bit 31, RO) — Indicates if the digital readout in bits 22:16 is valid. The readout is valid if bit 31 = 1.
Changes to temperature can be detected using two thresholds (see Figure 14-12); one is set above and the other below the current temperature. These thresholds can generate interrupts through the core's local APIC, which software must then service.
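A sketch of decoding an IA32_THERM_STATUS value in C, using only the fields named in this section (Figure 14-8 and the bit descriptions above). It assumes the raw MSR value is already available and that the Digital Readout counts whole degrees Celsius below the TCC activation temperature; the TCC activation temperature is a hypothetical caller-supplied parameter, since this section does not say where to obtain it.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool thermal_status;      /* bit 0 */
    bool thermal_status_log;  /* bit 1, sticky */
    bool prochot_log;         /* bit 3, PROCHOT# or FORCEPR# log */
    unsigned digital_readout; /* bits 22:16, relative to TCC activation */
    bool reading_valid;       /* bit 31 */
} therm_status;

therm_status decode_therm_status(uint64_t v) {
    therm_status s;
    s.thermal_status     = (v >> 0) & 1;
    s.thermal_status_log = (v >> 1) & 1;
    s.prochot_log        = (v >> 3) & 1;
    s.digital_readout    = (unsigned)((v >> 16) & 0x7F);
    s.reading_valid      = (v >> 31) & 1;
    return s;
}

/* Converts the relative readout to an absolute temperature.
 * Returns false (leaving *out untouched) if the reading is not valid. */
bool therm_absolute_temp(therm_status s, int tcc_activation_c, int *out) {
    if (!s.reading_valid)
        return false;
    *out = tcc_activation_c - (int)s.digital_readout;
    return true;
}
```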
• Critical Temperature Interrupt Enable (bit 4, R/W) — Enables the generation of an interrupt when the Critical Temperature Detector has detected a critical thermal condition. The recommended response to this condition is a system shutdown. Bit 4 = 0 disables the interrupt; bit 4 = 1 enables the interrupt.
• Threshold #1 Value (bits 14:8, R/W) — A temperature threshold, encoded relative to the TCC Activation temperature (using the same format as the Digital Readout).
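Programming Threshold #1 and the critical-temperature interrupt can be sketched as a read-modify-write of IA32_THERM_INTERRUPT. The sketch touches only bit 4 and bits 14:8 described above and assumes the threshold is supplied in the same relative encoding as the Digital Readout; other enable bits in the register are not modeled, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define THERM_INT_CRIT_EN     (1ull << 4)                      /* bit 4 */
#define THERM_INT_THR1_SHIFT  8                                /* bits 14:8 */
#define THERM_INT_THR1_MASK   (0x7Full << THERM_INT_THR1_SHIFT)

/* Returns `current` with Threshold #1 set (7-bit relative value) and the
 * critical-temperature interrupt enabled or disabled. */
uint64_t therm_int_configure(uint64_t current, unsigned thr1, bool crit_enable) {
    uint64_t v = current & ~(THERM_INT_THR1_MASK | THERM_INT_CRIT_EN);
    v |= (uint64_t)(thr1 & 0x7Fu) << THERM_INT_THR1_SHIFT;
    if (crit_enable)
        v |= THERM_INT_CRIT_EN;
    return v;
}

/* Extracts the Threshold #1 value from an IA32_THERM_INTERRUPT image. */
unsigned therm_int_threshold1(uint64_t v) {
    return (unsigned)((v & THERM_INT_THR1_MASK) >> THERM_INT_THR1_SHIFT);
}
```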
CHAPTER 15 MACHINE-CHECK ARCHITECTURE
This chapter describes the machine-check architecture and machine-check exception mechanism found in the Pentium 4, Intel Xeon, and P6 family processors. See Chapter 6, “Interrupt 18—Machine-Check Exception (#MC),” for more information on machine-check exceptions. A brief description of the Pentium processor’s machine-check capability is also given. Additionally, the chapter covers a signaling mechanism that lets software respond to hardware-corrected machine-check errors.
15.2 COMPATIBILITY WITH PENTIUM PROCESSOR
The Pentium 4, Intel Xeon, and P6 family processors support and extend the machine-check exception mechanism introduced in the Pentium processor. The Pentium processor reports the following machine-check errors:
• data parity errors during read cycles
• unsuccessful completion of a bus cycle
The above errors are reported using the P5_MC_TYPE and P5_MC_ADDR MSRs (implementation specific for the Pentium processor).
Figure 15-1. Machine-Check MSRs (global control MSRs: IA32_MCG_CAP, IA32_MCG_STATUS, IA32_MCG_CTL; error-reporting bank registers, one set for each hardware unit: IA32_MCi_CTL, IA32_MCi_STATUS, IA32_MCi_ADDR, IA32_MCi_MISC, IA32_MCi_CTL2; all registers are 64 bits wide)
Each error-reporting bank is associated with a specific hardware unit (or group of hardware units) in the processor.
Figure 15-2. IA32_MCG_CAP Register (bits 7:0: Count; bit 8: MCG_CTL_P; bit 9: MCG_EXT_P; bit 10: MCG_CMCI_P; bit 11: MCG_TES_P; bits 15:12: Reserved; bits 23:16: MCG_EXT_CNT; bit 24: MCG_SER_P; bits 63:25: Reserved)
Where:
• Count field, bits 7:0 — Indicates the number of hardware unit error-reporting banks available in a particular processor implementation.
• MCG_CTL_P (control MSR present) flag, bit 8 — Indicates that the processor implements the IA32_MCG_CTL MSR when set; this register is absent when clear.
Section 15.6), and IA32_MCi_STATUS MSR bits 56:55 are used to report the signaling of uncorrected recoverable errors and whether software must take recovery actions for uncorrected errors. Note that when MCG_TES_P is not set, bits 56:53 of the IA32_MCi_STATUS MSR are model-specific. If MCG_TES_P is set but MCG_SER_P is not set, bits 56:55 are reserved. The effect of writing to the IA32_MCG_CAP MSR is undefined.
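As an illustration, the IA32_MCG_CAP fields from Figure 15-2 can be decoded into a struct, together with a helper that applies the rule above for when IA32_MCi_STATUS bits 56:55 carry architectural information. This is a sketch over a raw MSR value; the type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    unsigned count;    /* bits 7:0, number of error-reporting banks */
    bool ctl_p;        /* bit 8, IA32_MCG_CTL present */
    bool ext_p;        /* bit 9, extended state MSRs present */
    bool cmci_p;       /* bit 10, CMCI / IA32_MCi_CTL2 present */
    bool tes_p;        /* bit 11, threshold-based error status */
    unsigned ext_cnt;  /* bits 23:16, number of extended state MSRs */
    bool ser_p;        /* bit 24, software error recovery */
} mcg_cap;

mcg_cap decode_mcg_cap(uint64_t v) {
    mcg_cap c;
    c.count   = (unsigned)(v & 0xFF);
    c.ctl_p   = (v >> 8) & 1;
    c.ext_p   = (v >> 9) & 1;
    c.cmci_p  = (v >> 10) & 1;
    c.tes_p   = (v >> 11) & 1;
    c.ext_cnt = (unsigned)((v >> 16) & 0xFF);
    c.ser_p   = (v >> 24) & 1;
    return c;
}

/* IA32_MCi_STATUS bits 56:55 are defined only when both MCG_TES_P and
 * MCG_SER_P are set; otherwise they are model-specific (TES_P clear)
 * or reserved (TES_P set, SER_P clear). */
bool mci_status_ucr_bits_defined(mcg_cap c) {
    return c.tes_p && c.ser_p;
}
```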
15.3.1.3 IA32_MCG_CTL MSR
The IA32_MCG_CTL MSR is present if the capability flag MCG_CTL_P is set in the IA32_MCG_CAP MSR. IA32_MCG_CTL controls the reporting of machine-check exceptions. If present, writing all 1s to this register enables machine-check features and writing all 0s disables machine-check features. All other values are undefined and/or implementation specific.
encoding of 06H_1AH and onward): the operating system or executive software must not modify the contents of the IA32_MC0_CTL MSR. This MSR is internally aliased to the EBL_CR_POWERON MSR and controls platform-specific error handling features. System-specific firmware (the BIOS) is responsible for the appropriate initialization of the IA32_MC0_CTL MSR. P6 family processors only allow the writing of all 1s or all 0s to the IA32_MCi_CTL MSR.
introduced with Intel 64 processors having CPUID DisplayFamily_DisplayModel encoding of 06H_1AH.
• If IA32_MCG_CAP[10] is 0, bits 52:38 also contain “Other Information” (in the same sense as bits 37:32).
• If IA32_MCG_CAP[10] is 1, bits 52:38 are architectural (not model-specific). In this case, bits 52:38 report the value of a 15-bit counter that increments each time a corrected error is observed by the MCA recording bank. This count value will continue to increment until cleared by software. The most significant bit, 52, is a sticky count overflow bit.
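When IA32_MCG_CAP[10] is 1, the corrected-error count can be extracted from an IA32_MCi_STATUS value with a shift and a mask, as in this sketch (function names hypothetical). Because bit 52 is both the sticky overflow bit and the most significant bit of the 15-bit field, software may prefer to report the overflow separately rather than trust the raw count.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 52:38: 15-bit corrected error count (valid when IA32_MCG_CAP[10] = 1). */
unsigned mci_corrected_count(uint64_t status) {
    return (unsigned)((status >> 38) & 0x7FFF);
}

/* Bit 52: sticky count-overflow bit. */
bool mci_count_overflow(uint64_t status) {
    return (status >> 52) & 1;
}
```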
flag indicates that the error did not affect the processor’s state. Software may be able to restart execution.
• ADDRV (IA32_MCi_ADDR register valid) flag, bit 58 — Indicates (when set) that the IA32_MCi_ADDR register contains the address where the error occurred (see Section 15.3.2.3, “IA32_MCi_ADDR MSRs”). When clear, this flag indicates that the IA32_MCi_ADDR register is either not implemented or does not contain the address where the error occurred.
In Table 15-2, the values in the two left-most columns are IA32_MCi_STATUS[54:53]. Table 15-2.
Figure 15-6. IA32_MCi_ADDR MSR (processor without support for Intel 64 architecture: bits 35:0 Address*, bits 63:36 Reserved; processor with support for Intel 64 architecture: bits 63:0 Address*)
* Useful bits in this field depend on the address methodology in use when the register state is saved.
15.3.2.4 IA32_MCi_MISC MSRs
The IA32_MCi_MISC MSR contains additional information describing the machine-check error if the MISCV flag in the IA32_MCi_STATUS register is set.
Figure 15-7. UCR Support in IA32_MCi_MISC Register (bits 5:0: Recoverable Address LSB; bits 8:6: Address Mode; bits 63:9: Model Specific Information)
• Recoverable Address LSB (bits 5:0): The lowest valid recoverable address bit. Indicates the position of the least significant bit (LSB) of the recoverable error address. For example, if the processor logs bits [43:9] of the address, the LSB sub-field in IA32_MCi_MISC is 01001b (9 decimal).
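The Recoverable Address LSB field tells software how many low-order bits of IA32_MCi_ADDR are meaningful. A minimal sketch, assuming the raw IA32_MCi_MISC and IA32_MCi_ADDR values are already in hand: it extracts the LSB and Address Mode sub-fields and clears the address bits below the LSB. With an LSB of 9, as in the example above, the result is aligned to 512 bytes. The names are hypothetical, and the meaning of the Address Mode encodings is not covered here.

```c
#include <stdint.h>

/* IA32_MCi_MISC bits 5:0: Recoverable Address LSB. */
unsigned mci_misc_lsb(uint64_t misc) {
    return (unsigned)(misc & 0x3F);
}

/* IA32_MCi_MISC bits 8:6: Address Mode. */
unsigned mci_misc_addr_mode(uint64_t misc) {
    return (unsigned)((misc >> 6) & 0x7);
}

/* Clears the address bits below the recoverable LSB. The 6-bit field keeps
 * the shift count in 0..63, so the shift is well defined. */
uint64_t mci_recoverable_base(uint64_t addr, uint64_t misc) {
    unsigned lsb = mci_misc_lsb(misc);
    return addr & ~((1ull << lsb) - 1);
}
```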
When IA32_MCG_CAP[10] = 1, the IA32_MCi_CTL2 MSR exists for each bank; that is, reads and writes to these MSRs are supported. However, the signaling interface for corrected MC errors may not be supported in all banks. The layout of IA32_MCi_CTL2 is shown in Figure 15-8:
Figure 15-8. (bits 14:0: Corrected error count threshold; bits 29:15: Reserved; bit 30: CMCI_EN, enable/disable CMCI; bits 63:31: Reserved)
15.3.2.6 IA32_MCG Extended Machine Check State MSRs
The Pentium 4 and Intel Xeon processors implement a variable number of extended machine-check state MSRs. The MCG_EXT_P flag in the IA32_MCG_CAP MSR indicates the presence of these extended registers, and the MCG_EXT_CNT field indicates the number of these registers actually implemented. See Section 15.3.1.1, “IA32_MCG_CAP MSR.” Also see Table 15-4. Table 15-4.
Table 15-5. Extended Machine Check State MSRs In Processors With Support For Intel 64 Architecture
MSR             Address   Description
IA32_MCG_RAX    180H      Contains state of the RAX register at the time of the machine-check error.
IA32_MCG_RBX    181H      Contains state of the RBX register at the time of the machine-check error.
IA32_MCG_RCX    182H      Contains state of the RCX register at the time of the machine-check error.
Table 15-5. Extended Machine Check State MSRs In Processors With Support For Intel 64 Architecture (Contd.)
MSR             Address   Description
IA32_MCG_R14    196H      Contains state of the R14 register at the time of the machine-check error.
IA32_MCG_R15    197H      Contains state of the R15 register at the time of the machine-check error.
processor; the handler must be written to interpret P5_MC_TYPE encodings correctly.
15.4 ENHANCED CACHE ERROR REPORTING
Starting with Intel Core Duo processors, cache error reporting was enhanced. In earlier Intel processors, cache status was based on the number of correction events that occurred in a cache. In the new paradigm, called “threshold-based error status”, cache status is based on the number of lines (ECC blocks) in a cache that incur repeated corrections.
beyond those of threshold-based error reporting (Section 15.4). With threshold-based error reporting, software is limited to periodic polling to query the status of hardware-corrected MC errors. CMCI provides a signaling mechanism to deliver a local interrupt based on threshold values that software can program using the IA32_MCi_CTL2 MSRs. CMCI is disabled by default.
CMCI interrupt delivery is configured by writing to the LVT CMCI register entry in the local APIC register space at the default address APIC_BASE + 2F0H. A CMCI interrupt can be delivered to more than one logical processor if multiple logical processors are affected by the associated MC errors.
• Delivery status, bit 12 — A read-only bit that, when set, indicates that an interrupt from this source has been delivered to the processor core but has not yet been accepted.
• Mask, bit 16 — When set, inhibits reception of the interrupt. (Unlike the PerfMon LVT entry, this bit is not set when an interrupt is received.) When clear, CMCI is not masked. The mask bit is set by default.
• Bits 31:17, 15:13 and 11 are reserved.
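A sketch of building and inspecting an LVT CMCI entry from the fields above. It assumes the vector occupies bits 7:0 and that leaving the delivery-mode bits at 000 (fixed delivery) is acceptable, as for other LVT entries; neither detail is stated in this excerpt. The address helper uses the default APIC_BASE + 2F0H offset, the actual store to the local APIC page is left to the caller, and all names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define LVT_CMCI_OFFSET       0x2F0u      /* offset from APIC_BASE */
#define LVT_DELIVERY_STATUS   (1u << 12)  /* read-only */
#define LVT_MASK              (1u << 16)  /* set by default */

/* Entry with the given vector (assumed bits 7:0), delivery mode bits left
 * at 000, and the mask bit set or cleared as requested. */
uint32_t lvt_cmci_entry(uint8_t vector, bool masked) {
    return (uint32_t)vector | (masked ? LVT_MASK : 0u);
}

/* True if an interrupt was delivered to the core but not yet accepted. */
bool lvt_cmci_pending(uint32_t entry) {
    return (entry & LVT_DELIVERY_STATUS) != 0;
}

/* Memory-mapped address of the LVT CMCI register. */
uintptr_t lvt_cmci_address(uintptr_t apic_base) {
    return apic_base + LVT_CMCI_OFFSET;
}
```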
b. Each thread examines the IA32_MCi_CTL2[30] indicator for each bank to determine if another thread has already claimed ownership of that bank.
• If IA32_MCi_CTL2[30] has been set by another thread, this thread cannot own bank i and should return to step b. to examine the next machine-check bank, until all of the machine-check banks are exhausted.
• If IA32_MCi_CTL2[30] = 0, proceed to step c.
c.
• Write 7FFFH to IA32_MCi_CTL2[15:0].
• Read back IA32_MCi_CTL2[15:0]; the lower 15 bits (14:0) contain the maximum threshold supported by the processor.
b. Increase the threshold to a value below the maximum value discovered using step a.
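The ownership check and the threshold discovery above can be sketched as pure functions over IA32_MCi_CTL2 values. This sketch assumes the caller performs the actual RDMSR/WRMSR accesses (for example, writing 7FFFH and passing in the value read back); the enum, the function names, and the choice to clamp a requested threshold to the discovered maximum are illustrative assumptions rather than part of the procedure.

```c
#include <stdint.h>

#define CTL2_CMCI_EN         (1ull << 30)  /* bit 30 */
#define CTL2_THRESHOLD_MASK  0x7FFFull     /* bits 14:0 */

typedef enum { BANK_OWNED_ELSEWHERE, BANK_TRY_CLAIM } bank_check;

/* If CMCI_EN is already set, another thread owns this bank. */
bank_check cmci_check_bank(uint64_t ctl2) {
    return (ctl2 & CTL2_CMCI_EN) ? BANK_OWNED_ELSEWHERE : BANK_TRY_CLAIM;
}

/* After writing 7FFFH to the threshold field, bits 14:0 of the value read
 * back give the maximum threshold the processor supports. */
uint16_t cmci_max_threshold(uint64_t readback) {
    return (uint16_t)(readback & CTL2_THRESHOLD_MASK);
}

/* Builds a CTL2 value with CMCI enabled and the threshold limited to `max`. */
uint64_t cmci_ctl2_value(uint64_t current, uint16_t wanted, uint16_t max) {
    uint16_t t = wanted > max ? max : wanted;
    return (current & ~CTL2_THRESHOLD_MASK) | CTL2_CMCI_EN | t;
}
```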
15.6.1 Detection of Software Error Recovery Support
Software must use bit 24 of IA32_MCG_CAP (MCG_SER_P) to detect the presence of software error recovery support (see Figure 15-2). When IA32_MCG_CAP[24] is set, this indicates that the processor supports software error recovery.
• S (Signaling) flag, bit 56 - Indicates (when set) that a machine check exception was generated for the UCR error reported in this MC bank and system software needs to check the AR flag and the MCA error code fields in the IA32_MCi_STATUS register to identify the necessary recovery action for this error.
IA32_MCi_STATUS register. Recovery actions for SRAO errors are MCA error code specific. The MISCV and the ADDRV flags in the IA32_MCi_STATUS register are set when the additional error information is available from the IA32_MCi_MISC and the IA32_MCi_ADDR registers. System software needs to inspect the MCA error code fields in the IA32_MCi_STATUS register to identify the specific recovery action for a given SRAO error.
15.6.4 UCR Error Overwrite Rules
In general, the overwrite rules are as follows:
• UCR errors will overwrite corrected errors.
• Uncorrected (PCC=1) errors overwrite UCR (PCC=0) errors.
• UCR errors are not written over previous UCR errors.
• Corrected errors do not write over previous UCR errors.
15.7 MACHINE-CHECK AVAILABILITY
The machine-check architecture and machine-check exception (#MC) are model-specific features. Software can execute the CPUID instruction to determine whether a processor implements these features. Following the execution of the CPUID instruction, the settings of the MCA flag (bit 14) and MCE flag (bit 7) in EDX indicate whether the processor implements the machine-check architecture and the machine-check exception, respectively.
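A short sketch of this availability check in C. It assumes the MCA and MCE flags are read from EDX after executing CPUID with EAX = 1, and it uses the `__get_cpuid` helper from the `<cpuid.h>` header shipped with GCC and Clang on x86; with other compilers or architectures the reader function simply reports failure. The helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#endif

#define CPUID1_EDX_MCE  (1u << 7)   /* machine-check exception */
#define CPUID1_EDX_MCA  (1u << 14)  /* machine-check architecture */

bool has_mce(uint32_t edx) { return (edx & CPUID1_EDX_MCE) != 0; }
bool has_mca(uint32_t edx) { return (edx & CPUID1_EDX_MCA) != 0; }

/* Fills *edx with CPUID.1:EDX; returns false if CPUID is unavailable here. */
bool read_cpuid1_edx(uint32_t *edx) {
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned int a, b, c, d;
    if (!__get_cpuid(1, &a, &b, &c, &d))
        return false;
    *edx = d;
    return true;
#else
    (void)edx;
    return false;
#endif
}
```

A caller would check `has_mca` before touching IA32_MCG_CAP and the bank MSRs, and `has_mce` before setting the MCE flag in CR4.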
FI (* enables all MCA features *)
(* Determine number of error-reporting banks supported *)
COUNT ← IA32_MCG_CAP.
also write a 16-bit model-specific error code in the IA32_MCi_STATUS register depending on the implementation of the machine-check architecture of the processor. The MCA error codes are architecturally defined for Intel 64 and IA-32 processors. To determine the cause of a machine-check exception, the machine-check exception handler must read the VAL flag for each IA32_MCi_STATUS register.
15.9.2 Compound Error Codes
Compound error codes describe errors related to the TLBs, memory, caches, bus and interconnect logic, and internal timer. A set of sub-fields is common to all compound errors. These sub-fields describe the type of access, level in the cache hierarchy, and type of request. Table 15-9 shows the general form of the compound error codes. Table 15-9.
The behavior of error filtering after crossing the yellow threshold is model-specific.
15.9.2.2 Transaction Type (TT) Sub-Field
The 2-bit TT sub-field (Table 15-10) indicates the type of transaction (data, instruction, or generic). The sub-field applies to the TLB, cache, and interconnect error conditions. Note that interconnect error conditions are primarily associated with P6 family and Pentium processors, which utilize an external APIC bus separate from the system bus.
caused the error. Eviction and snoop requests apply only to the caches. All of the other requests apply to TLBs, caches and interconnects.
Table 15-12. Encoding of Request (RRRR) Sub-Field
Request Type         Mnemonic   Binary Encoding
Generic Error        ERR        0000
Generic Read         RD         0001
Generic Write        WR         0010
Data Read            DRD        0011
Data Write           DWR        0100
Instruction Fetch    IRD        0101
Prefetch             PREFETCH   0110
Eviction             EVICT      0111
Snoop                SNOOP      1000
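To make the RRRR encodings concrete, here is a sketch that recognizes a cache-hierarchy compound error code and splits it into its RRRR, TT and LL sub-fields. It assumes the cache-hierarchy form 000F 0001 RRRR TTLL from Table 15-9 (F being the filter bit, bit 12), which is not reproduced in this section; only the RRRR mnemonics come from Table 15-12, and TT and LL are returned as raw numbers. Function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* RRRR mnemonics indexed by binary encoding (Table 15-12). */
const char *const rrrr_names[] = {
    "ERR", "RD", "WR", "DRD", "DWR", "IRD", "PREFETCH", "EVICT", "SNOOP"
};

const char *rrrr_mnemonic(unsigned rrrr) {
    return rrrr < sizeof rrrr_names / sizeof rrrr_names[0] ? rrrr_names[rrrr]
                                                           : "(reserved)";
}

/* Assumed form 000F 0001 RRRR TTLL: bits 15:13 zero, bits 11:8 = 0001. */
bool is_cache_hierarchy_error(uint16_t code) {
    return (code & 0xEF00u) == 0x0100u;
}

bool filter_bit(uint16_t code)     { return (code >> 12) & 1; }
unsigned cache_rrrr(uint16_t code) { return (code >> 4) & 0xF; }  /* RRRR */
unsigned cache_tt(uint16_t code)   { return (code >> 2) & 0x3; }  /* TT */
unsigned cache_ll(uint16_t code)   { return code & 0x3; }         /* LL */
```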
Table 15-13. Encodings of PP, T, and II Sub-Fields (Contd.)
Transaction          Mnemonic   Binary Encoding
I/O                  IO         10
Other transaction               11
NOTE:
* Local processor differentiates the processor reporting the error from other system components (including the APIC, other processors, etc.).
15.9.2.6 Memory Controller Errors
The memory controller errors are defined with the 3-bit MMM (memory transaction type) and 4-bit CCCC (channel) sub-fields. The encodings for MMM and CCCC are defined in Table 15-14. Table 15-14.
15-9). Their values and compound encoding format are given in Table 15-15. Table 15-15.
IA32_MCG_STATUS register for the memory scrubbing and L3 explicit writeback errors on both the reporting and non-reporting logical processors.
Table 15-17. IA32_MCG_STATUS Flag Indication for SRAO Errors
                        Reporting Logical Processors   Non-reporting Logical Processors
SRAO Type               RIPV    EIPV                   RIPV    EIPV
Memory Scrubbing        1       0                      1       0
L3 Explicit Writeback   1       0                      1       0
15.9.3.2 Architecturally Defined SRAR Errors
The following two SRAR errors are architecturally defined.
Table 15-19 lists values of relevant bit fields of IA32_MCi_STATUS for architecturally defined SRAR errors. Table 15-19.
For an instruction fetch recoverable error, the affected logical processor should find that the RIPV flag and the EIPV flag in the IA32_MCG_STATUS register are cleared, indicating that the instruction pointer saved on the stack may not be associated with this error and that restarting execution with the interrupted context is not possible.
• When multiple recoverable errors are reported and no other fatal condition (e.g., an overflow condition for an SRAR error) is found for the reported recoverable errors, system software can recover from the multiple recoverable errors by taking the necessary recovery action for each individual recoverable error.
Guidelines for writing a machine-check exception handler or a machine-error logging utility are given in the following sections.
15.10.1 Machine-Check Exception Handler
The machine-check exception (#MC) corresponds to vector 18. To service machine-check exceptions, a trap gate must be added to the IDT. The pointer in the trap gate must point to a machine-check exception handler. Two approaches can be taken to designing the exception handler:
1.
generated). If this flag is clear, the processor may still be able to be restarted (for debugging purposes) but not without loss of program continuity.
• For unrecoverable errors, the EIPV flag in the IA32_MCG_STATUS register indicates whether the instruction indicated by the instruction pointer pushed on the stack (when the exception was generated) is related to the error. If the flag is clear, the pushed instruction may not be related to the error.
When machine-check exceptions are enabled for the Pentium processor (MCE flag is set in control register CR4), the machine-check exception handler uses the RDMSR instruction to read the error type from the P5_MC_TYPE register and the machine check address from the P5_MC_ADDR register. The handler then normally reports these register values to the system console before aborting execution (see Example 15-2).
AND PCC flag in IA32_MCi_STATUS = 1
OR RIPV flag in IA32_MCG_STATUS = 0 (* execution is not restartable *)
THEN
   RESTARTABILITY = FALSE;
   return RESTARTABILITY to calling procedure;
FI;
FI;
Save time-stamp counter and processor ID;
Set IA32_MCi_STATUS to all 0s;
Execute serializing instruction (i.e., CPUID);
OD;
FI;
If the processor supports the machine-check architecture, the utility reads through the banks of error-reporting registers looking for valid register entries.
mechanism to indicate the frequency of exceptions. A multiprocessing operating system stores the identity of the processor node incurring the exception using a unique identifier, such as the processor’s APIC ID (see Section 10.9, “Handling Interrupts”). The basic algorithm given in Example 15-3 can be modified to provide more robust recovery techniques. For example, software has the flexibility to attempt recovery using information unavailable to the hardware.
was corrected (UC=0) or uncorrected (UC=1). The MCE handler can optionally log and clear the corrected errors in the MC banks if it implements a software algorithm that avoids undesired race conditions with the CMCI or CMC polling handler.
AR flag to find the type of the UCR error for software recovery and determine if software error recovery is possible.
• When both the S and the AR flags are clear in the IA32_MCi_STATUS register for the UCR error (VAL=1, UC=1, EN=x and PCC=0), the error in this bank is an uncorrected no-action required error (UCNA). UCNA errors are uncorrected but do not require any OS recovery action to continue execution.
• When the OVER flag in the IA32_MCi_STATUS register is set for the SRAR error (VAL=1, UC=1, EN=1, PCC=0, S=1 and AR=1), the MCE handler cannot take recovery action, because the information about the SRAR error in the IA32_MCi_STATUS register was potentially lost due to the overflow condition. Since the recovery action for SRAR errors must be taken, the MCE handler must signal the operating system to reset the system.
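The classification rules described above can be condensed into one C function over a raw IA32_MCi_STATUS value. This sketch assumes the standard flag positions VAL (63), OVER (62), UC (61), PCC (57), S (56) and AR (55), some of which are defined in a figure not reproduced here; it ignores EN and the RIPV/EIPV flags, and the enum and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define MCI_VAL   (1ull << 63)
#define MCI_OVER  (1ull << 62)
#define MCI_UC    (1ull << 61)
#define MCI_PCC   (1ull << 57)
#define MCI_S     (1ull << 56)
#define MCI_AR    (1ull << 55)

typedef enum { MC_NONE, MC_CORRECTED, MC_UCNA, MC_SRAO, MC_SRAR, MC_FATAL } mc_class;

mc_class classify_mci_status(uint64_t st) {
    if (!(st & MCI_VAL))
        return MC_NONE;              /* nothing logged in this bank */
    if (!(st & MCI_UC))
        return MC_CORRECTED;         /* left to the CMCI/polling handler */
    if (st & MCI_PCC)
        return MC_FATAL;             /* processor context may be corrupt */
    bool s = (st & MCI_S) != 0;
    bool ar = (st & MCI_AR) != 0;
    if (!s)
        return ar ? MC_FATAL : MC_UCNA;  /* S=0, AR=1 is illegal */
    if (!ar)
        return MC_SRAO;
    /* SRAR: with OVER set the logged information may have been lost,
     * so recovery is impossible and the system must be reset. */
    return (st & MCI_OVER) ? MC_FATAL : MC_SRAR;
}
```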
FI
RESTARTABILITY = FALSE;
FI;
IF RESTARTABILITY = FALSE
THEN
   Report RESTARTABILITY to console;
   Reset system;
FI;
IF MCA_BROADCAST = TRUE
THEN
   IF ProcessorCount = MAX_PROCESSORS
   AND NOERROR = TRUE
   THEN
      Report RESTARTABILITY to console;
      Reset system;
   FI;
   Release SpinLock;
   Wait till ProcessorCount = MAX_PROCESSORS on system;
   (* implement a timeout and abort function if necessary *)
FI;
CLEAR MCIP flag in IA32_MCG_STATUS;
RESUME Execution;
(* End of MACHINE CHECK HANDLER*)
MCA ERR
IF PCC Flag in IA32_MCi_STATUS = 1
THEN
   (* processor context might have been corrupted *)
   RESTARTABILITY = FALSE;
ELSE
   (* It is an uncorrected recoverable (UCR) error *)
   IF S Flag in IA32_MCi_STATUS = 0
   THEN
      IF AR Flag in IA32_MCi_STATUS = 0
      THEN
         (* It is an uncorrected no action required (UCNA) error *)
         GOTO CONTINUE; (* let CMCI and CMC polling handler process *)
      ELSE
         RESTARTABILITY = FALSE; (* S=0, AR=1 is illegal *)
      FI
   FI;
   IF RESTARTABILITY = FALSE
   THEN (* no need to take re
IF MISCV in IA32_MCi_STATUS
THEN
   SAVE IA32_MCi_MISC;
FI;
IF ADDRV in IA32_MCi_STATUS
THEN
   SAVE IA32_MCi_ADDR;
FI;
IF CLEAR_MC_BANK = TRUE
THEN
   (* clear MISC and ADDR before STATUS: once STATUS is 0, MISCV and ADDRV read as 0 *)
   IF MISCV in IA32_MCi_STATUS
   THEN
      SET all 0 to IA32_MCi_MISC;
   FI;
   IF ADDRV in IA32_MCi_STATUS
   THEN
      SET all 0 to IA32_MCi_ADDR;
   FI;
   SET all 0 to IA32_MCi_STATUS;
FI;
CONTINUE:
OD; (* END FOR *)
RETURN;
(* End of MCA ERROR PROCESSING*)
before these errors are actually handled and processed by the MCE handler for attempted software error recovery. Example 15-5 gives pseudocode for a CMCI handler with UCR support. Example 15-5.
CHAPTER 16 DEBUGGING, PROFILING BRANCHES AND TIME-STAMP COUNTER
Intel 64 and IA-32 architectures provide debug facilities for use in debugging code and monitoring performance. These facilities are valuable for debugging application software, system software, and multitasking operating systems. Debug support is accessed using debug registers (DR0 through DR7) and model-specific registers (MSRs):
• Debug registers hold the addresses of memory and I/O locations called breakpoints.
instruction is an alternative way to set code breakpoints. It is especially useful when more than four breakpoints are desired, or when breakpoints are being placed in the source code.
• Last branch recording facilities — Store branch records in the last branch record (LBR) stack MSRs for the most recent taken branches, interrupts, and/or exceptions. A branch record consists of a branch-from and a branch-to instruction address.
• Whether the breakpoint condition was present when the debug exception was generated.
16.2.1 Debug Address Registers (DR0-DR3)
Each of the debug-address registers (DR0 through DR3) holds the 32-bit linear address of a breakpoint (see Figure 16-1). Breakpoint comparisons are made before physical address translation occurs. The contents of debug register DR7 further specify breakpoint conditions.
exceptions, debug handlers should clear the register before returning to the interrupted task.
16.2.4 Debug Control Register (DR7)
The debug control register (DR7) enables or disables breakpoints and sets breakpoint conditions (see Figure 16-1).
10 — Break on I/O reads or writes.
11 — Break on data reads or writes but not instruction fetches.
When the DE flag is clear, the processor interprets the R/Wn bits the same as for the Intel386™ and Intel486™ processors, which is as follows:
00 — Break on instruction execution only.
01 — Break on data writes only.
10 — Undefined.
11 — Break on data reads or writes but not instruction fetches.
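As a sketch of how these fields are packed, the function below configures one of the four breakpoints in a DR7 image. It assumes the usual DR7 layout (Ln enable bit at bit 2n, R/Wn at bits 17:16 + 4n, LENn at bits 19:18 + 4n) and the LENn encodings 00 = 1 byte, 01 = 2 bytes, 11 = 4 bytes and 10 = 8 bytes on Intel 64 processors; neither is spelled out in this excerpt. The R/Wn value 10 (I/O) is only meaningful when the DE flag is set. Names are hypothetical, and loading the result into DR7 requires privileged code.

```c
#include <stdint.h>

enum dr7_rw  { RW_EXEC = 0, RW_WRITE = 1, RW_IO = 2, RW_READWRITE = 3 };
enum dr7_len { LEN_1 = 0, LEN_2 = 1, LEN_8 = 2, LEN_4 = 3 };

/* Returns `dr7` with breakpoint n (0-3) locally enabled and its R/Wn and
 * LENn fields replaced; an out-of-range n leaves the value unchanged. */
uint64_t dr7_set(uint64_t dr7, unsigned n, enum dr7_rw rw, enum dr7_len len) {
    if (n > 3)
        return dr7;
    unsigned shift = 16 + 4 * n;
    dr7 &= ~(0xFull << shift);                                /* clear R/Wn, LENn */
    dr7 |= (uint64_t)((unsigned)rw | ((unsigned)len << 2)) << shift;
    dr7 |= 1ull << (2 * n);                                   /* set Ln */
    return dr7;
}
```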
the lower address bits in the debug registers. Unaligned data or I/O breakpoint addresses do not yield valid results. A data breakpoint for reading or writing data is triggered if any of the bytes participating in an access is within the range defined by a breakpoint address register and its LENn field. Table 16-1 provides an example setup of debug registers and data accesses that would subsequently trap or not trap on the breakpoints.
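The triggering rule can be modeled as an interval-overlap test, as in the sketch below. It is a software model that assumes the breakpoint address is aligned to its length (the hardware compares masked addresses rather than ranges, which is why unaligned breakpoints do not yield valid results); it does not model the R/Wn access-type check, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* A breakpoint of len bytes (1, 2, 4 or 8) is valid only if aligned. */
bool bp_aligned(uint64_t bp, unsigned len) {
    return (bp & (uint64_t)(len - 1)) == 0;
}

/* True if any byte of [acc, acc + acc_len) lies in [bp, bp + bp_len). */
bool access_hits_bp(uint64_t bp, unsigned bp_len, uint64_t acc, unsigned acc_len) {
    return acc < bp + bp_len && bp < acc + acc_len;
}
```

For example, a 4-byte breakpoint at C0000H is hit by a 2-byte access at C0003H but not by a 4-byte access at C0004H.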
Table 16-1. Breakpoint Examples (Contd.)
Data operations that do not trap:
Operation       Address   Access Length (in bytes)
Read or write   A0000H    1
Read            A0002H    1
Read or write   A0003H    4
Read or write   B0000H    2
Read            C0000H    2
Read or write   C0004H    4
16.2.6 Debug Registers and Intel® 64 Processors
For Intel 64 architecture processors, debug registers DR0–DR7 are 64 bits.
Figure 16-2. DR6/DR7 Layout on Processors Supporting Intel 64 Technology (bits 63:32 of both registers are reserved; DR7 bits 31:0 hold the LEN3-LEN0 and R/W3-R/W0 fields, the GD, GE and LE flags, and the G3-G0/L3-L0 enable flags; DR6 bits 31:0 hold the BT, BS, BD and B3-B0 flags, with the reserved bits set to 1)
See also: Chapter 6, “Interrupt 1—Debug Exception (#DB),” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A. Table 16-2.
(resume flag) in the EFLAGS register (see Section 2.3, “System Flags and Fields in the EFLAGS Register,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A). When the RF flag is set, the processor ignores instruction breakpoints. All Intel 64 and IA-32 processors manage the RF flag as follows. The RF flag is cleared at the start of the instruction after the check for code breakpoint, CS limit violation and FP exceptions.
16.3.1.2 Data Memory and I/O Breakpoint Exception Conditions
Data memory and I/O breakpoints are reported when the processor attempts to access a memory or I/O address specified in a breakpoint-address register (DR0 through DR3) that has been set up to detect data or I/O accesses (R/W flag is set to 1, 2, or 3).
single-step trap does not occur until after the instruction that follows the POPF instruction. The processor clears the TF flag before calling the exception handler. If the TF flag was set in a TSS at the time of a task switch, the exception occurs after the first instruction is executed in the new task. The TF flag normally is not cleared by privilege changes inside a task. The INT n and INTO instructions, however, do clear this flag.
16.4 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING OVERVIEW
P6 family processors introduced the ability to set breakpoints on taken branches, interrupts, and exceptions, and to single-step from one branch to the next.
in the last branch record (LBR) stack. For more information, see Section 16.5.1, “LBR Stack”.
• BTF (single-step on branches) flag (bit 1) — When set, the processor treats the TF flag in the EFLAGS register as a “single-step on branches” flag rather than a “single-step on instructions” flag. This mechanism allows single-stepping the processor on taken branches, interrupts, and exceptions. See Section 16.4.
• FREEZE_LBRS_ON_PMI flag (bit 11) — When set, the LBR stack is frozen on a hardware PMI request (e.g., when a counter overflows and is configured to trigger a PMI).
• FREEZE_PERFMON_ON_PMI flag (bit 12) — When set, a PMI request clears each of the “ENABLE” fields of the MSR_PERF_GLOBAL_CTRL MSR (see Figure 30-3) to disable all the counters.
a bug to a particular block of code before instruction single-stepping further narrows the search. If the BTF flag is set when the processor generates a debug exception, the processor clears the BTF flag along with the TF flag. The debugger must reset the BTF and TF flags before resuming program execution to continue control-flow single-stepping.
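Because the processor clears BTF and TF on every debug exception, a debugger that single-steps on branches has to re-arm both flags before each resume. The sketch below shows only the bit arithmetic for the IA32_DEBUGCTL flags listed above; it assumes IA32_DEBUGCTL bit 0 is the LBR flag and EFLAGS bit 8 is TF (neither position appears in this excerpt), and the helper names are hypothetical.

```c
#include <stdint.h>

#define DEBUGCTL_LBR                    (1ull << 0)   /* assumed position */
#define DEBUGCTL_BTF                    (1ull << 1)
#define DEBUGCTL_FREEZE_LBRS_ON_PMI     (1ull << 11)
#define DEBUGCTL_FREEZE_PERFMON_ON_PMI  (1ull << 12)
#define EFLAGS_TF                       (1ull << 8)   /* assumed position */

/* IA32_DEBUGCTL value for "single-step on branches"; TF must also be set. */
uint64_t debugctl_branch_step(uint64_t debugctl) {
    return debugctl | DEBUGCTL_BTF;
}

/* EFLAGS image to resume with, re-arming TF after the #DB cleared it. */
uint64_t eflags_rearm_tf(uint64_t eflags) {
    return eflags | EFLAGS_TF;
}

/* IA32_DEBUGCTL value for LBR recording that freezes the stack on a PMI. */
uint64_t debugctl_lbr_freeze_on_pmi(uint64_t debugctl) {
    return debugctl | DEBUGCTL_LBR | DEBUGCTL_FREEZE_LBRS_ON_PMI;
}
```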
16.4.6 CPL-Qualified Branch Trace Mechanism
CPL-qualified branch trace mechanism is available to a subset of Intel 64 and IA-32 processors that support the branch trace storing mechanism. The processor supports the CPL-qualified branch trace mechanism if CPUID.01H:ECX[bit 4] = 1. The CPL-qualified branch trace mechanism is described in Section 16.4.9.4.
16.4.8 LBR Stack
The last branch record stack and top-of-stack (TOS) pointer MSRs are supported across Intel 64 and IA-32 processor families. However, the number of MSRs in the LBR stack and the valid range of TOS pointer value can vary between different processor families.
16.4.8.1 LBR Stack and Intel® 64 Processors
LBR MSRs are 64 bits. If IA-32e mode is disabled, only the lower 32 bits of the address are recorded. If IA-32e mode is enabled, the processor writes 64-bit values into the MSR. In 64-bit mode, last branch records store 64-bit addresses; in compatibility mode, the upper 32 bits of last branch records are cleared.
16.4.8.3 Last Exception Records and Intel 64 Architecture
Intel 64 and IA-32 processors also provide MSRs that store the branch record for the last branch taken prior to an exception or an interrupt. The location of the last exception record (LER) MSRs is model specific. The MSRs that store last exception records are 64 bits. If IA-32e mode is disabled, only the lower 32 bits of the address are recorded.
and is cleared on processor RESET and INIT. DS recording is available in real-address mode. The BTS and PEBS facilities may not be available on all processors. The availability of these facilities is indicated by the BTS_UNAVAILABLE and PEBS_UNAVAILABLE flags, respectively, in the IA32_MISC_ENABLE MSR (see Appendix B). The DS save area is divided into three parts (see Figure 16-5): the buffer management area, the branch trace store (BTS) buffer, and the PEBS buffer.
Figure 16-5. [figure: the IA32_DS_AREA MSR points to the DS buffer management area, whose 32-bit fields are, in order, BTS buffer base (0H), BTS index (4H), BTS absolute maximum (8H), BTS interrupt threshold (CH), PEBS buffer base (10H), PEBS index (14H), PEBS absolute maximum (18H), PEBS interrupt threshold (1CH), PEBS counter reset, and a reserved field. The BTS buffer holds branch records 0 through n; the PEBS buffer holds PEBS records 0 through n.]
• PEBS counter reset value — A 40-bit value that the counter is to be reset to after state information has been collected following counter overflow. This value allows state information to be collected after a preset number of events have been counted.
Figure 16-6 shows the structure of a 12-byte branch record in the BTS buffer.
Figure 16-7. PEBS Record Format [figure: 32-bit fields EFLAGS (0H), linear IP (4H), EAX (8H), EBX (CH), ECX (10H), EDX (14H), ESI (18H), EDI (1CH), EBP (20H), ESP (24H)]
16.4.9.1 DS Save Area and IA-32e Mode Operation
When IA-32e mode is active (IA32_EFER.LMA = 1), the structure of the DS save area is as shown in Figure 16-8. The organization of each field in IA-32e mode operation is similar to that of non-IA-32e mode operation. However, each field now stores a 64-bit address.
Figure 16-8. [figure: DS save area in IA-32e mode. The IA32_DS_AREA MSR points to the DS buffer management area, whose 64-bit fields are BTS buffer base (0H), BTS index (8H), BTS absolute maximum (10H), BTS interrupt threshold (18H), PEBS buffer base (20H), PEBS index (28H), PEBS absolute maximum (30H), PEBS interrupt threshold (38H), PEBS counter reset (40H), and reserved (48H). The BTS buffer holds branch records 0 through n; the PEBS buffer holds PEBS records 0 through n.]
Figure 16-9. 64-bit Branch Trace Record Format [figure: last branch from (0H), last branch to (8H), and at 10H a quadword whose bit 4 is the branch-predicted flag]
Figure 16-10. 64-bit PEBS Record Format [figure: 64-bit fields RFLAGS (0H), RIP (8H), RAX (10H), RBX (18H), RCX (20H), RDX (28H), RSI (30H), RDI (38H), RBP (40H), RSP (48H), R8 (50H) through R15 (88H)]
Fields in the buffer management area of a DS save area are described in Section 16.4.9.
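To make the IA-32e mode layouts of Figures 16-8 through 16-10 concrete, the following C sketch declares them as structures and checks the offsets against the figures at compile time. The structure, field, and function names are hypothetical, chosen for illustration; the sketch assumes a C11 compiler and that software accesses the DS save area through a 64-bit linear address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DS buffer management area in IA-32e mode (Figure 16-8): every field is
   8 bytes wide. Structure and field names are illustrative only. */
struct ds_mgmt_area64 {
    uint64_t bts_buffer_base;          /* 00H */
    uint64_t bts_index;                /* 08H */
    uint64_t bts_absolute_maximum;     /* 10H */
    uint64_t bts_interrupt_threshold;  /* 18H */
    uint64_t pebs_buffer_base;         /* 20H */
    uint64_t pebs_index;               /* 28H */
    uint64_t pebs_absolute_maximum;    /* 30H */
    uint64_t pebs_interrupt_threshold; /* 38H */
    uint64_t pebs_counter_reset;       /* 40H: 40-bit reset value */
    uint64_t reserved;                 /* 48H */
};

/* 64-bit branch trace record (Figure 16-9): three quadwords. */
struct bts_record64 {
    uint64_t from;   /* 00H: last branch from */
    uint64_t to;     /* 08H: last branch to */
    uint64_t flags;  /* 10H: bit 4 = branch predicted */
};

/* 64-bit PEBS record (Figure 16-10): RFLAGS, RIP, then RAX through R15. */
struct pebs_record64 {
    uint64_t rflags, rip;
    uint64_t rax, rbx, rcx, rdx, rsi, rdi, rbp, rsp;
    uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
};

/* Compile-time checks against the offsets shown in the figures. */
static_assert(offsetof(struct ds_mgmt_area64, pebs_buffer_base) == 0x20, "DS layout");
static_assert(offsetof(struct ds_mgmt_area64, pebs_counter_reset) == 0x40, "DS layout");
static_assert(sizeof(struct bts_record64) == 0x18, "BTS record size");
static_assert(offsetof(struct pebs_record64, r8) == 0x50, "PEBS layout");
static_assert(offsetof(struct pebs_record64, r15) == 0x88, "PEBS layout");

/* Returns 1 if the record's branch-predicted flag (bit 4 at 10H) is set. */
static int bts_branch_predicted(const struct bts_record64 *r) {
    return (int)((r->flags >> 4) & 1u);
}
```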
The procedures used to program the IA32_DEBUGCTL MSR to set up a BTS buffer or a CPL-qualified BTS are described in Section 16.4.9.3 and Section 16.4.9.4. The required elements for writing a DS interrupt service routine are largely the same on all processors that support using the DS save area for BTS or PEBS records. However, on processors based on Intel NetBurst® microarchitecture, re-enabling counting requires writing to CCCRs.
• It is recommended that the sizes of the BTS buffer and the PEBS buffer each be an integer multiple of the corresponding record size.
• The precise event records buffer should be large enough to hold the number of precise event records that can occur while waiting for the interrupt to be serviced.
• The DS save area should be in kernel space. It must not be on the same page as code, to avoid triggering self-modifying code actions.
2. Set the TR and BTS flags in the IA32_DEBUGCTL MSR for Intel Core Solo and Intel Core Duo processors or later processors (or in the MSR_DEBUGCTLA MSR for processors based on Intel NetBurst microarchitecture, or in MSR_DEBUGCTLB for Pentium M processors).
3. Clear the BTINT flag in the corresponding IA32_DEBUGCTL (or MSR_DEBUGCTLA or MSR_DEBUGCTLB) MSR if a circular BTS buffer is desired.
NOTES
If the buffer size is set to less than the minimum allowable value (i.
Table 16-5. CPL-Qualified Branch Trace Store Encodings (Contd.)
TR   BTS   BTS_OFF_OS   BTS_OFF_USR   BTINT   Description
1    1     1            0             1       Store BTMs with CPL > 0 in the BTS buffer; generate an interrupt when the buffer is nearly full
1    1     0            1             1       Store BTMs with CPL = 0 in the BTS buffer; generate an interrupt when the buffer is nearly full
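The MSR programming in steps 2 and 3 and the encodings in Table 16-5 can be sketched as a pure function that computes the IA32_DEBUGCTL bits to set. The bit positions used here (TR = bit 6, BTS = bit 7, BTINT = bit 8, BTS_OFF_OS = bit 9, BTS_OFF_USR = bit 10) are an assumption about the IA32_DEBUGCTL layout of Intel Core 2 and later processors; MSR_DEBUGCTLA (Figure 16-12) places these flags at different positions. All macro, enum, and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed IA32_DEBUGCTL bit positions (Intel Core 2 and later). */
#define DEBUGCTL_TR          (1ULL << 6)
#define DEBUGCTL_BTS         (1ULL << 7)
#define DEBUGCTL_BTINT       (1ULL << 8)
#define DEBUGCTL_BTS_OFF_OS  (1ULL << 9)
#define DEBUGCTL_BTS_OFF_USR (1ULL << 10)

/* Which privilege levels should produce BTS records. */
enum bts_cpl_filter { BTS_ALL_CPL, BTS_USER_ONLY, BTS_KERNEL_ONLY };

/* Bits to OR into IA32_DEBUGCTL. circular = true leaves BTINT clear,
   so the buffer wraps instead of raising an interrupt when nearly full. */
static uint64_t debugctl_bts_bits(enum bts_cpl_filter filter, bool circular) {
    uint64_t v = DEBUGCTL_TR | DEBUGCTL_BTS;   /* step 2: TR and BTS */
    if (!circular)
        v |= DEBUGCTL_BTINT;                   /* step 3: BTINT only if not circular */
    if (filter == BTS_USER_ONLY)
        v |= DEBUGCTL_BTS_OFF_OS;              /* drop CPL = 0 records */
    else if (filter == BTS_KERNEL_ONLY)
        v |= DEBUGCTL_BTS_OFF_USR;             /* drop CPL > 0 records */
    return v;
}
```

A driver would OR this value into the current IA32_DEBUGCTL contents with a RDMSR/WRMSR read-modify-write at CPL 0, after IA32_DS_AREA points to a prepared DS save area.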
• The ISR must clear the mask bit in the performance counter LVT entry.
• The Pentium 4 and Intel Xeon processors mask PMIs upon receiving an interrupt. Clear this condition before leaving the interrupt handler.
The ISR must re-enable the counters to count via IA32_PERF_GLOBAL_CTRL/IA32_PERF_GLOBAL_OVF_CTRL if it is servicing an overflow PMI due to PEBS (or via the CCCR's ENABLE bit on processors based on Intel NetBurst microarchitecture).
16.5.1 LBR Stack
The last branch record stack and top-of-stack (TOS) pointer MSRs are supported across the Intel Core 2, Intel Xeon, and Intel Atom processor families.
• Branch trace store and CPL-qualified BTS — See Section 16.4.6 and Section 16.4.5.
• FREEZE_LBRS_ON_PMI flag (bit 11) — See Section 16.4.7.
• FREEZE_PERFMON_ON_PMI flag (bit 12) — See Section 16.4.7.
• FREEZE_WHILE_SMM_EN (bit 14) — FREEZE_WHILE_SMM_EN is supported if IA32_PERF_CAPABILITIES.FREEZE_WHILE_SMM[bit 12] reports 1. See Section 16.4.1.
Table 16-6. IA32_LASTBRANCH_x_FROM_IP
Bit Field   Bit Offset   Access   Description
Data        47:0         R/O      The linear address of the branch instruction itself; this is the “branch from” address.
SIGN_EXT    62:48        R/O      Sign extension of bit 47 of this register.
MISPRED     63           R/O      When set, indicates the branch was mispredicted; otherwise, the branch was predicted.
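A minimal sketch of decoding a raw FROM_IP value laid out as in Table 16-6. It assumes, following the field name, that MISPRED = 1 marks a mispredicted branch, and it re-derives bits 63:48 from bit 47 so the result is always a canonical address. The structure and function names are hypothetical, and reading the MSR itself (RDMSR at CPL 0) is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct lbr_from {
    uint64_t addr;      /* canonical linear address of the branch instruction */
    bool mispredicted;  /* MISPRED, bit 63 (assumed: 1 = mispredicted) */
};

static struct lbr_from decode_lbr_from(uint64_t msr) {
    const uint64_t low48 = (1ULL << 48) - 1;
    struct lbr_from r;
    uint64_t data = msr & low48;           /* Data field, bits 47:0 */
    if (data & (1ULL << 47))
        data |= ~low48;                    /* sign-extend bit 47 into 63:48 */
    r.addr = data;
    r.mispredicted = (msr >> 63) & 1u;
    return r;
}
```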
Table 16-9. MSR_LBR_SELECT (Contd.)
• IA32_MISC_ENABLE MSR — Indicates that the processor provides the BTS facilities.
• Last branch record (LBR) stack — The LBR stack is a circular stack that consists of four MSRs (MSR_LASTBRANCH_0 through MSR_LASTBRANCH_3) for the Pentium 4 and Intel Xeon processor family [CPUID family 0FH, models 0H-02H].
Figure 16-12. [figure: register layout with bits 31:7 reserved; bit 6 BTS_OFF_USR (disable storing non-CPL 0 BTS); bit 5 BTS_OFF_OS (disable storing CPL 0 BTS); bit 4 BTINT (branch trace interrupt); bit 3 BTS (branch trace store); bit 2 TR (trace messages enable); bit 1 BTF (single-step on branches); bit 0 LBR (last branch/interrupt/exception)]
LBR MSR pair) that contains the most recent (last) branch record placed on the stack. Prior to placing a new branch record on the stack, the TOS is incremented by 1. When the TOS pointer reaches its maximum value, it wraps around to 0. See Table 16-10 and Figure 16-12. Table 16-10.
Figure 16-13. [figure: LBR record layouts. CPUID family 0FH, models 0H-02H: MSR_LASTBRANCH_0 through MSR_LASTBRANCH_3, with the From linear address in bits 63:32 and the To linear address in bits 31:0. CPUID family 0FH, models 03H-04H: MSR_LASTBRANCH_0_FROM_LIP through MSR_LASTBRANCH_15_FROM_LIP hold the From linear address in bits 31:0 (bits 63:32 reserved), and MSR_LASTBRANCH_0_TO_LIP through MSR_LASTBRANCH_15_TO_LIP hold the To linear address in bits 31:0 (bits 63:32 reserved).]
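Because the TOS pointer identifies the most recent record and is incremented (wrapping to 0) before each new record is written, software can walk the stack from newest to oldest as in this sketch. The stack depth is model specific (for example, 4 entries for CPUID family 0FH models 0H-02H and 16 for models 03H-04H); the function names are hypothetical, and reading the actual MSRs is omitted.

```c
#include <assert.h>

/* Slot index of the i-th most recent LBR record (i = 0 is the newest,
   the entry TOS points at). depth is the model-specific stack size. */
static unsigned lbr_slot(unsigned tos, unsigned i, unsigned depth) {
    assert(depth > 0 && tos < depth && i < depth);
    return (tos + depth - i) % depth;
}

/* Models the hardware step: TOS is incremented before a new record is
   written and wraps to 0 after its maximum value (depth - 1). */
static unsigned lbr_next_tos(unsigned tos, unsigned depth) {
    assert(depth > 0 && tos < depth);
    return (tos + 1) % depth;
}
```

Walking i from 0 to depth - 1 visits every slot exactly once, newest first.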
16.8 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (INTEL® CORE™ SOLO AND INTEL® CORE™ DUO PROCESSORS)
Intel Core Solo and Intel Core Duo processors provide last branch, interrupt, and exception recording. This capability is almost identical to that found in Pentium 4 and Intel Xeon processors. There are differences in the stack and in some MSR names and locations.
Figure 16-14. [figure: register layout with bits 31:9 reserved; bit 8 BTINT (branch trace interrupt); bit 7 BTS (branch trace store); bit 6 TR (trace messages enable); bits 5:2 reserved; bit 1 BTF (single-step on branches); bit 0 LBR (last branch/interrupt/exception)]
Figure 16-15. LBR Branch Record Layout for the Intel Core Solo and Intel Core Duo Processor [figure: MSR_LASTBRANCH_0 through MSR_LASTBRANCH_7, with the To linear address in bits 63:32 and the From linear address in bits 31:0]
16.9 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (PENTIUM M PROCESSORS)
Like the Pentium 4 and Intel Xeon processor family, Pentium M processors provide last branch, interrupt, and exception recording.
— TR (trace message enable) flag (bit 6) — When set, branch trace messages are enabled. When the processor detects a taken branch, interrupt, or exception, it sends the branch record out on the system bus as a branch trace message (BTM). See Section 16.4.4, “Branch Trace Messages,” for more information about the TR flag.
Figure 16-17. LBR Branch Record Layout for the Pentium M Processor [figure: MSR_LASTBRANCH_0 through MSR_LASTBRANCH_7, with the To linear address in bits 63:32 and the From linear address in bits 31:0]
For more detail on these capabilities, see Section 16.7.3, “Last Exception Records,” and Appendix B.7, “MSRs In the Pentium M Processor.”
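On the Intel Core Solo, Intel Core Duo, and Pentium M processors, each MSR_LASTBRANCH_n value packs both 32-bit addresses into one 64-bit MSR. The sketch below splits a raw value obtained with RDMSR, assuming (as read from Figures 16-15 and 16-17) that the To address occupies bits 63:32 and the From address bits 31:0; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Split a 64-bit MSR_LASTBRANCH_n value into its From and To addresses.
   Assumed layout: To = bits 63:32, From = bits 31:0. */
static void decode_lbr32(uint64_t msr, uint32_t *from, uint32_t *to) {
    *from = (uint32_t)(msr & 0xFFFFFFFFu);
    *to   = (uint32_t)(msr >> 32);
}
```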
Figure 16-18. DEBUGCTLMSR Register (P6 Family Processors) [figure: bits 31:7 reserved; bit 6 TR (trace messages enable); bits 5:2 PB3 through PB0 (performance monitoring/breakpoint pins); bit 1 BTF (single-step on branches); bit 0 LBR (last branch/interrupt/exception)]
• BTF (single-step on branches) flag (bit 1) — When set, the processor treats the TF flag in the EFLAGS register as a “single-step on branches” flag. See Section 16.4.
tion or interrupt being generated. When an exception or interrupt occurs, the contents of the LastBranchToIP and LastBranchFromIP MSRs are copied into these registers before the to and from addresses of the exception or interrupt are recorded in the LastBranchToIP and LastBranchFromIP MSRs. These registers can be read using the RDMSR instruction.
16.11 TIME-STAMP COUNTER
The Intel 64 and IA-32 architectures (beginning with the Pentium processor) define a time-stamp counter mechanism that can be used to monitor and identify the relative time of occurrence of processor events. The counter's architecture includes the following components:
• TSC flag — A feature bit that indicates the availability of the time-stamp counter. The counter is available if CPUID.1:EDX.TSC[bit 4] = 1.
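A sketch of the TSC availability check, assuming GCC or Clang on x86, where the `<cpuid.h>` helper `__get_cpuid` executes CPUID for a given leaf and returns 0 if that leaf is unsupported. The pure helper tests EDX bit 4 so the bit logic can be checked on any host; both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#endif

/* TSC flag: CPUID leaf 1, EDX bit 4. */
static bool edx_has_tsc(uint32_t edx) {
    return (edx >> 4) & 1u;
}

/* Query the running CPU; returns false on non-x86 builds or if leaf 1
   cannot be queried. */
static bool cpu_has_tsc(void) {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return edx_has_tsc(edx);
#else
    return false;
#endif
}
```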
NOTE
To determine the average processor clock frequency, Intel recommends using EMON logic to count processor core clocks over the period of time for which the average is required. See Section 30.10, “Counting Clocks,” and Appendix A, “Performance-Monitoring Events,” for more information.
16.11.2 IA32_TSC_AUX Register and RDTSCP Support
Processors based on Intel microarchitecture (Nehalem) provide an auxiliary TSC register, IA32_TSC_AUX, that is designed to be used in conjunction with IA32_TSC. IA32_TSC_AUX provides a 32-bit field that is initialized by privileged software with a signature value (for example, a logical processor ID).
CHAPTER 17
8086 EMULATION
IA-32 processors (beginning with the Intel386 processor) provide two ways to execute new or legacy programs that are assembled and/or compiled to run on an Intel 8086 processor:
• Real-address mode.
• Virtual-8086 mode.
Figure 2-3 shows the relationship of these operating modes to protected mode and system management mode (SMM). When the processor is powered up or reset, it is placed in real-address mode.
The following is a summary of the core features of the real-address mode execution environment as would be seen by a program written for the 8086:
• The processor supports a nominal 1-MByte physical address space (see Section 17.1.1, “Address Translation in Real-Address Mode”, for specific details). This address space is divided into segments, each of which can be up to 64 KBytes in length.
• A single interrupt table, called the “interrupt vector table” or “interrupt table,” is provided for handling interrupts and exceptions (see Figure 17-2). The interrupt table (which has 4-byte entries) takes the place of the interrupt descriptor table (IDT, with 8-byte entries) used when handling protected-mode interrupts and exceptions. Interrupt and exception vector numbers provide an index to entries in the interrupt table.
in real-address mode, however, the processor does not truncate such an address and uses it as a physical address. (Note, however, that for IA-32 processors beginning with the Intel486 processor, the A20M# signal can be used in real-address mode to mask address line A20, thereby mimicking the 20-bit wrap-around behavior of the 8086 processor.) Care should be taken to ensure that A20M#-based address wrapping is handled correctly in multiprocessor-based systems.
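The address arithmetic behind this paragraph can be sketched as follows: a real-address mode address is the segment value shifted left by 4 plus the 16-bit offset, so FFFFH:FFFFH yields 10FFEFH. The sketch models A20M# as forcing address bit 20 to 0, which reproduces the 8086 wrap-around for every segment:offset pair because no sum exceeds 10FFEFH. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Real-address mode address: (segment << 4) + offset, at most 10FFEFH.
   IA-32 processors keep the carry into bit 20; with A20M# asserted,
   address line A20 is forced low, mimicking the 8086 1-MByte wrap. */
static uint32_t real_mode_address(uint16_t seg, uint16_t off, bool a20_masked) {
    uint32_t addr = ((uint32_t)seg << 4) + off;
    if (a20_masked)
        addr &= ~(1u << 20);
    return addr;
}
```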
• Move (MOV) instructions that move operands between general-purpose registers, segment registers, and between memory and general-purpose registers.
• The exchange (XCHG) instruction.
• Logical instructions AND, OR, XOR, and NOT.
• Type conversion instructions CWD, CDQ, CBW, and CWDE.
• I/O instructions IN, INS, OUT, and OUTS.
• Load segment register instructions LDS and LES.
• Bit test and bit scan instructions BT, BTS, BTR, BTC, BSF, and BSR; the byte-set-on-condition instruction SETcc; and the byte swap (BSWAP) instruction.
• Double shift instructions SHLD and SHRD.
• EFLAGS control instructions PUSHF and POPF.
• ENTER and LEAVE control instructions.
• BOUND instruction.
• CPU identification (CPUID) instruction.
• System instructions CLTS, INVD, WBINVD, INVLPG, LGDT, SGDT, LIDT, SIDT, LMSW, SMSW, RDMSR, WRMSR, RDTSC, and RDPMC.
The interrupt vector table is an array of 4-byte entries (see Figure 17-2). Each entry consists of a far pointer to a handler procedure, made up of a segment selector and an offset. The processor scales the interrupt or exception vector by 4 to obtain an offset into the interrupt table. Following reset, the base of the interrupt vector table is located at physical address 0 and its limit is set to 3FFH.
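A sketch of the vector-to-handler lookup described above, operating on a byte copy of the first 1 KByte of physical memory. It assumes the customary little-endian entry layout, with the handler offset in the low word and the segment in the high word; the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct far_ptr16 { uint16_t seg, off; };

/* Read interrupt vector table entry `vector` from memory starting at
   physical address 0. Each 4-byte entry: offset (low word), then segment. */
static struct far_ptr16 ivt_lookup(const uint8_t *mem, uint8_t vector) {
    const uint8_t *e = mem + (uint32_t)vector * 4u;   /* scale vector by 4 */
    struct far_ptr16 p;
    p.off = (uint16_t)(e[0] | (e[1] << 8));
    p.seg = (uint16_t)(e[2] | (e[3] << 8));
    return p;
}
```

With 256 vectors of 4 bytes each, the last entry ends at 3FFH, which matches the table limit set at reset.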
17.2 VIRTUAL-8086 MODE
Virtual-8086 mode is actually a special type of task that runs in protected mode. When the operating system or executive switches to a virtual-8086-mode task, the processor emulates an Intel 8086 processor. The execution environment of the processor while in the 8086-emulation state is the same as is described in Section 17.1, “Real-Address Mode”, for real-address mode, including the extensions.
Table 17-1. Real-Address Mode Exceptions and Interrupts (Contd.)
Vector No.   Description                     Real-Address Mode   Virtual-8086 Mode   Intel 8086 Processor
19-31        (Intel reserved. Do not use.)   Reserved            Reserved            Reserved
32-255       User Defined Interrupts         Yes                 Yes                 Yes
NOTE:
* In the real-address mode, vector 13 is the segment overrun exception.
The processor enters virtual-8086 mode to run the 8086 program and returns to protected mode to run the virtual-8086 monitor. The virtual-8086 monitor is a 32-bit protected-mode code module that runs at a CPL of 0. The monitor consists of initialization, interrupt- and exception-handling, and I/O emulation procedures that emulate a personal computer or other 8086-based platform.
Paging is not necessary for a single virtual-8086-mode task, but paging is useful or necessary in the following situations:
• When running multiple virtual-8086-mode tasks. Here, paging allows the lower 1 MByte of the linear address space for each virtual-8086-mode task to be mapped to a different physical address location.
• When emulating the 8086 address-wraparound that occurs at 1 MByte.
When a task switch is used to enter virtual-8086 mode, the TSS for the virtual-8086-mode task must be a 32-bit TSS. (If the new TSS is a 16-bit TSS, the upper word of the EFLAGS register is not in the TSS, causing the processor to clear the VM flag when it loads the EFLAGS register.) The processor updates the VM flag prior to loading the segment registers from their images in the new TSS.
[figure: entry into and exit from virtual-8086 mode. Panels: Real-Address Mode (real mode code), Protected Mode (protected-mode tasks, protected-mode interrupt and exception handlers, and the virtual-8086 monitor), and Virtual-8086 Mode (virtual-8086 mode tasks, that is, 8086 programs). Transitions shown: PE=0 or RESET, PE=1, task switch (VM=0 or VM=1), interrupt or exception, #GP exception, CALL/RET to the virtual-8086 monitor, IRET, and redirecting an interrupt to an 8086 program interrupt or exception handler.]
17.2.6 Leaving Virtual-8086 Mode
The processor can leave virtual-8086 mode only through an interrupt or exception. The following are situations where an interrupt or exception will lead to the processor leaving virtual-8086 mode (see Figure 17-3):
• The processor services a hardware interrupt generated to signal the suspension of execution of the virtual-8086 application. This hardware interrupt may be generated by a timer or other external mechanism.
execution sequence after verifying that it was entered as a result of a HLT execution. See Section 17.3, “Interrupt and Exception Handling in Virtual-8086 Mode”, for information on leaving virtual-8086 mode to handle an interrupt or exception generated in virtual-8086 mode.
17.2.7 Sensitive Instructions
When an IA-32 processor is running in virtual-8086 mode, the CLI, STI, PUSHF, POPF, INT n, and IRET instructions are sensitive to IOPL.
for another task. This differs from protected mode in which, if the CPL is less than or equal to the IOPL, I/O access is allowed without checking the I/O permission bit map. See Chapter 13, “Input/Output”, in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for more information about the I/O permission bit map.
In virtual-8086 mode, the interrupts and exceptions are divided into three classes for the purposes of handling:
• Class 1 — All processor-generated exceptions and all hardware interrupts, including the NMI interrupt and the hardware interrupts sent to the processor’s external interrupt delivery pins. All class 1 exceptions and interrupts are handled by the protected-mode exception and interrupt handlers.
• Class 2 — Special case for maskable hardware interrupts (Section 6.3.
in the previous paragraphs. These sections describe three possible types of interrupt and exception handlers:
• Protected-mode interrupt and exception handlers — These are the standard handlers that the processor calls through the protected-mode IDT.
save and restore these registers regardless of the type of segment selectors they contain (protected-mode or 8086-style). The interrupt and exception handlers, which may be called in the context of either a protected-mode task or a virtual-8086-mode task, can use the same code sequences for saving and restoring the registers for any task. Clearing these registers before execution of the IRET instruction does not cause a trap in the interrupt handler.
Interrupt and exception handlers can examine the VM flag on the stack to determine if the interrupted procedure was running in virtual-8086 mode. If so, the interrupt or exception can be handled in one of three ways:
• The protected-mode interrupt or exception handler that was called can handle the interrupt or exception.
• The protected-mode interrupt or exception handler can call the virtual-8086 monitor to handle the interrupt or exception.
2. Store the EFLAGS (low-order 16 bits only), CS, and EIP values of the 8086 program on the privilege-level 3 stack. This is the stack that the virtual-8086-mode task is using. (The 8086 handler may use or modify this information.)
3. Change the return link on the privilege-level 0 stack to point to the privilege-level 3 handler procedure.
4. Execute an IRET instruction to pass control to the 8086 program handler.
5.
executed must be 0; otherwise, the processor does not change the state of the VM flag.
17.3.2 Class 2—Maskable Hardware Interrupt Handling in Virtual-8086 Mode Using the Virtual Interrupt Mechanism
Maskable hardware interrupts are those interrupts that are delivered through the INTR# pin or through an interrupt request to the local APIC (see Section 6.3.2, “Maskable Hardware Interrupts”).
CLI instruction, the processor clears the VIF flag to request that the virtual-8086 monitor inhibit maskable hardware interrupts from interrupting program execution; when it executes the STI instruction, the processor sets the VIF flag, requesting that the virtual-8086 monitor enable maskable hardware interrupts for the 8086 program. However, the IF flag, managed by the operating system, always controls whether maskable hardware interrupts are enabled.
5. Upon returning to virtual-8086 mode, the processor continues execution of the 8086 program. When the 8086 program is ready to receive maskable hardware interrupts, it executes the STI instruction to set the VIF flag (enabling maskable hardware interrupts).
tions in virtual-8086 mode in the same manner as an Intel386 or Intel486 processor does. When this flag is set, the virtual mode extension provides the following enhancements to virtual-8086 mode:
• Speeds up the handling of software-generated interrupts in virtual-8086 mode by allowing the processor to bypass the virtual-8086 monitor and redirect software interrupts back to the interrupt handlers that are part of the currently running 8086 program.
Table 17-2. Software Interrupt Handling Methods While in Virtual-8086 Mode
Method VME 1 0 Bit in Redir.
Figure 17-5. Software Interrupt Redirection Bit Map in TSS [figure: within the task-state segment (TSS), the 32-byte software interrupt redirection bit map lies immediately below the I/O permission bit map; the I/O map base field is in the TSS doubleword at offset 64H; the I/O map base must not exceed DFFFH; the last byte of the I/O permission bit map must be FFH (all bits set)]
Redirecting software interrupts back to the 8086 program potentially speeds up interrupt handling because a switch back and forth between virtual-8086 mode and protected mode is not required.
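Because the 256-bit (32-byte) redirection bit map ends immediately below the I/O permission bit map, the bit for vector n can be located from the I/O map base. The sketch below performs that lookup on a byte copy of a TSS. It assumes, per the method table for VME = 1 and IOPL = 3, that a clear bit redirects INT n to the 8086 program's own handler and a set bit routes it to protected mode; treat that mapping, and the function name, as assumptions of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* tss: bytes of a 32-bit TSS; io_map_base: 16-bit offset of the I/O
   permission bit map. The redirection bit map occupies the 32 bytes just
   below it; bit n (byte n / 8, bit n % 8) corresponds to vector n. */
static bool int_redirect_bit(const uint8_t *tss, uint16_t io_map_base, uint8_t vector) {
    assert(io_map_base >= 32);
    const uint8_t *redir = tss + io_map_base - 32;
    return (redir[vector >> 3] >> (vector & 7)) & 1u;
}
```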
rupt handler in the protected-mode IDT pointed to by the interrupt vector. See Section 17.3.1, “Class 1—Hardware Interrupt and Exception Handling in Virtual-8086 Mode”, for a complete description of this mechanism and its possible uses.
17.3.3.2 Methods 2 and 3: Software Interrupt Handling
When a software interrupt occurs in virtual-8086 mode and the method 2 or 3 conditions are present, the processor generates a general-protection exception (#GP).
3. Clears the IF flag in the EFLAGS register to disable interrupts.
4. Clears the TF flag in the EFLAGS register.
5. Locates the 8086 program interrupt vector table at linear address 0 for the 8086-mode task.
6. Loads the CS and EIP registers with values from the interrupt vector table entry pointed to by the interrupt vector number. Only the 16 low-order bits of the EIP are loaded, and the 16 high-order bits are set to 0.
cient means of handling maskable hardware interrupts that occur during a virtual-8086 mode task. Also, because the IOPL value is less than 3 and the VIF flag is enabled, the information pushed on the stack by the processor when invoking the interrupt handler is slightly different between methods 5 and 6 (see Table 17-2).
It is only possible to enter virtual-8086 mode through a task switch or the execution of an IRET instruction, and it is only possible to leave virtual-8086 mode by faulting to a protected-mode interrupt handler (typically the general-protection exception handler, which in turn calls the virtual-8086 monitor). In both cases, the EFLAGS register is saved and restored. This is not true, however, in protected mode when the PVI flag is set and the processor is not in virtual-8086 mode.
CHAPTER 18
MIXING 16-BIT AND 32-BIT CODE
Program modules written to run on IA-32 processors can be either 16-bit modules or 32-bit modules. Table 18-1 shows the characteristics of 16-bit and 32-bit modules.
Table 18-1.
18.1 DEFINING 16-BIT AND 32-BIT PROGRAM MODULES
The following IA-32 architecture mechanisms are used to distinguish between and support 16-bit and 32-bit segments and operations:
• The D (default operand and address size) flag in code-segment descriptors.
• The B (default stack size) flag in stack-segment descriptors.
• 16-bit and 32-bit call gates, interrupt gates, and trap gates.
• Operand-size and address-size instruction prefixes.
These prefixes reverse the default size selected by the D flag in the code-segment descriptor. For example, the processor can interpret the (MOV mem, reg) instruction in any of four ways:
• In a 32-bit code segment:
— Moves 32 bits from a 32-bit register to memory using a 32-bit effective address.
— If preceded by an operand-size prefix, moves 16 bits from a 16-bit register to memory using a 32-bit effective address.
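The rule that a prefix reverses the default set by the D flag can be written as an exclusive OR of the D flag and the presence of the prefix. The sketch applies separately to the operand-size prefix (66H) and the address-size prefix (67H), each evaluated independently against the same D flag; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Effective operand or address size in bits, given the code segment's
   D flag (true = 32-bit default) and whether the matching prefix
   (66H for operand size, 67H for address size) is present. */
static int effective_size(bool d_flag, bool prefix) {
    bool use32 = (d_flag != prefix);   /* the prefix reverses the default */
    return use32 ? 32 : 16;
}
```

For the MOV example above, a 32-bit code segment with an operand-size prefix gives a 16-bit operand size while the address size stays 32 bits, since no address-size prefix is present.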
18.3 SHARING DATA AMONG MIXED-SIZE CODE SEGMENTS
Data segments can be accessed from both 16-bit and 32-bit code segments. When a data segment that is larger than 64 KBytes is to be shared among 16- and 32-bit code segments, the data that is to be accessed from the 16-bit code segments must be located within the first 64 KBytes of the data segment, because 16-bit pointers, by definition, can point only to the first 64 KBytes of a segment.
Likewise, there are three ways for a procedure in a 32-bit code segment to safely make a call to a 16-bit code segment:
• Make the call through a 16-bit call gate. Here, the EIP value at the CALL instruction cannot exceed FFFFH.
• Make a 32-bit call to a 16-bit interface procedure. The interface procedure then makes a 16-bit call to the intended destination.
• Modify the 32-bit procedure, inserting an operand-size prefix before the call, changing it to a 16-bit call.
instruction (see Figure 18-1). On a 16-bit call, the processor pushes the contents of the 16-bit IP register and (for calls between privilege levels) the 16-bit SP register. The matching RET instruction must also use a 16-bit operand size to pop these 16-bit values from the stack into the 16-bit registers. A 32-bit CALL instruction pushes the contents of the 32-bit EIP register and (for inter-privilege-level calls) the 32-bit ESP register.
While executing 32-bit code, if a call is made to a 16-bit code segment which is at the same or a more privileged level (that is, the DPL of the called code segment is less than or equal to the CPL of the calling code segment) through a 16-bit call gate, then the upper 16 bits of the ESP register may be unreliable upon returning to the 32-bit code segment (that is, after executing a RET in the 16-bit code segment).
segments can be modified to safely call procedures in 32-bit code segments in either of two ways:
• Relink the CALL instruction to point to 32-bit call gates (see Section 18.4.2.2, “Passing Parameters With a Gate”).
• Add a 32-bit operand-size prefix to each CALL instruction.
18.4.2.2 Passing Parameters With a Gate
When referencing 32-bit gates with 16-bit procedures, it is important to consider the number of parameters passed in each procedure call.
18.4.5 Writing Interface Procedures
Placing interface code between 32-bit and 16-bit procedures can be the solution to the following interface problems:
• Allowing procedures in 16-bit code segments to call procedures with offsets greater than FFFFH in 32-bit code segments.
• Matching operand-size attributes between companion CALL and RET instructions.
• The possible invalidation of the upper bits of the ESP register.
CHAPTER 19
ARCHITECTURE COMPATIBILITY
Intel 64 and IA-32 processors are binary compatible. Compatibility means that, within limited constraints, programs that execute on previous generations of processors will produce identical results when executed on later processors. The compatibility constraints and any implementation differences between the Intel 64 and IA-32 processors are described in this chapter.
• Pentium D Processors — A family of dual-core Intel 64 processors that provides two processor cores in a physical package. Each core is based on the Intel NetBurst microarchitecture.
• Pentium Processor Extreme Editions — A family of dual-core Intel 64 processors that provides two processor cores in a physical package. Each core is based on the Intel NetBurst microarchitecture and supports Intel Hyper-Threading Technology.
original value results in a general-protection exception (#GP). So, programs that execute on the P6 family and Pentium processors cannot erroneously enable functions that may be implemented in future IA-32 processors. The P6 family and Pentium processors do not check for attempts to set reserved bits in model-specific registers; however, these bits may be checked on more recent processors. It is the obligation of the software writer to enforce this discipline.
control and status register. These instructions and registers are designed to allow SIMD computations to be made on single-precision floating-point numbers. Several of these new instructions also operate on the MMX registers.
19.10 INTEL HYPER-THREADING TECHNOLOGY
Intel Hyper-Threading Technology provides two logical processors that can execute two separate code streams (called threads) concurrently by using shared resources in a single processor core or in a physical package. This feature was introduced in the Intel Xeon processor MP and later steppings of the Intel Xeon processor, and Pentium 4 processors supporting Intel Hyper-Threading Technology.
19.13.1 Instructions Added Prior to the Pentium Processor
The following instructions were added in the Intel486 processor:
• BSWAP (byte swap) instruction.
• XADD (exchange and add) instruction.
• CMPXCHG (compare and exchange) instruction.
• INVD (invalidate cache) instruction.
• WBINVD (write-back and invalidate cache) instruction.
• INVLPG (invalidate TLB entry) instruction.
Table 19-1.
• Single-bit instructions.
• Bit scan instructions.
• Double-shift instructions.
• Byte set on condition instruction.
• Move with sign/zero extension.
• Generalized multiply instruction.
• MOV to and from control registers.
• MOV to and from test registers (now obsolete).
• MOV to and from debug registers.
• RSM (resume from SMM). This instruction was introduced in the Intel386 SL and Intel486 SL processors.
The following flags were added to the EFLAGS register in the Pentium processor:
• VIF (virtual interrupt flag), bit 19.
• VIP (virtual interrupt pending), bit 20.
• ID (identification flag), bit 21.
The AC flag (bit 18) was added to the EFLAGS register in the Intel486 processor.
XCHG BP, [BP]
This code functions as the 8086 processor PUSH SP instruction on the P6 family, Pentium, Intel486, Intel386, and Intel 286 processors.
19.17.2 EFLAGS Pushed on the Stack
The setting of the stored values of bits 12 through 15 (which include the IOPL field and the NT flag) in the EFLAGS register by the PUSHF instruction, by interrupts, and by exceptions is different with the 32-bit IA-32 processors than with the 8086 and Intel 286 processors.
math coprocessor (flag is clear) or an Intel 387 DX math coprocessor (flag is set). This bit is hardwired to 1 in the P6 family, Pentium, and Intel486 processors. The NE (Numeric Exception) flag (bit 5 of the CR0 register) is used in the P6 family, Pentium, and Intel486 processors to determine whether unmasked floating-point exceptions are reported internally through interrupt vector 16 (flag is set) or externally through an external interrupt (flag is clear).
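A minimal sketch of how software might interpret these two CR0 bits, assuming ET is bit 4 and NE is bit 5. The helper names are hypothetical. The ET interpretation applies only to the Intel386 processor, since the bit is hardwired to 1 on the later processors listed above.

```python
CR0_ET = 1 << 4  # extension type (meaningful only on the Intel386 processor)
CR0_NE = 1 << 5  # numeric error reporting


def intel386_coprocessor(cr0: int) -> str:
    """Coprocessor type indicated by ET on an Intel386 processor."""
    return "Intel 387 DX" if cr0 & CR0_ET else "Intel 287"


def fp_error_reporting(cr0: int) -> str:
    """How unmasked floating-point exceptions are reported (P6 family, Pentium, Intel486)."""
    if cr0 & CR0_NE:
        return "internal: interrupt vector 16"
    return "external: external interrupt"
```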
On the 32-bit x87 FPUs, the C2 flag serves as an incomplete flag for the FPTAN instruction. On the 16-bit IA-32 math coprocessors, the C2 flag is undefined for the FPTAN instruction. This difference has no impact on software, because Intel 287 or 8087 programs do not check C2 after an FPTAN instruction. The use of this flag on later processors allows fast checking of operand range.
Software written to run on a 16-bit IA-32 math coprocessor may not operate correctly on a 32-bit x87 FPU if it uses the FLDENV, FRSTOR, or FXRSTOR instructions to change tags to values (other than empty) that differ from the actual register contents. The encoding in the tag word for the 32-bit x87 FPUs for unsupported data formats (including pseudo-zero and unnormal) is special (10B), to comply with IEEE Standard 754.
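On the 32-bit x87 FPUs, each of the eight physical registers has a two-bit tag in the 16-bit tag word: 00B valid, 01B zero, 10B special (the unsupported formats mentioned above, as well as NaNs, infinities, and denormals), and 11B empty. The following Python sketch decodes a tag word; it assumes register R0 occupies bits 1:0, and the function name is hypothetical.

```python
TAG_NAMES = {0b00: "valid", 0b01: "zero", 0b10: "special", 0b11: "empty"}


def decode_tag_word(tag_word: int) -> list[str]:
    """Return the tags of physical registers R0..R7 (two bits each, R0 in bits 1:0)."""
    return [TAG_NAMES[(tag_word >> (2 * reg)) & 0b11] for reg in range(8)]
```

After FINIT the tag word is FFFFH, so decode_tag_word(0xFFFF) reports all eight registers as empty.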
ters. The only effect may be in how software handles the tags in the tag word (see also Section 19.18.4, “x87 FPU Tag Word”).

19.18.6 Floating-Point Exceptions

This section identifies the implementation differences in exception handling for floating-point instructions in the various x87 FPUs and math coprocessors.
The difference is apparent only to the exception handler. This difference is for IEEE Standard 754 compatibility.

19.18.6.3 Numeric Underflow Exception (#U)

When the underflow exception is masked on the 32-bit x87 FPUs, the underflow exception is signaled when both the result is tiny and denormalization results in a loss of accuracy.
the 8087 interrupt, both exception vectors should call the floating-point-error exception handler. Some instructions in a floating-point-error exception handler may need to be deleted if they use the interrupt controller. The P6 family, Pentium, and Intel486 processors have signals that, with the addition of external logic, support reporting for emulation of the interrupt mechanism used in many personal computers.
19.18.6.9 Alignment Check Exceptions (#AC)

If alignment checking is enabled, a misaligned data operand on the P6 family, Pentium, and Intel486 processors causes an alignment check exception (#AC) when a program or procedure is running at privilege level 3, except for the stack portion of the FSAVE/FNSAVE, FXSAVE, FRSTOR, and FXRSTOR instructions.
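The conditions for #AC can be written as a single predicate. This Python sketch assumes the usual enabling conditions (CR0.AM and EFLAGS.AC both set, and CPL 3) and takes the required alignment as a parameter rather than deriving it from the operand type. It does not model the FSAVE/FXSAVE-family exceptions noted above, and the function name is hypothetical.

```python
def raises_alignment_check(cr0_am: bool, eflags_ac: bool, cpl: int,
                           address: int, alignment: int) -> bool:
    """Return True if an access would generate #AC under the stated assumptions."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    misaligned = (address & (alignment - 1)) != 0
    # All four conditions must hold: AM enabled, AC set, user mode, misaligned.
    return bool(cr0_am and eflags_ac and cpl == 3 and misaligned)
```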
19.18.7 Changes to Floating-Point Instructions

This section identifies the differences in floating-point instructions for the various Intel FPU and math coprocessor architectures, the reason for the differences, and their impact on software.

19.18.7.1 FDIV, FPREM, and FSQRT Instructions

The 32-bit x87 FPUs support operations on denormalized operands and, when detected, an underflow exception can occur, for compatibility with the IEEE Standard 754.
tions do not exist on the 16-bit IA-32 math coprocessors. The availability of these new instructions has no impact on existing software.

19.18.7.6 FPTAN Instruction

On the 32-bit x87 FPUs, the range of the operand for the FPTAN instruction is much less restricted (|ST(0)| < 2^63) than on earlier math coprocessors. The instruction reduces the operand internally using an internal π/4 constant that is more accurate.
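The range rule can be illustrated with a Python model. It assumes behavior not spelled out in the lines above: when the operand is out of range, FPTAN sets C2 and leaves ST(0) unchanged, and on success it replaces ST(0) with the tangent and then pushes 1.0. Host double precision stands in for the 80-bit extended format, non-finite inputs are rejected rather than modeled, and the function name is hypothetical.

```python
import math

FPTAN_LIMIT = 2.0 ** 63  # operand range on the 32-bit x87 FPUs: |ST(0)| < 2^63


def fptan_model(st0: float) -> tuple[list[float], bool]:
    """Return (register stack listed top first, C2) after FPTAN on `st0`."""
    if not math.isfinite(st0):
        raise ValueError("NaN and infinity operands are not modeled")
    if abs(st0) >= FPTAN_LIMIT:
        return [st0], True              # out of range: C2 = 1, operand unchanged
    return [1.0, math.tan(st0)], False  # tan(x) in ST(1), 1.0 pushed as ST(0)
```

A caller that sees C2 set is expected to reduce the operand into range in software and retry; checking C2 is the "fast checking of operand range" mentioned in Section 19.18.2's predecessor paragraph on the C2 flag.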
arithmetic. The 16-bit IA-32 math coprocessors do report a denormal-operand exception in this situation. This difference does not affect existing software. On the 32-bit x87 FPUs, loading a denormal value that is in single- or double-real format causes the value to be converted to extended-real format. Loading a denormal value on the 16-bit IA-32 math coprocessors causes the value to be converted to an unnormal.
FPUs handle all addressing and exception-pointer information, whether in protected mode or not.

19.18.7.15 FXAM Instruction

With the 32-bit x87 FPUs, if the FPU encounters an empty register when executing the FXAM instruction, it does not generate combinations of C0 through C3 equal to 1101 or 1111. The 16-bit IA-32 math coprocessors may generate these combinations, among others.
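FXAM reports the class of ST(0) in C3, C2, and C0, with the sign in C1. The Python sketch below decodes those bits from a status word. The bit positions (C0 = bit 8, C1 = bit 9, C2 = bit 10, C3 = bit 14) and the class table are taken from the FXAM instruction description rather than from this section, and the function name is hypothetical; any C3/C2/C0 pattern outside the table is reported as "undefined".

```python
FXAM_CLASSES = {
    # (C3, C2, C0): class of the value in ST(0)
    (0, 0, 0): "unsupported",
    (0, 0, 1): "NaN",
    (0, 1, 0): "normal finite",
    (0, 1, 1): "infinity",
    (1, 0, 0): "zero",
    (1, 0, 1): "empty",
    (1, 1, 0): "denormal",
}


def classify_fxam(status_word: int) -> tuple[str, int]:
    """Return (class, sign) decoded from an x87 status word after FXAM."""
    c0 = (status_word >> 8) & 1
    c1 = (status_word >> 9) & 1   # sign of ST(0)
    c2 = (status_word >> 10) & 1
    c3 = (status_word >> 14) & 1
    return FXAM_CLASSES.get((c3, c2, c0), "undefined"), c1
```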
19.18.10 WAIT/FWAIT Prefix Differences

On the Intel486 processor, when a WAIT/FWAIT instruction precedes a floating-point instruction (one that itself automatically synchronizes with the previous floating-point instruction), the WAIT/FWAIT instruction is treated as a no-op. Pending floating-point exceptions from a previous floating-point instruction are processed not on the WAIT/FWAIT instruction but on the floating-point instruction following it.
19.20 FPU AND MATH COPROCESSOR INITIALIZATION

Table 9-1 shows the states of the FPUs in the P6 family, Pentium, and Intel486 processors and of the Intel 387 math coprocessor and Intel 287 coprocessor following a power-up, reset, or INIT, or following the execution of an FINIT/FNINIT instruction. The following is some additional compatibility information concerning the initialization of x87 FPUs and math coprocessors.
Table 19-3. EM and MP Flag Interpretation

EM  MP  Interpretation
0   0   Floating-point instructions are passed to FPU; WAIT/FWAIT and other waiting-type instructions ignore TS.
0   1   Floating-point instructions are passed to FPU; WAIT/FWAIT and other waiting-type instructions test TS.
1   0   Floating-point instructions trap to emulator; WAIT/FWAIT and other waiting-type instructions ignore TS.
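The rows of Table 19-3 reduce to two independent rules: EM selects whether floating-point instructions reach the FPU, and MP selects whether WAIT/FWAIT tests TS. The Python sketch below encodes those rules. The #NM outcome for WAIT/FWAIT when both MP and TS are set is drawn from the instruction's description rather than from this table, and the function names are hypothetical.

```python
def interpret_em_mp(em: int, mp: int) -> dict[str, object]:
    """Decode the CR0 EM and MP flags as in Table 19-3."""
    return {
        "fp_instructions": "trap to emulator" if em else "passed to FPU",
        "wait_tests_ts": bool(mp),
    }


def wait_raises_nm(mp: int, ts: int) -> bool:
    """WAIT/FWAIT generate #NM only when they test TS (MP = 1) and TS is set."""
    return bool(mp) and bool(ts)
```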
19.21 CONTROL REGISTERS

The following sections identify the new control registers and control register flags and fields that were introduced to the 32-bit IA-32 architecture in various processor families. See Figure 2-6 for the location of these flags and fields in the control registers.

The Pentium III processor introduced one new control flag in control register CR4:

• OSXMMEXCPT (bit 10) — The OS will set this bit if it supports unmasked SIMD floating-point exceptions.
• NE — Numeric error. Enables the normal mechanism for reporting floating-point numeric errors.
• WP — Write protect. Write-protects read-only pages against supervisor-mode accesses.
• AM — Alignment mask. Controls whether alignment checking is performed. Operates in conjunction with the AC (Alignment Check) flag.
• NW — Not write-through.
19.22.1.2 Global Pages

The new PGE (page global enable) flag in control register CR4, bit 7, provides a mechanism for preventing frequently used pages from being flushed from the translation lookaside buffer (TLB). When this flag is set, frequently used pages (such as pages containing kernel procedures or common data tables) can be marked global by setting the global flag in a page-directory or page-table entry.
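The effect of PGE can be reduced to a predicate: a cached translation survives a load of CR3 only if CR4.PGE is set and the entry that maps the page has its global flag set. This Python sketch assumes the global flag is bit 8 of the entry that maps the page (a page-table entry, or a page-directory entry that maps a large page), and the function name is hypothetical. INVLPG still invalidates an individual global entry.

```python
CR4_PGE = 1 << 7  # page global enable
PTE_G = 1 << 8    # global flag in an entry that maps a page


def survives_cr3_load(cr4: int, mapping_entry: int) -> bool:
    """True if the TLB entry for this mapping is kept when CR3 is loaded."""
    return bool(cr4 & CR4_PGE) and bool(mapping_entry & PTE_G)
```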
19.22.4 Changes in Segment Descriptor Loads

On the Intel386 processor, loading a segment descriptor always causes a locked read and write to set the accessed bit of the descriptor. On the P6 family, Pentium, and Intel486 processors, the locked read and write occur only if the bit is not already set.

19.23 DEBUG FACILITIES

The P6 family and Pentium processors include extensions to the Intel486 processor debugging support for breakpoints.
are enabled (the DE flag is set), attempts to reference registers DR4 or DR5 will result in an invalid-opcode exception (#UD).

19.24 RECOGNITION OF BREAKPOINTS

For the Pentium processor, it is recommended that debuggers execute the LGDT instruction before returning to the program being debugged to ensure that breakpoints are detected. This operation does not need to be performed on the P6 family, Intel486, or Intel386 processors.
may not be implemented or implemented differently in future processors. The MCE flag in control register CR4 enables the machine-check exception. When this bit is clear (which it is at reset), the processor inhibits generation of the machine-check exception.

• General-protection exception (#GP, interrupt 13) — New exception condition added. An attempt to write a 1 to a reserved bit position of a special register causes a general-protection exception to be generated.
19.25.1 Machine-Check Architecture

The Pentium Pro processor introduced a new architecture to the IA-32 for handling and reporting on machine-check exceptions. This machine-check architecture (described in detail in Chapter 15, “Machine-Check Architecture”) greatly expands the ability of the processor to report on internal hardware errors.

19.25.2 Priority of Exceptions

The priority of exceptions is broken down into several major categories:
19.26.3 IDT Limit

The LIDT instruction can be used to set a limit on the size of the IDT. A double-fault exception (#DF) is generated if an interrupt or exception attempts to read a vector beyond the limit. Shutdown then occurs on the 32-bit IA-32 processors if the double-fault handler vector is beyond the limit. (The 8086 processor has neither a shutdown mode nor a limit.)
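The limit test itself is simple arithmetic: every byte of the gate for a vector must lie at or below the limit loaded by LIDT. In this Python sketch, gates are assumed to be 8 bytes (a protected-mode IDT) unless another size is passed, and the function name is hypothetical.

```python
def gate_within_limit(vector: int, idt_limit: int, gate_size: int = 8) -> bool:
    """True if every byte of the gate for `vector` lies within the IDT limit."""
    last_byte = vector * gate_size + gate_size - 1
    return last_byte <= idt_limit
```

With a limit of 255 (room for 32 eight-byte gates), vector 31 is within the limit (its last byte is at offset 255) and vector 32 is not.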
• The remote read delivery mode provided in the 82489DX and local APIC for Pentium processors is not supported in the local APIC in the Pentium 4, Intel Xeon, and P6 family processors.
• For the 82489DX, in the lowest priority delivery mode, all the target local APICs specified by the destination field participate in the lowest priority arbitration. For the local APIC, only those local APICs that have free interrupt slots participate in the lowest priority arbitration.
19.28.1 P6 Family and Pentium Processor TSS

When the virtual mode extensions are enabled (by setting the VME flag in control register CR4), the TSS in the P6 family and Pentium processors contains an interrupt redirection bit map, which is used in virtual-8086 mode to redirect interrupts back to an 8086 program.

19.28.2 TSS Selector Writes

During task state saves, the Intel486 processor writes 2-byte segment selectors into a 32-bit TSS, leaving the upper 16 bits undefined.
than 0DFFFH, the Intel486 processor will not wrap around and access incorrect locations within the TSS for I/O port validation and the P6 family and Pentium processors will not experience general-protection exceptions (#GP). Figure 19-1 demonstrates the different areas accessed by the Intel486 and the P6 family and Pentium processors.
data cache and L2 cache of the P6 family processors. In the Intel486 processor, setting these flags to (00B) enables write-through for the cache. External system hardware can force the Pentium processor to disable caching or to use the write-through cache policy should that be required. In the P6 family processors, the MTRRs can be used to override the CD and NW flags (see Table 11-6).
19.29.2 Disabling the L3 Cache

A unified third-level (L3) cache in processors based on Intel NetBurst microarchitecture (see Section 11.1, “Internal Caches, TLBs, and Buffers”) provides the third-level cache disable flag, bit 6 of the IA32_MISC_ENABLE MSR. The third-level cache disable flag allows the L3 cache to be disabled and enabled, independently of the L1 and L2 caches (see Section 11.5.4, “Disabling and Enabling the L3 Cache”).
19.30.3 Enabling and Disabling Paging

Paging is enabled and disabled by loading a value into control register CR0 that modifies the PG flag. For backward and forward compatibility with all IA-32 processors, Intel recommends that the following operations be performed when enabling or disabling paging:

1. Execute a MOV CR0, REG instruction to either set (enable paging) or clear (disable paging) the PG flag.
2. Execute a near JMP instruction.
• The initial stack pointer is FFFCH (32-bit operand) or FFFEH (16-bit operand) and will wrap around to 0H as a result of the POP operation. The result of the memory write is implementation-specific. For example, in P6 family processors, the result of the memory write is SS:0H plus any scaled index and displacement.
19.32 MIXING 16- AND 32-BIT SEGMENTS

The features of the 16-bit Intel 286 processor are an object-code compatible subset of those of the 32-bit IA-32 processors. The D (default operation size) flag in segment descriptors indicates whether the processor treats a code or data segment as a 16-bit or 32-bit segment; the B (default stack size) flag in segment descriptors indicates whether the processor treats a stack segment as a 16-bit or 32-bit segment.
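The D and B flags occupy the same descriptor bit: bit 22 of the descriptor's upper doubleword, which is bit 54 when the 8-byte descriptor is read as one little-endian 64-bit value. The Python sketch below extracts it; the function name and the two sample descriptor values are illustrative assumptions.

```python
DB_BIT = 54  # bit 22 of the descriptor's upper doubleword


def default_size(descriptor: int) -> int:
    """Return 32 if the descriptor's D/B flag is set, else 16."""
    return 32 if (descriptor >> DB_BIT) & 1 else 16


# Sample flat code-segment descriptors (base 0, limit FFFFFH, G = 1):
FLAT_32BIT_CODE = 0x00CF9A000000FFFF  # D = 1
FLAT_16BIT_CODE = 0x008F9A000000FFFF  # D = 0
```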
19.33.1 Segment Wraparound

On the 8086 processor, an attempt to access a memory operand that crosses offset 65,535 (0FFFFH) or offset 0 (for example, moving a word to offset 65,535 or pushing a word when the stack pointer is set to 1) causes the offset to wrap around modulo 65,536 (010000H). With the Intel 286 processor, any base and offset combination that addresses beyond 16 MBytes wraps around to the first MByte of the address space.
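Both wraparound behaviors amount to masking. This Python sketch covers the 8086 16-bit offset wrap and the Intel 286 processor's 24-bit (16-MByte) physical address wrap; the 20-bit mask on the 8086 linear address reflects that processor's 20 address lines. All function names are hypothetical.

```python
def word_offsets_8086(offset: int) -> tuple[int, int]:
    """Offsets of the two bytes of a word access (wraps modulo 65,536)."""
    return offset & 0xFFFF, (offset + 1) & 0xFFFF


def push_word_8086(sp: int) -> tuple[int, tuple[int, int]]:
    """New SP and the byte offsets written by a word push."""
    new_sp = (sp - 2) & 0xFFFF
    return new_sp, word_offsets_8086(new_sp)


def linear_8086(segment: int, offset: int) -> int:
    """Segment * 16 + offset, truncated to the 8086's 20 address lines."""
    return ((segment << 4) + (offset & 0xFFFF)) & 0xFFFFF


def physical_286(base: int, offset: int) -> int:
    """Base + offset, truncated to the Intel 286 processor's 24 address lines."""
    return (base + offset) & 0xFFFFFF
```

Pushing a word with SP = 1, as in the example above, leaves SP at 0FFFFH and writes the bytes at offsets 0FFFFH and 0.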
with the exception of “fast string” store operations (see Section 8.2.4, “Out-of-Order Stores For String Operations”). The Pentium processor has two store buffers, one corresponding to each of the pipelines. Writes in these buffers are always written to memory in the order they were generated by the processor core. Only memory writes are buffered; I/O writes are not.
memory. If the access does split across a cache line, it locks the bus and accesses system memory. I/O reads are never reordered in front of buffered memory writes on an IA-32 processor. This ensures that all memory locations are updated before status is read from an I/O device.

19.35 BUS LOCKING

The Intel 286 processor performs bus locking differently than the Intel P6 family, Pentium, Intel486, and Intel386 processors.
sors. The following sections describe these model-specific extensions. The CPUID instruction indicates the availability of some of the model-specific features.

19.37.1 Model-Specific Registers

The Pentium processor introduced a set of model-specific registers (MSRs) for use in controlling hardware functions and performance monitoring. To access these MSRs, two new instructions were added to the IA-32 architecture: read MSR (RDMSR) and write MSR (WRMSR).
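RDMSR and WRMSR take the MSR address in ECX and transfer the 64-bit register value in EDX:EAX, with the high half in EDX. The Python sketch below shows only that split and join; the function names are hypothetical, and executing either instruction requires privilege level 0.

```python
MASK32 = 0xFFFFFFFF


def split_for_wrmsr(value: int) -> tuple[int, int]:
    """Return (EDX, EAX) for a 64-bit MSR value."""
    return (value >> 32) & MASK32, value & MASK32


def join_from_rdmsr(edx: int, eax: int) -> int:
    """Combine EDX:EAX as returned by RDMSR into one 64-bit value."""
    return ((edx & MASK32) << 32) | (eax & MASK32)
```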
Earlier IA-32 processors (such as the Intel486 and Pentium processors) used the KEN# (cache enable) pin and external logic to maintain an external memory map and signal cacheable accesses to the processor. The MTRR mechanism simplifies hardware designs by eliminating the KEN# pin and the external logic required to drive it. See Chapter 9, “Processor Management and Initialization,” and Appendix B, “Model-Specific Registers (MSRs),” for more information on the MTRRs.
The performance-monitoring counters are useful for debugging programs, optimizing code, diagnosing system failures, or refining hardware designs. See Chapter 30, “Performance Monitoring,” for more information on these counters.