Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3A: System Programming Guide, Part 1 NOTE: The Intel® 64 and IA-32 Architectures Software Developer's Manual consists of five volumes: Basic Architecture, Order Number 253665; Instruction Set Reference A-M, Order Number 253666; Instruction Set Reference N-Z, Order Number 253667; System Programming Guide, Part 1, Order Number 253668; System Programming Guide, Part 2, Order Number 253669.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT.
CONTENTS

CHAPTER 1 ABOUT THIS MANUAL
1.1 PROCESSORS COVERED IN THIS MANUAL
1.2 OVERVIEW OF THE SYSTEM PROGRAMMING GUIDE
1.3 NOTATIONAL CONVENTIONS
1.3.1 Bit and Byte Order

2.6.7 Reading and Writing Model-Specific Registers
2.6.7.1 Reading and Writing Model-Specific Registers in 64-Bit Mode

CHAPTER 3 PROTECTED-MODE MEMORY MANAGEMENT
3.1 MEMORY MANAGEMENT OVERVIEW
3.2 USING SEGMENTS

CHAPTER 4 PROTECTION
4.1 ENABLING AND DISABLING SEGMENT AND PAGE PROTECTION
4.2 FIELDS AND FLAGS USED FOR SEGMENT-LEVEL AND PAGE-LEVEL PROTECTION
4.2.1 Code Segment Descriptor in 64-bit Mode
4.3 LIMIT CHECKING

CHAPTER 5 INTERRUPT AND EXCEPTION HANDLING
5.1 INTERRUPT AND EXCEPTION OVERVIEW
5.2 EXCEPTION AND INTERRUPT VECTORS
5.3 SOURCES OF INTERRUPTS
5.3.1 External Interrupts

Interrupt 16—x87 FPU Floating-Point Error (#MF)
Interrupt 17—Alignment Check Exception (#AC)
Interrupt 18—Machine-Check Exception (#MC)
Interrupt 19—SIMD Floating-Point Exception (#XF)

7.5.4.1 Typical BSP Initialization Sequence
7.5.4.2 Typical AP Initialization Sequence

7.11.6.5 Guidelines for Scheduling Threads on Logical Processors Sharing Execution Resources
7.11.6.6 Eliminate Execution-Based Timing Loops
7.11.6.7 Place Locks and Semaphores in Aligned, 128-Byte Blocks of Memory

8.11.1 Message Address Register Format
8.11.2 Message Data Register Format

CHAPTER 9 PROCESSOR MANAGEMENT AND INITIALIZATION
9.1 INITIALIZATION OVERVIEW

9.11.7.2 Authenticating the Update
9.11.8 Pentium 4, Intel Xeon, and P6 Family Processor Microcode Update Specifications
9.11.8.1 Responsibilities of the BIOS
9.11.8.2 Responsibilities of the Calling Program

10.11.7.1 MemTypeGet() Function
10.11.7.2 MemTypeSet() Function
10.11.8 MTRR Considerations in MP Systems
10.11.9 Large Page Size Considerations

13.4.2 Thermal Monitor
13.4.2.1 Thermal Monitor 1
13.4.2.2 Thermal Monitor 2

15.1.2 Registers Supported in Real-Address Mode
15.1.3 Instructions Supported in Real-Address Mode
15.1.4 Interrupt and Exception Handling
15.2 VIRTUAL-8086 MODE

17.5 INTEL MMX TECHNOLOGY
17.6 STREAMING SIMD EXTENSIONS (SSE)
17.7 STREAMING SIMD EXTENSIONS 2 (SSE2)
17.8 STREAMING SIMD EXTENSIONS 3 (SSE3)

17.17.7.11 FLD Instruction
17.17.7.12 FXTRACT Instruction
17.17.7.13 Load Constant Instructions
17.17.7.14 FSETPM Instruction

17.29.1 Large Pages
17.29.2 PCD and PWT Flags
17.29.3 Enabling and Disabling Paging

18.6.3.1 LBR Stack and Intel® 64 Processors
18.6.4 Monitoring Branches, Exceptions, and Interrupts
18.6.5 Single-Stepping on Branches, Exceptions, and Interrupts
18.6.6 Branch Trace Messages

18.15.6.7 EXTENDED CASCADING
18.15.6.8 Generating an Interrupt on Overflow
18.15.6.9 Counter Usage Guideline
18.15.7 At-Retirement Counting

CHAPTER 20 VIRTUAL-MACHINE CONTROL STRUCTURES
20.1 OVERVIEW
20.2 FORMAT OF THE VMCS REGION
20.3 ORGANIZATION OF VMCS DATA

CHAPTER 22 VM ENTRIES
22.1 BASIC VM-ENTRY CHECKS
22.2 CHECKS ON VMX CONTROLS AND HOST-STATE AREA
22.2.1 Checks on VMX Controls
22.2.1.1 VM-Execution Control Fields

23.5 LOADING HOST STATE
23.5.1 Loading Host Control Registers, Debug Registers, MSRs
23.5.2 Loading Host Segment and Descriptor-Table Registers
23.5.3 Loading Host RIP, RSP, and RFLAGS

24.16.4.1 Checks on the Executive-VMCS Pointer Field
24.16.4.2 Checks on VM-Execution Control Fields
24.16.4.3 Checks on Guest Non-Register State

25.10.4.4 Handling the SWAPGS Instruction
25.10.4.5 Implementation Specific Behavior on Writing to Certain MSRs
25.10.5 Handling Accesses to Reserved MSR Addresses
25.11 HANDLING ACCESSES TO CONTROL REGISTERS

APPENDIX A PERFORMANCE-MONITORING EVENTS
A.1 ARCHITECTURAL PERFORMANCE-MONITORING EVENTS
A.2 PERFORMANCE MONITORING EVENTS FOR INTEL® XEON® PROCESSOR 5100 SERIES AND INTEL® CORE™ 2 DUO PROCESSORS
A.3 PERFORMANCE MONITORING EVENTS FOR INTEL® CORE™ SOLO AND INTEL® CORE™ DUO PROCESSORS

APPENDIX G VMX CAPABILITY REPORTING FACILITY
G.1 BASIC VMX INFORMATION
G.2 VM-EXECUTION CONTROLS
G.3 VM-EXIT CONTROLS
Table F-3. Non-Focused Lowest Priority Message (34 Cycles)
Table F-4. APIC Bus Status Cycles Interpretation
Table G-1. Memory Types Used For VMCS Access
CHAPTER 1
ABOUT THIS MANUAL

The Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A: System Programming Guide, Part 1 (order number 253668) and the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B: System Programming Guide, Part 2 (order number 253669) are part of a set that describes the architecture and programming environment of all Intel 64 and IA-32 Architecture processors.
• Dual-Core Intel® Xeon® processor LV
• Intel® Core™2 Duo processor
• Intel® Xeon® processor 5100 series

P6 family processors are IA-32 processors based on the P6 family microarchitecture. This includes the Pentium® Pro, Pentium® II, Pentium® III, and Pentium® III Xeon® processors. The Pentium® 4, Pentium® D, and Pentium® processor Extreme Editions are based on the Intel NetBurst® microarchitecture. Most early Intel® Xeon® processors are based on the Intel NetBurst® microarchitecture.
Chapter 4 — Protection. Describes the support for page and segment protection provided in the Intel 64 and IA-32 architectures. This chapter also explains the implementation of privilege rules, stack switching, pointer validation, and user and supervisor modes.

Chapter 5 — Interrupt and Exception Handling.
Chapter 16 — Mixing 16-Bit and 32-Bit Code. Describes how to mix 16-bit and 32-bit code modules within the same program or task.

Chapter 17 — IA-32 Architecture Compatibility. Describes architectural compatibility among IA-32 processors.

Chapter 18 — Debugging and Performance Monitoring. Describes the debugging registers and other debug mechanisms provided in Intel 64 or IA-32 processors. This chapter also describes the time-stamp counter and the performance-monitoring counters.
Solo, Intel Core Duo processors, and Intel Core 2 processor family and describes their functions.

Appendix C — MP Initialization For P6 Family Processors. Gives an example of how to use the MP protocol to boot P6 family processors in an MP system.

Appendix D — Programming the LINT0 and LINT1 Inputs. Gives an example of how to program the LINT0 and LINT1 pins for specific interrupt vectors.

Appendix E — Interpreting Machine-Check Error Codes.
1.3.2 Reserved Bits and Software Compatibility

In many register and memory layout descriptions, certain bits are marked as reserved. When bits are marked as reserved, it is essential for compatibility with future processors that software treat these bits as having a future, though unknown, effect. The behavior of reserved bits should be regarded as not only undefined, but unpredictable.
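One common way for software to honor this rule is to change only the bits it understands and to write every reserved bit back exactly as it was read. The sketch below assumes a hypothetical 32-bit register whose low four bits are defined and whose remaining bits are reserved; DEFINED_MASK, read_defined_bits, and update_defined_bits are illustrative names, not part of any Intel interface.

```c
#include <stdint.h>

/* Hypothetical layout (an assumption for illustration only):
 * bits 0-3 are architecturally defined, bits 4-31 are reserved. */
#define DEFINED_MASK UINT32_C(0x0000000F)

/* Test only the defined bits; reserved bits are masked out so the
 * result never depends on their state. */
uint32_t read_defined_bits(uint32_t reg_value)
{
    return reg_value & DEFINED_MASK;
}

/* Read-modify-write: replace the defined field and carry the reserved
 * bits through unchanged from the value previously read. */
uint32_t update_defined_bits(uint32_t current, uint32_t new_fields)
{
    return (current & ~DEFINED_MASK) | (new_fields & DEFINED_MASK);
}
```

Passing the value previously read from the register as `current` keeps whatever the processor placed in the reserved positions intact.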
1.3.3 Instruction Operands

When instructions are represented symbolically, a subset of assembly language is used. In this subset, an instruction has the following format:

label: mnemonic argument1, argument2, argument3

where:
• A label is an identifier which is followed by a colon.
• The operands argument1, argument2, and argument3 are optional. There may be from zero to three operands, depending on the opcode.
segments. Code addresses would always refer to the code space, and stack addresses would always refer to the stack space. The following notation is used to specify a byte address within a segment:

Segment-register:Byte-address

For example, the following segment address identifies the byte at address FF79H in the segment pointed to by the DS register:

DS:FF79H

The following segment address identifies an instruction address in the code segment.
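[Figure: Syntax Representation for CPUID Input and Output, Control Register Values, and Model-Specific Register Values. Numeric values in the figure were lost in extraction.]

For CPUID input and output (the example field is the SSE feature flag), the notation gives:
• The input value for EAX, which defines the output. NOTE: Some leaves require input values for EAX and ECX. If only one value is present, EAX is implied.
• The output register and feature flag or field name with bit position(s).
• The value or range of output.

For control register values (the example field is CR4.OSFXSR), the notation gives:
• The example CR name.
• The feature flag or field name with bit position(s).
• The value or range of output.

For Model-Specific Register Values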
This example refers to a page-fault exception under conditions where an error code naming a type of fault is reported. Under some conditions, exceptions which produce error codes may not be able to report an accurate code. In this case, the error code is zero, as shown below for a general-protection exception:

#GP(0)

1.4 RELATED LITERATURE

Literature related to Intel 64 and IA-32 processors is listed on-line at this link: http://developer.intel.com/products/processor/index.
CHAPTER 2
SYSTEM ARCHITECTURE OVERVIEW

IA-32 architecture (beginning with the Intel386 processor family) provides extensive support for operating-system and system-development software. This support offers multiple modes of operation, which include:
• Real mode, protected mode, virtual 8086 mode, and system management mode. These are sometimes referred to as legacy modes.
initiates the switch from real-address mode to protected mode. If IA-32e mode operation is desired, software also initiates a switch from protected mode to IA-32e mode.

2.1 OVERVIEW OF THE SYSTEM-LEVEL ARCHITECTURE

System-level architecture consists of a set of registers, data structures, and instructions designed to support basic system-level operations such as memory management, interrupt and exception handling, task management, and control of multiple processors.
[Figure: system-level registers and data structures. Labeled elements: EFLAGS register; control registers CR0 through CR4; task register; segment selector register; global descriptor table (GDT) with segment, TSS, and LDT descriptors; interrupt descriptor table (IDT) with interrupt and task gates; task-state segments (TSS); interrupt vector and interrupt handler code; current stack; code, data, and stack segments; linear address and physical address.]
[Figure: system-level registers and data structures as seen with RFLAGS and CR8. Labeled elements: RFLAGS register; control registers CR0 through CR4 and CR8; task register (TR); segment selector register; global descriptor table (GDT) with segment, TSS, LDT, and NULL descriptors; GDTR; interrupt descriptor table (IDT) with interrupt and trap gates; interrupt stack table (IST); local descriptor table (LDT) with call gates; task-state segment (TSS); code, data, or stack segment (base = 0); linear address and physical address.]
2.1.1 Global and Local Descriptor Tables

When operating in protected mode, all memory accesses pass through either the global descriptor table (GDT) or an optional local descriptor table (LDT) as shown in Figure 2-1. These tables contain entries called segment descriptors. Segment descriptors provide the base address of segments as well as access rights, type, and usage information. Each segment descriptor has an associated segment selector.
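To make the contents of a segment descriptor concrete, the following sketch extracts the base address, limit, privilege level, and present flag from an 8-byte legacy descriptor held in a 64-bit integer. The bit positions are an assumption taken from the segment descriptor format given in the manual's chapter on protected-mode memory management, which this section does not reproduce; the function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed 8-byte descriptor layout (not shown in this section):
 *   bits  0-15  segment limit 15:0
 *   bits 16-39  base address 23:0
 *   bits 40-47  access byte (type, S, DPL, P)
 *   bits 48-51  segment limit 19:16
 *   bits 56-63  base address 31:24                                   */

/* Reassemble the 32-bit base from its two scattered fields. */
uint32_t desc_base(uint64_t d)
{
    return (uint32_t)(((d >> 16) & 0xFFFFFFu) | (((d >> 56) & 0xFFu) << 24));
}

/* Reassemble the 20-bit limit (granularity scaling is not applied). */
uint32_t desc_limit(uint64_t d)
{
    return (uint32_t)((d & 0xFFFFu) | (((d >> 48) & 0xFu) << 16));
}

/* Descriptor privilege level, bits 45-46. */
unsigned desc_dpl(uint64_t d)
{
    return (unsigned)((d >> 45) & 0x3u);
}

/* Segment-present flag, bit 47. */
bool desc_present(uint64_t d)
{
    return ((d >> 47) & 0x1u) != 0;
}
```

For instance, the value 0x00CF9A000000FFFF, widely used for a flat ring-0 code segment, decodes to base 0, limit FFFFFH, DPL 0, present.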
The architecture also defines a set of special descriptors called gates (call gates, interrupt gates, trap gates, and task gates). These provide protected gateways to system procedures and handlers that may operate at a different privilege level than application programs and most procedures.
3. Accesses the new TSS through a segment descriptor in the GDT.
4. Loads the state of the new task from the new TSS into the general-purpose registers, the segment registers, the LDTR, control register CR3 (page-table base address), the EFLAGS register, and the EIP register.
5. Begins execution of the new task.

A task can also be accessed through a task gate.
2.1.5 Memory Management

System architecture supports either direct physical addressing of memory or virtual memory (through paging). When physical addressing is used, a linear address is treated as a physical address. When paging is used, all code, data, stack, and system segments (including the GDT and IDT) can be paged, with only the most recently accessed pages being held in physical memory.
2.1.6 System Registers

To assist in initializing the processor and controlling system operations, the system architecture provides system flags in the EFLAGS register and several system registers:
• The system flags and IOPL field in the EFLAGS register control task and mode switching, interrupt handling, instruction tracing, and access rights. See also: Section 2.3, “System Flags and Fields in the EFLAGS Register.
On systems that support IA-32e mode, the extended feature enable register (IA32_EFER) is available. This model-specific register controls activation of IA-32e mode and other IA-32e mode operations. In addition, there are several model-specific registers that govern IA-32e mode instructions:
• IA32_KernelGSbase — Used by SWAPGS instruction.
• IA32_LSTAR — Used by SYSCALL instruction.
• IA32_SYSCALL_FLAG_MASK — Used by SYSCALL instruction.
running program or task. SMM-specific code may then be executed transparently. Upon returning from SMM, the processor is placed back into its state prior to the SMI.
• Virtual-8086 mode — In protected mode, the processor supports a quasi-operating mode known as virtual-8086 mode. This mode allows the processor to execute 8086 software in a protected, multitasking environment.
virtual-8086 mode are generally carried out as part of a task switch or a return from an interrupt or exception handler. See also: Section 15.2.5, “Entering Virtual-8086 Mode.”

The LMA bit (IA32_EFER.LMA[bit 10]) determines whether the processor is operating in IA-32e mode. When running in IA-32e mode, 64-bit or compatibility sub-mode operation is determined by the CS.L bit of the code segment.
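The two-level decision described above can be written down as a small classifier. This is a minimal sketch: it takes the IA32_EFER value and the CS.L bit as plain arguments (obtaining them requires privileged code that is not shown), and the enum and function names are illustrative rather than taken from the manual.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_EFER.LMA is bit 10, as stated in the text above. */
#define IA32_EFER_LMA (UINT64_C(1) << 10)

enum operating_mode {
    MODE_LEGACY,         /* LMA clear: not in IA-32e mode            */
    MODE_COMPATIBILITY,  /* LMA set, CS.L clear: compatibility mode  */
    MODE_64BIT           /* LMA set, CS.L set: 64-bit mode           */
};

/* Classify the current mode from IA32_EFER and the CS.L bit of the
 * code segment currently in use. */
enum operating_mode classify_mode(uint64_t ia32_efer, bool cs_l)
{
    if ((ia32_efer & IA32_EFER_LMA) == 0)
        return MODE_LEGACY;
    return cs_l ? MODE_64BIT : MODE_COMPATIBILITY;
}
```

Note that CS.L is only consulted once LMA is known to be set, mirroring the order in which the text states the two conditions.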
[Figure: system flags and fields in the EFLAGS register]

Bit(s)   Flag
31-22    Reserved (set to 0)
21       ID — Identification Flag
20       VIP — Virtual Interrupt Pending
19       VIF — Virtual Interrupt Flag
18       AC — Alignment Check
17       VM — Virtual-8086 Mode
16       RF — Resume Flag
14       NT — Nested Task Flag
13-12    IOPL — I/O Privilege Level
9        IF — Interrupt Enable Flag
8        TF — Trap Flag

Figure 2-4.
explicitly set or cleared with the POPF/POPFD instructions; however, changing the state of this flag can generate unexpected exceptions in application programs. See also: Section 6.4, “Task Linking.”

RF    Resume (bit 16) — Controls the processor’s response to instruction-breakpoint conditions.
VIP   Virtual interrupt pending (bit 20) — Set by software to indicate that an interrupt is pending; cleared to indicate that no interrupt is pending. This flag is used in conjunction with the VIF flag. The processor reads this flag but never modifies it. The processor only recognizes the VIP flag when either the VME flag or the PVI flag in control register CR4 is set and the IOPL is less than 3.
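The flag positions and the VIP recognition rule above translate directly into bit masks. The sketch below is illustrative only: the macro and function names are made up for this example, and the positions of CR4.VME (bit 0) and CR4.PVI (bit 1) are taken from the CR4 layout rather than from the paragraph above.

```c
#include <stdbool.h>
#include <stdint.h>

/* System flags in EFLAGS. */
#define EFLAGS_TF   (UINT32_C(1) << 8)
#define EFLAGS_IF   (UINT32_C(1) << 9)
#define EFLAGS_NT   (UINT32_C(1) << 14)
#define EFLAGS_RF   (UINT32_C(1) << 16)
#define EFLAGS_VM   (UINT32_C(1) << 17)
#define EFLAGS_AC   (UINT32_C(1) << 18)
#define EFLAGS_VIF  (UINT32_C(1) << 19)
#define EFLAGS_VIP  (UINT32_C(1) << 20)
#define EFLAGS_ID   (UINT32_C(1) << 21)

/* CR4 flags that enable the virtual-interrupt mechanism. */
#define CR4_VME     (UINT32_C(1) << 0)
#define CR4_PVI     (UINT32_C(1) << 1)

/* The IOPL field occupies bits 12-13. */
unsigned eflags_iopl(uint32_t eflags)
{
    return (unsigned)((eflags >> 12) & 0x3u);
}

/* True when the processor would recognize the VIP flag: CR4.VME or
 * CR4.PVI is set, and the IOPL is less than 3. */
bool vip_recognized(uint32_t eflags, uint32_t cr4)
{
    bool virtual_interrupts_enabled = (cr4 & (CR4_VME | CR4_PVI)) != 0;
    return virtual_interrupts_enabled && eflags_iopl(eflags) < 3;
}
```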
[Figure: the system table registers GDTR and IDTR each hold a 32(64)-bit linear base address in bits 16 through 47(79) and a 16-bit table limit in bits 0 through 15. The system segment registers (task register and LDTR) each hold a 16-bit segment selector, together with automatically loaded segment descriptor registers containing a 32(64)-bit linear base address, a segment limit, and attributes.]

Figure 2-5. Memory Management Registers

2.4.
2.4.3 IDTR Interrupt Descriptor Table Register

The IDTR register holds the base address (32 bits in protected mode; 64 bits in IA-32e mode) and 16-bit table limit for the IDT. The base address specifies the linear address of byte 0 of the IDT; the table limit specifies the number of bytes in the table. The LIDT and SIDT instructions load and store the IDTR register, respectively.
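As an illustration of the register contents described above, the sketch below models the in-memory image that LIDT and SIDT operate on as a packed C structure, with the 16-bit limit in the low bytes followed by the base address. That memory ordering is an assumption based on the register layout in Figure 2-5 and on the instruction descriptions in the instruction set reference; the structure and function names are hypothetical, and `__attribute__((packed))` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed memory image for protected mode: 2-byte limit, 4-byte base. */
struct idtr_image32 {
    uint16_t limit;
    uint32_t base;
} __attribute__((packed));

/* Assumed memory image for IA-32e mode: 2-byte limit, 8-byte base. */
struct idtr_image64 {
    uint16_t limit;
    uint64_t base;
} __attribute__((packed));

/* Without packing the compiler would insert padding after the limit. */
static_assert(sizeof(struct idtr_image32) == 6, "48-bit image expected");
static_assert(sizeof(struct idtr_image64) == 10, "80-bit image expected");

/* Build the image that privileged code would hand to LIDT. */
struct idtr_image64 make_idtr_image64(uint64_t base, uint16_t limit)
{
    struct idtr_image64 image = { .limit = limit, .base = base };
    return image;
}
```

Executing LIDT or SIDT itself needs inline assembly and, for LIDT, privilege level 0, so the sketch stops at building the image.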
• The MOV CRn instructions do not check that addresses written to CR2 and CR3 are within the linear-address or physical-address limitations of the implementation.
• Register CR8 is available in 64-bit mode only.

The control registers are summarized below, and each architecturally defined control field in these control registers is described individually. In Figure 2-6, the width of the register in 64-bit mode is indicated in parentheses (except for CR0).
[Figure: control register layout. An upper bit number written as 31(63) gives the 64-bit-mode width in parentheses.]

CR4: VME (bit 0), PVI (1), TSD (2), DE (3), PSE (4), PAE (5), MCE (6), PGE (7), PCE (8), OSFXSR (9), OSXMMEXCPT (10), VMXE (13); remaining bits reserved (set to 0).
CR3 (PDBR): PWT (bit 3), PCD (bit 4), Page-Directory Base (bits 12 through 31(63)).
CR2: Page-Fault Linear Address (bits 0 through 31(63)).
CR1: reserved.
CR0: PE (bit 0), MP (1), EM (2), TS (3), ET (4), NE (5), WP (16), AM (18), NW (29), CD (30), PG (31); remaining bits reserved.

Figure 2-6.
See also: Section 10.5.3, “Preventing Caching,” and Section 10.5, “Cache Control.”

NW    Not Write-through (bit 29 of CR0) — When the NW and CD flags are clear, write-back (for Pentium 4, Intel Xeon, P6 family, and Pentium processors) or write-through (for Intel486 processors) is enabled for writes that hit the cache and invalidation cycles are enabled. See Table 10-5 for detailed information about the effect of the NW flag on caching for other settings of the CD and NW flags.
• If the TS flag is set and the EM flag (bit 2 of CR0) is clear, a device-not-available exception (#NM) is raised prior to the execution of any x87 FPU/MMX/SSE/SSE2/SSE3 instruction, with the exception of PAUSE, PREFETCHh, SFENCE, LFENCE, MFENCE, MOVNTI, and CLFLUSH. See the paragraph below for the special case of the WAIT/FWAIT instructions.
clear. This flag also affects the execution of MMX/SSE/SSE2/SSE3 instructions. When the EM flag is set, execution of an x87 FPU instruction generates a device-not-available exception (#NM). This flag must be set when the processor does not have an internal x87 FPU or is not connected to an external math coprocessor. Setting this flag forces all floating-point instructions to be handled by software emulation.
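The interaction of the TS and EM flags for x87 FPU instructions can be summarized in a few lines. The sketch below covers only the two cases stated above (EM set, or TS set with EM clear); it deliberately leaves out WAIT/FWAIT and the MMX/SSE families, whose handling is described elsewhere in the manual. The names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_EM (UINT32_C(1) << 2)   /* Emulation     */
#define CR0_TS (UINT32_C(1) << 3)   /* Task Switched */

/* True when executing an x87 FPU instruction (other than WAIT/FWAIT)
 * would raise a device-not-available exception (#NM):
 *   - EM set: every x87 instruction traps for software emulation;
 *   - TS set with EM clear: the trap lets the operating system save
 *     and restore x87 state lazily after a task switch.             */
bool x87_raises_nm(uint32_t cr0)
{
    bool em = (cr0 & CR0_EM) != 0;
    bool ts = (cr0 & CR0_TS) != 0;
    return em || ts;
}
```

A handler for the TS case would typically restore the x87 state and clear TS before returning, so that the faulting instruction can be re-executed.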
PWT   Page-level Writes Transparent (bit 3 of CR3) — Controls the write-through or write-back caching policy of the current page directory. When the PWT flag is set, write-through caching is enabled; when the flag is clear, write-back caching is enabled. This flag affects only internal caches (both L1 and L2, when present). The processor ignores this flag if paging is not used (the PG flag in register CR0 is clear) or the CD (cache disable) flag in CR0 is set.
PAE   Physical Address Extension (bit 5 of CR4) — When set, enables the paging mechanism to reference physical addresses of 36 bits or more. When clear, restricts physical addresses to 32 bits. PAE must be enabled to enable IA-32e mode operation. Enabling and disabling IA-32e mode operation also requires modifying CR4.PAE. See also: Section 3.8, “36-Bit Physical Addressing Using the PAE Paging Mechanism.
SYSTEM ARCHITECTURE OVERVIEW NOTE CPUID feature flags FXSR, SSE, SSE2, and SSE3 indicate availability of the FXSAVE/FXRESTOR instructions, SSE extensions, SSE2 extensions, and SSE3 extensions respectively. The OSFXSR bit provides operating system software with a means of enabling these features and indicating that the operating system supports the features.
Table 2-2 lists the system instructions and indicates whether they are available and useful for application programs. These instructions are described in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volumes 2A & 2B.

Table 2-2. Summary of System Instructions (Contd.)

Instruction   Description                           Useful to Application?   Protected from Application?
RDPMC(4)      Read Performance-Monitoring Counter   Yes                      Yes(2)
RDTSC(3)      Read Time-Stamp Counter               Yes                      Yes(2)

NOTES:
1. Useful to application programs running at a CPL of 1 or 2.
2. The TSD and PCE flags in control register CR4 control access to these instructions by application programs running at a CPL of 3.
3.
SYSTEM ARCHITECTURE OVERVIEW to run on 32-bit IA-32 processors should not use these instructions. Instead, they should access the control register CR0 using the MOV instruction. The CLTS (clear TS flag in CR0) instruction is provided for use in handling a device-not-available exception (#NM) that occurs when the processor attempts to execute a floating-point instruction when the TS flag is set.
SYSTEM ARCHITECTURE OVERVIEW Instructions),” for a detailed explanation of the function and use of this instruction. 2.6.3 Loading and Storing Debug Registers Internal debugging facilities in the processor are controlled by a set of 8 debug registers (DR0-DR7). The MOV instruction allows setup data to be loaded to and stored from these registers. On processors that support Intel 64 architecture, debug registers DR0-DR7 are 64 bits.
SYSTEM ARCHITECTURE OVERVIEW introduced with the Pentium Pro processor). If any non-wake events are pending during shutdown, they will be handled after the wake event from shutdown is processed (for example, A20M# interrupts). The LOCK prefix invokes a locked (atomic) read-modify-write operation when modifying a memory operand.
See Section 18.11, “Performance Monitoring Overview,” and Section 18.10, “Time-Stamp Counter,” for more information about the performance-monitoring and time-stamp counters. The RDTSC instruction was introduced into the IA-32 architecture with the Pentium processor. The RDPMC instruction was introduced into the IA-32 architecture with the Pentium Pro processor and the Pentium processor with MMX technology.
CHAPTER 3 PROTECTED-MODE MEMORY MANAGEMENT This chapter describes the Intel 64 and IA-32 architecture’s protected-mode memory management facilities, including the physical memory requirements, segmentation mechanism, and paging mechanism. See also: Chapter 4, “Protection” (for a description of the processor’s protection mechanism) and Chapter 15, “8086 Emulation” (for a description of memory addressing protection in real-address and virtual-8086 modes). 3.
PROTECTED-MODE MEMORY MANAGEMENT segment, the segment type, and the location of the first byte of the segment in the linear address space (called the base address of the segment). The offset part of the logical address is added to the base address for the segment to locate a byte within the segment. The base address plus the offset thus forms a linear address in the processor’s linear address space.
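The base-plus-offset calculation described above can be sketched in C. This is a minimal illustration under stated assumptions: the struct and function names are hypothetical, only the base address of the descriptor is modeled, and limit and access-rights checks are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a segment descriptor; only the base
 * address matters for forming the linear address. */
struct segment {
    uint32_t base;   /* linear address of the first byte of the segment */
};

/* Forms a 32-bit linear address from a segment base and the offset part
 * of a logical address. The sum wraps modulo 2^32. */
uint32_t linear_address(const struct segment *seg, uint32_t offset)
{
    assert(seg != NULL);
    return seg->base + offset;
}
```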
PROTECTED-MODE MEMORY MANAGEMENT storage. When using paging, each segment is divided into pages (typically 4 KBytes each in size), which are stored either in physical memory or on the disk. The operating system or executive maintains a page directory and a set of page tables to keep track of the pages.
PROTECTED-MODE MEMORY MANAGEMENT FFFF_FFF0H. RAM (DRAM) is placed at the bottom of the address space because the initial base address for the DS data segment after reset initialization is 0. 3.2.2 Protected Flat Model The protected flat model is similar to the basic flat model, except the segment limits are set to include only the range of addresses for which physical memory actually exists (see Figure 3-3).
PROTECTED-MODE MEMORY MANAGEMENT More complexity can be added to this protected flat model to provide more protection. For example, for the paging mechanism to provide isolation between user and supervisor code and data, four segments need to be defined: code and data segments at privilege level 3 for the user, and code and data segments at privilege level 0 for the supervisor. Usually these segments all overlay each other and start at address 0 in the linear address space.
[Figure: each segment register (CS, SS, DS, ES, FS, GS) selects a segment descriptor containing access, limit, and base-address fields; the descriptors map code, stack, and data segments in the linear address space (or physical memory).]
Figure 3-4.
In 64-bit mode, segmentation is generally (but not completely) disabled, creating a flat 64-bit linear-address space. The processor treats the segment base of CS, DS, ES, and SS as zero, creating a linear address that is equal to the effective address. The FS and GS segments are exceptions. These segment registers (which hold the segment base) can be used as additional base registers in linear-address calculations.
3.3.1 Intel® 64 Processors and Physical Address Space
On processors that support Intel 64 architecture (CPUID.80000001H:EDX[29] = 1), the size of the physical address range is implementation-specific and indicated by CPUID.80000008H:EAX[bits 7-0]. For the format of information returned in EAX, see “CPUID—CPU Identification” in Chapter 3 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A. See also: Section 3.8.1, “Enhanced Legacy PAE Paging.”
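As an illustration of how software might interpret the value reported in CPUID.80000008H:EAX, the following C sketch extracts the physical-address width from bits 7-0. The function names are hypothetical, and the EAX value is passed in as a parameter rather than obtained by executing CPUID, so the sketch stays portable.

```c
#include <assert.h>
#include <stdint.h>

/* Extracts the physical-address width, in bits, from the value that
 * CPUID leaf 80000008H returns in EAX (bits 7-0). */
unsigned phys_addr_bits(uint32_t eax)
{
    return eax & 0xFFu;
}

/* Highest physical address representable with the reported width. */
uint64_t max_phys_addr(uint32_t eax)
{
    unsigned bits = phys_addr_bits(eax);
    assert(bits >= 1 && bits <= 64);
    return (bits == 64) ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}
```

For example, an EAX value whose low byte is 24H describes a 36-bit physical address space.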
PROTECTED-MODE MEMORY MANAGEMENT Logical Address 0 31(63) Offset (Effective Address) 15 0 Seg. Selector Descriptor Table Segment Descriptor Base Address + 31(63) 0 Linear Address Figure 3-5. Logical Address to Linear Address Translation If paging is not used, the processor maps the linear address directly to a physical address (that is, the linear address goes out on the processor’s address bus).
[Figure: 16-bit segment selector. Bits 15:3, Index; bit 2, Table Indicator (TI): 0 = GDT, 1 = LDT; bits 1:0, Requested Privilege Level (RPL).]
Figure 3-6. Segment Selector
Requested Privilege Level (RPL) (Bits 0 and 1): Specifies the privilege level of the selector. The privilege level can range from 0 to 3, with 0 being the most privileged level. See Section 4.
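The selector layout in Figure 3-6 can be decoded with a few shifts and masks. The C sketch below is illustrative; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three fields of a segment selector. */
struct selector_fields {
    unsigned index;  /* bits 15:3, descriptor index within the table */
    unsigned ti;     /* bit 2, table indicator: 0 = GDT, 1 = LDT */
    unsigned rpl;    /* bits 1:0, requested privilege level */
};

struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1u;
    f.rpl   = sel & 3u;
    assert(f.index < 8192);  /* a 13-bit index selects one of 8192 entries */
    return f;
}
```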
PROTECTED-MODE MEMORY MANAGEMENT be made available by loading their segment selectors into these registers during program execution. Visible Part Segment Selector Hidden Part Base Address, Limit, Access Information CS SS DS ES FS GS Figure 3-7. Segment Registers Every segment register has a “visible” part and a “hidden” part. (The hidden part is sometimes referred to as a “descriptor cache” or a “shadow register.
PROTECTED-MODE MEMORY MANAGEMENT 3.4.4 Segment Loading Instructions in IA-32e Mode Because ES, DS, and SS segment registers are not used in 64-bit mode, their fields (base, limit, and attribute) in segment descriptor registers are ignored. Some forms of segment load instructions are also invalid (for example, LDS, POP ES). Address calculations that reference the ES, DS, or SS segments are treated as if the segment base is zero.
PROTECTED-MODE MEMORY MANAGEMENT 3.4.5 Segment Descriptors A segment descriptor is a data structure in a GDT or LDT that provides the processor with the size and location of a segment, as well as access control and status information. Segment descriptors are typically created by compilers, linkers, loaders, or the operating system or executive, but not application programs. Figure 3-8 illustrates the general descriptor format for all types of segment descriptors.
to the segment limit. Offsets greater than the segment limit generate general-protection exceptions (#GP). For expand-down segments, the segment limit has the reverse function; the offset can range from the segment limit plus 1 to FFFFFFFFH or FFFFH, depending on the setting of the B flag. Offsets less than or equal to the segment limit generate general-protection exceptions.
PROTECTED-MODE MEMORY MANAGEMENT store its own data, such as information regarding the whereabouts of the missing segment. D/B (default operation size/default stack pointer size and/or upper bound) flag Performs different functions depending on whether the segment descriptor is an executable code segment, an expand-down data segment, or a stack segment. (This flag should always be set to 1 for 32-bit code and data segments and to 0 for 16-bit code and data segments.) • Executable code segment.
G (granularity) flag
Determines the scaling of the segment limit field. When the granularity flag is clear, the segment limit is interpreted in byte units; when the flag is set, the segment limit is interpreted in 4-KByte units. (This flag does not affect the granularity of the base address; it is always byte granular.) When the granularity flag is set, the twelve least significant bits of an offset are not tested when checking the offset against the segment limit.
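One way to picture the effect of the G flag is to compute the byte-granular limit that the 20-bit limit field represents. The following C sketch assumes that, with G set, the untested low twelve bits behave as if they were all ones; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Converts the 20-bit limit field of a descriptor into a byte-granular
 * effective limit. With G set, the field counts 4-KByte units and the low
 * twelve offset bits are not tested, which is modeled here by filling
 * them with ones. */
uint32_t effective_limit(uint32_t limit20, bool g)
{
    assert(limit20 <= 0xFFFFFu);
    return g ? ((limit20 << 12) | 0xFFFu) : limit20;
}
```

With the limit field at its maximum value, a byte-granular segment spans 1 MByte and a page-granular segment spans 4 GBytes.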
PROTECTED-MODE MEMORY MANAGEMENT Table 3-1.
PROTECTED-MODE MEMORY MANAGEMENT either by using an instruction with a CS override prefix or by loading a segment selector for the code segment in a data-segment register (the DS, ES, FS, or GS registers). In protected mode, code segments are not writable. Code segments can be either conforming or nonconforming. A transfer of execution into a more-privileged conforming segment allows execution to continue at the current privilege level.
• Trap-gate descriptor.
• Task-gate descriptor.
These descriptor types fall into two categories: system-segment descriptors and gate descriptors. System-segment descriptors point to system segments (LDT and TSS segments). Gate descriptors are in themselves “gates,” which hold pointers to procedure entry points in code segments (call, interrupt, and trap gates) or which hold segment selectors for TSS’s (task gates).
3.5.1 Segment Descriptor Tables
A segment descriptor table is an array of segment descriptors (see Figure 3-10). A descriptor table is variable in length and can contain up to 8192 (2^13) 8-byte descriptors.
(see Section 2.4, “Memory-Management Registers”). The base address of the GDT should be aligned on an eight-byte boundary to yield the best processor performance. The limit value for the GDT is expressed in bytes. As with segments, the limit value is added to the base address to get the address of the last valid byte. A limit value of 0 results in exactly one valid byte.
3.5.2 Segment Descriptor Tables in IA-32e Mode
In IA-32e mode, a segment descriptor table can contain up to 8192 (2^13) 8-byte descriptors. An entry in the segment descriptor table can be 8 bytes. System descriptors are expanded to 16 bytes (occupying the space of two entries). The GDTR and LDTR registers are expanded to hold a 64-bit base address. The corresponding pseudo-descriptor is 80 bits (see the bottom diagram in Figure 3-11).
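Because the limit is the offset of the last valid byte, a table holding N complete 8-byte descriptors has a limit of 8N - 1. The C sketch below illustrates this arithmetic; the function names are hypothetical and the check is a simplified model of what the processor does when it fetches a descriptor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Limit value for a table holding n complete 8-byte descriptors. */
uint16_t table_limit_for(unsigned n)
{
    assert(n >= 1 && n <= 8192);
    return (uint16_t)(n * 8u - 1u);
}

/* Returns true if all eight bytes of the descriptor at the given index
 * lie at or below the table limit. */
bool descriptor_in_table(uint16_t index, uint16_t limit)
{
    assert(index < 8192);
    uint32_t last_byte = (uint32_t)index * 8u + 7u;
    return last_byte <= limit;
}
```

Note that a limit of 0 covers a single byte, so not even the first descriptor fits completely.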
PROTECTED-MODE MEMORY MANAGEMENT To minimize the number of bus cycles required for address translation, the most recently accessed page-directory and page-table entries are cached in the processor in devices called translation lookaside buffers (TLBs). The TLBs satisfy most requests for reading the current page directory and page tables without requiring a bus cycle.
PROTECTED-MODE MEMORY MANAGEMENT reference physical addresses above FFFFFFFFH. The PSE-36 feature flag (bit 17 in the EDX register when the CPUID instruction is executed with a source operand of 1) indicates the availability of this addressing mechanism. See Section 3.9, “36-Bit Physical Addressing Using the PSE-36 Paging Mechanism”, for more information about the PSE-36 physical address extension and page size extension mechanism. 3.6.
GBytes. The 32-bit physical addressing described applies to IA-32 processors or when the following situations are all true:
• The processor supports Intel 64 architecture but IA-32e mode is not active.
• PAE or PSE mechanism is not active.
Section 3.8, “36-Bit Physical Addressing Using the PAE Paging Mechanism” and Section 3.
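[Figure: a 32-bit linear address is split into a Directory field (bits 31:22), a Table field (bits 21:12), and an Offset field (bits 11:0). CR3 (PDBR) locates the page directory; the 10-bit Directory field selects a page-directory entry, which locates a page table; the 10-bit Table field selects a page-table entry, which locates a 4-KByte page; the 12-bit Offset selects the byte within the page. 1024 PDEs * 1024 PTEs = 2^20 pages. CR3 supplies 32 bits aligned onto a 4-KByte boundary.]
Figure 3-12.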
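The 10/10/12 split of a linear address for 4-KByte pages can be written out directly. The following C sketch is illustrative; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the three fields of a 32-bit linear address
 * when 4-KByte pages are used. */
struct la_fields_4k {
    uint32_t dir;     /* bits 31:22, index into the page directory */
    uint32_t table;   /* bits 21:12, index into the page table */
    uint32_t offset;  /* bits 11:0, byte offset within the page */
};

struct la_fields_4k split_linear_4k(uint32_t la)
{
    struct la_fields_4k f;
    f.dir    = la >> 22;
    f.table  = (la >> 12) & 0x3FFu;
    f.offset = la & 0xFFFu;
    assert(f.dir < 1024 && f.table < 1024);
    return f;
}
```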
[Figure: a 32-bit linear address is split into a Directory field (bits 31:22) and an Offset field (bits 21:0). CR3 (PDBR) locates the page directory; the 10-bit Directory field selects a page-directory entry, which locates a 4-MByte page; the 22-bit Offset selects the byte within the page. 1024 PDEs = 1024 pages. CR3 supplies 32 bits aligned onto a 4-KByte boundary.]
Figure 3-13. Linear Address Translation (4-MByte Pages)
The 4-MByte page size is selected by setting the PSE flag in control register CR4 and setting the page size (PS) flag in a page-directory entry (see Figure 3-14).
A typical example of mixing 4-KByte and 4-MByte pages is to place the operating system or executive’s kernel in a large page to reduce TLB misses and thus improve overall system performance. The processor maintains 4-MByte page entries and 4-KByte page entries in separate TLBs. Placing often-used code such as the kernel in a large page therefore frees up 4-KByte-page TLB entries for application programs and tasks. 3.7.
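For 4-MByte pages the translation needs only one table lookup. The C sketch below combines a page-directory entry with the 22-bit offset; it is a simplified model with hypothetical names, it assumes the PS flag is bit 7 of the entry, and it ignores the present flag and all protection checks.

```c
#include <assert.h>
#include <stdint.h>

#define PDE_PS (UINT32_C(1) << 7)  /* page size flag, bit 7 of a PDE */

/* Index of the page-directory entry selected by a linear address. */
uint32_t pde_index(uint32_t la)
{
    return la >> 22;
}

/* Physical address for a linear address mapped by a 4-MByte page:
 * bits 31:22 come from the entry, bits 21:0 from the linear address. */
uint32_t translate_4m(uint32_t pde, uint32_t la)
{
    assert((pde & PDE_PS) != 0);
    return (pde & 0xFFC00000u) | (la & 0x003FFFFFu);
}
```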
PROTECTED-MODE MEMORY MANAGEMENT interpreted as the 20 most-significant bits of the physical address, which forces pages to be aligned on 4-KByte boundaries.
PROTECTED-MODE MEMORY MANAGEMENT base address bits are interpreted as the 10 most-significant bits of the physical address, which forces 4-MByte pages to be aligned on 4-MByte boundaries. Page-Directory Entry (4-MByte Page) 31 13 12 11 22 21 Page Base Address Reserved P A T 9 8 7 6 5 4 3 2 1 0 P P U R Avail.
PROTECTED-MODE MEMORY MANAGEMENT Read/write (R/W) flag, bit 1 Specifies the read-write privileges for a page or group of pages (in the case of a page-directory entry that points to a page table). When this flag is clear, the page is read only; when the flag is set, the page can be read and written into. This flag interacts with the U/S flag and the WP flag in register CR0. See Section 4.11, “Page-Level Protection”, and Table 4-3 for a detailed discussion of the use of these flags.
PROTECTED-MODE MEMORY MANAGEMENT to manage the transfer of pages and page tables into and out of physical memory. NOTE: The accesses used by the processor to set this bit may or may not be exposed to the processor’s Self-Modifying Code detection logic. If the processor is executing code from the same memory area that is being used for page table structures, the setting of the bit may or may not result in an immediate change to the executing code stream.
PROTECTED-MODE MEMORY MANAGEMENT entry for the page is not invalidated in the TLB when register CR3 is loaded or a task switch occurs. This flag is provided to prevent frequently used pages (such as pages that contain kernel or other operating system or executive code) from being flushed from the TLB. Only software can set or clear this flag. For page-directory entries that point to page tables, this flag is ignored and the global characteristics of a page are set in the page-table entries. See Section 3.
additional address line pins to accommodate the additional address bits. To use this option, the following flags must be set:
• PG flag (bit 31) in control register CR0: Enables paging.
• PAE flag (bit 5) in control register CR4: Enables the PAE paging mechanism.
When the PAE paging mechanism is enabled, the processor supports two sizes of pages: 4-KByte and 2-MByte.
3.8.2 Linear Address Translation With PAE Enabled (4-KByte Pages)
Figure 3-18 shows the page-directory-pointer, page-directory, and page-table hierarchy when mapping linear addresses to 4-KByte pages when the PAE paging mechanism is enabled. This paging method can be used to address up to 2^20 pages, which spans a linear address space of 2^32 bytes (4 GBytes).
3.8.3 Linear Address Translation With PAE Enabled (2-MByte Pages)
Figure 3-19 shows how a page-directory-pointer table and page directories can be used to map linear addresses to 2-MByte pages when the PAE paging mechanism is enabled. This paging method can be used to map up to 2048 pages (4 page-directory-pointer-table entries times 512 page-directory entries) into a 4-GByte linear address space.
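With PAE enabled and 4-KByte pages, the 32-bit linear address is commonly described as a 2/9/9/12 split (4 * 512 * 512 = 2^20 pages). The C sketch below assumes that field layout; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the fields of a 32-bit linear address under
 * PAE paging with 4-KByte pages (assumed 2/9/9/12 split). */
struct la_fields_pae {
    uint32_t pdpt;    /* bits 31:30, index into the page-directory-pointer table */
    uint32_t dir;     /* bits 29:21, index into the page directory */
    uint32_t table;   /* bits 20:12, index into the page table */
    uint32_t offset;  /* bits 11:0, byte offset within the page */
};

struct la_fields_pae split_linear_pae_4k(uint32_t la)
{
    struct la_fields_pae f;
    f.pdpt   = la >> 30;
    f.dir    = (la >> 21) & 0x1FFu;
    f.table  = (la >> 12) & 0x1FFu;
    f.offset = la & 0xFFFu;
    assert(f.pdpt < 4 && f.dir < 512 && f.table < 512);
    return f;
}
```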
PROTECTED-MODE MEMORY MANAGEMENT 3.8.4 Accessing the Full Extended Physical Address Space With the Extended Page-Table Structure The page-table structure described in the previous two sections allows up to 4 GBytes of the 64 GByte extended physical address space to be addressed at one time.
[Figure: 64-bit entry formats for PAE paging with 4-KByte pages. Page-directory-pointer-table entry: bits 63:36, reserved (set to 0); bits 35:12, page-directory base address; bits 11:9, available; bits 8:5, reserved; bit 4, PCD; bit 3, PWT; bits 2:1, reserved; bit 0, P. Page-directory entry (4-KByte page table): bits 63:36, reserved (set to 0); bits 35:12, page-table base address.]
[Figure: 64-bit entry formats for PAE paging with 2-MByte pages. Page-directory-pointer-table entry: bits 63:36, reserved (set to 0); bits 35:12, page-directory base address; bits 11:9, available; bits 8:5, reserved; bit 4, PCD; bit 3, PWT; bits 2:1, reserved; bit 0, P. Page-directory entry (2-MByte page): bits 63:36, reserved (set to 0); bits 35:21, page base address; bits 20:13, reserved (set to 0); bit 12, PAT; bits 11:9, available; bit 8, G; bit 7, PS (set to 1); bit 6, D; bit 5, A; bit 4, PCD; bit 3, PWT; bit 2, U/S; bit 1, R/W; bit 0, P.]
Figure 3-21.
Accessed (A) and dirty (D) flags (bits 5 and 6) are provided for table entries that point to pages. Bits 9, 10, and 11 in all the table entries for the physical address extension are available for use by software. (When the present flag is clear, bits 1 through 63 are available to software.) All bits in Figure 3-14 that are marked reserved or 0 should be set to 0 by software and not accessed by software.
Figure 3-22 shows how the expanded page directory entry can be used to map a 32-bit linear address to a 36-bit physical address. Here, the linear address is divided into two sections:
• Page directory entry: Bits 22 through 31 provide an offset to an entry in the page directory. The selected entry provides the 14 most significant bits of a 36-bit address, which locates the base physical address of a 4-MByte page.
Page-Directory Entry (4-MByte Page)
[Figure: bits 31:22, page base address (bits 22 through 31); bits 21:17, reserved; bits 16:13, page base address (bits 32 through 35); bit 12, page attribute table index (PAT); bits 11:9, available for system programmer’s use; bit 8, global page (G); bit 7, page size (PS, must be set to 1); bit 6, dirty (D); bit 5, accessed (A); bit 4, cache disabled (PCD); bit 3, write-through (PWT); bit 2, user/supervisor (U/S); bit 1, read/write (R/W); bit 0, present (P).]
Figure 3-23.
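The 14 most significant bits of the 36-bit page base are stored in two separate fields of the page-directory entry. The C sketch below reassembles them, assuming the field positions shown in the figure (entry bits 31:22 hold address bits 31:22, entry bits 16:13 hold address bits 35:32); the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Reassembles the 36-bit base physical address of a 4-MByte page from a
 * PSE-36 page-directory entry. */
uint64_t pse36_page_base(uint32_t pde)
{
    uint64_t low  = pde & 0xFFC00000u;       /* address bits 31:22 */
    uint64_t high = (pde >> 13) & 0xFu;      /* address bits 35:32 */
    uint64_t base = (high << 32) | low;
    assert((base & 0x3FFFFFu) == 0);         /* 4-MByte aligned by construction */
    return base;
}
```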
+ PDP + PDE + PTE + page offset) becomes 48. The method for translating the high-order 16 linear-address bits into a physical address is currently reserved. The PS flag in the page directory entry (PDE.PS) selects between 4-KByte and 2-MByte page sizes. Because PDE.PS is used to control large page selection, the CR4.PSE bit is ignored. 3.10.
[Figure: a 64-bit linear address is split into a sign-extended field (bits 63:48), PML4 (bits 47:39), Directory Ptr (bits 38:30), Directory (bits 29:21), Table (bits 20:12), and Offset (bits 11:0). CR3 locates the PML4 table; each 9-bit field selects an entry at the next level (PML4 entry, page-directory-pointer entry, page-directory entry, page-table entry), and the 12-bit offset selects the byte within the 4-KByte page. 512 PML4Es * 512 PDPTEs * 512 PDEs * 512 PTEs = 2^36 pages. Note 1: base addresses are 40 bits aligned onto a 4-KByte boundary.]
Figure 3-24. IA-32e Mode Paging Structures (4-KByte Pages)
3.10.
• Page-directory-pointer-table entry: Bits 38:30 provide an offset to an entry in the page-directory-pointer table. The selected entry provides the base physical address of a page directory.
• Page-directory entry: Bits 29:21 provide an offset to an entry in the page directory. The selected entry provides the base physical address of a 2-MByte page.
• Page offset: Bits 20:0 provide an offset to a physical address in the page.
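The field boundaries listed above can be sketched in C for the 2-MByte page case. The sketch assumes that the PML4 index occupies bits 47:39 and that bits 63:48 must be a sign extension of bit 47 (a canonical address); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the fields of a 64-bit linear address when
 * it is mapped by a 2-MByte page in IA-32e mode. */
struct la_fields_2m {
    unsigned pml4;    /* bits 47:39 (assumed), index into the PML4 table */
    unsigned pdpt;    /* bits 38:30, index into the page-directory-pointer table */
    unsigned dir;     /* bits 29:21, index into the page directory */
    uint32_t offset;  /* bits 20:0, byte offset within the 2-MByte page */
};

/* True if bits 63:48 are copies of bit 47. */
bool is_canonical(uint64_t la)
{
    uint64_t upper = la >> 47;  /* bits 63:47, seventeen bits */
    return upper == 0 || upper == 0x1FFFFu;
}

struct la_fields_2m split_linear_2m(uint64_t la)
{
    struct la_fields_2m f;
    assert(is_canonical(la));
    f.pml4   = (unsigned)((la >> 39) & 0x1FFu);
    f.pdpt   = (unsigned)((la >> 30) & 0x1FFu);
    f.dir    = (unsigned)((la >> 21) & 0x1FFu);
    f.offset = (uint32_t)(la & 0x1FFFFFu);
    return f;
}
```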
Except for the PML4 table, enhanced formats of page-directory-pointer-table, page-directory, and page-table entries are also used in enhanced legacy PAE-enabled paging on processors that support Intel 64 architecture (see Section 3.8.1, “Enhanced Legacy PAE Paging”).
Page-Map-Level-4-Table Entry 63 62 E X B 39 51 Avail Base Address Reserved (set to 0) 31 32 12 11 Avail PML4 Base Address 6 5 4 P Rsvd.
• The maximum number of entries in a page directory, page table, or PML4 table is 512.
• The P, R/W, U/S, PWT, PCD, and A flags are implemented uniformly across all four levels.
• The base physical address field in each entry is extended to 28 bits if the processor’s implementation supports a 40-bit physical address.
• Bits 62:52 are available for use by system programmers.
3.10.3.1 Intel® 64 Processors and Reserved Bit Checking
On processors supporting Intel 64 architecture and/or supporting the execute disable bit, the processor enforces reserved bit checking on paging mode specific bits. Table 3-4 shows the reserved bits that are checked on Intel 64 processors when execute disable bit checking is either disabled or not supported.
PROTECTED-MODE MEMORY MANAGEMENT Table 3-5.
PROTECTED-MODE MEMORY MANAGEMENT The IA-32 architecture does not enforce correspondence between the boundaries of pages and segments. A page can contain the end of one segment and the beginning of another. Likewise, a segment can contain the end of one page and the beginning of another. Memory-management software may be simpler and more efficient if it enforces some alignment between page and segment boundaries.
procedures running at privilege level 0 can invalidate TLBs or selected TLB entries. Whenever a page-directory or page-table entry is changed (including when the present flag is set to zero), the operating system must immediately invalidate the corresponding entry in the TLB so that it can be updated the next time the entry is referenced.
CHAPTER 4 PROTECTION In protected mode, the Intel 64 and IA-32 architectures provide a protection mechanism that operates at both the segment level and the page level. This protection mechanism provides the ability to limit access to certain segments or pages based on privilege levels (four privilege levels for segments and two privilege levels for pages).
PROTECTION 4.1 ENABLING AND DISABLING SEGMENT AND PAGE PROTECTION Setting the PE flag in register CR0 causes the processor to switch to protected mode, which in turn enables the segment-protection mechanism. Once in protected mode, there is no control bit for turning the protection mechanism on or off.
PROTECTION • Requested privilege level (RPL) field — (Bits 0 and 1 of any segment selector.) Specifies the requested privilege level of a segment selector. • Current privilege level (CPL) field — (Bits 0 and 1 of the CS segment register.) Indicates the privilege level of the currently executing program or procedure. The term current privilege level (CPL) refers to the setting of this field. • User/supervisor (U/S) flag — (Bit 2 of a page-directory or page-table entry.
[Figure: layouts of the data-segment, code-segment, and system-segment descriptors. For the data-segment and code-segment descriptors, the upper doubleword holds Base 31:24, the G flag, the B flag (data segment) or D flag (code segment), the AVL flag, Limit 19:16, the P flag, the DPL field, the Type field, and Base 23:16; the lower doubleword holds Base Address 15:00 and Segment Limit 15:00. The data-segment type field contains the E, W, and A flags; the code-segment type field contains the C, R, and A flags.]
PROTECTION The following sections describe how the processor uses these fields and flags to perform the various categories of checks described in the introduction to this chapter. 4.2.1 Code Segment Descriptor in 64-bit Mode Code segments continue to exist in 64-bit mode even though, for address calculations, the segment base is treated as zero.
[Figure: code-segment descriptor in IA-32e mode, showing the G, D, L, AVL, P, DPL, and Type (C, R, A) fields.]
Legend: A = Accessed; AVL = Available to system programmer’s use; C = Conforming; D = Default; DPL = Descriptor Privilege Level; L = 64-Bit Flag; G = Granularity; R = Readable; P = Present
Figure 4-2. Descriptor Fields with Flags used in IA-32e Mode
4.3 LIMIT CHECKING
The limit field of a segment descriptor prevents programs or procedures from addressing memory locations outside the segment.
• A doubleword at an offset greater than the (effective-limit – 3)
• A quadword at an offset greater than the (effective-limit – 7)
For expand-down data segments, the segment limit has the same function but is interpreted differently. Here, the effective limit specifies the last address that is not allowed to be accessed within the segment; the range of valid offsets is from (effective-limit + 1) to FFFFFFFFH if the B flag is set and from (effective-limit + 1) to FFFFH if the B flag is clear.
The processor examines type information at various times while operating on segment selectors and segment descriptors. The following list gives examples of typical operations where type checking is performed (this list is not exhaustive):
• When a segment selector is loaded into a segment register: Certain segment registers can contain only certain descriptor types, for example:
  - The CS register can be loaded only with a selector for a code segment.
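The limit rules for expand-up and expand-down segments can be expressed as a single check on the first and last byte of an access. The following C sketch is a simplified model with hypothetical names; it assumes a byte-granular effective limit has already been computed and it ignores type and privilege checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if an access of `size` bytes starting at `offset` passes
 * the limit check. For expand-up segments the last byte must not exceed
 * the effective limit. For expand-down segments the first byte must lie
 * above the effective limit and the last byte must not exceed FFFFFFFFH
 * (B flag set) or FFFFH (B flag clear). */
bool limit_check_ok(uint32_t offset, uint32_t size, uint32_t eff_limit,
                    bool expand_down, bool b_flag)
{
    assert(size >= 1);
    uint64_t last = (uint64_t)offset + size - 1;  /* 64-bit to avoid wraparound */
    if (!expand_down)
        return last <= eff_limit;
    uint64_t upper = b_flag ? UINT64_C(0xFFFFFFFF) : UINT64_C(0xFFFF);
    return offset > eff_limit && last <= upper;
}
```

Checking the last byte rather than subtracting from the limit avoids underflow when the limit is smaller than the access size.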
PROTECTION instruction. If the descriptor type is for a code segment or call gate, a call or jump to another code segment is indicated; if the descriptor type is for a TSS or task gate, a task switch is indicated. — On a call or jump through a call gate (or on an interrupt- or exception-handler call through a trap or interrupt gate), the processor automatically checks that the segment descriptor being pointed to by the gate is for a code segment.
PROTECTION Protection Rings Operating System Kernel Level 0 Operating System Services Level 1 Level 2 Applications Level 3 Figure 4-3. Protection Rings The processor uses privilege levels to prevent a program or task operating at a lesser privilege level from accessing a segment with a greater privilege, except under controlled situations. When the processor detects a privilege level violation, it generates a general-protection exception (#GP).
PROTECTION — Nonconforming code segment (without using a call gate) — The DPL indicates the privilege level that a program or task must be at to access the segment. For example, if the DPL of a nonconforming code segment is 0, only programs running at a CPL of 0 can access the segment. — Call gate — The DPL indicates the numerically highest privilege level that the currently executing program or task can be at and still be able to access the call gate. (This is the same access rule as for a data segment.
PROTECTION 4.6 PRIVILEGE LEVEL CHECKING WHEN ACCESSING DATA SEGMENTS To access operands in a data segment, the segment selector for the data segment must be loaded into the data-segment registers (DS, ES, FS, or GS) or into the stacksegment register (SS). (Segment registers can be loaded with the MOV, POP, LDS, LES, LFS, LGS, and LSS instructions.
PROTECTION than the DPL of data segment E. Even if a code segment C procedure were to use segment selector E1 or E2, such that the RPL would be acceptable, it still could not access data segment E because its CPL is not privileged enough. 4. The procedure in code segment D should be able to access data segment E because code segment D’s CPL is numerically less than the DPL of data segment E.
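The data-segment rule illustrated by these examples reduces to comparing the DPL against the numerically larger of the CPL and the RPL. The C sketch below models that comparison for loads into DS, ES, FS, or GS only (loading SS follows a stricter rule that is not modeled here); the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if a procedure running at `cpl`, using a selector with
 * requested privilege level `rpl`, may load a data segment whose
 * descriptor privilege level is `dpl`. Larger numbers mean less privilege. */
bool data_access_allowed(unsigned cpl, unsigned rpl, unsigned dpl)
{
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);
    unsigned effective = (cpl > rpl) ? cpl : rpl;  /* the weaker of the two */
    return effective <= dpl;
}
```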
PROTECTION violate privilege-level security for the data segment. To prevent these types of privilege-level-check violations, a program or procedure can check access privileges whenever it receives a data-segment selector from another procedure (see Section 4.10.4, “Checking Caller Access Privileges (ARPL Instruction)”). 4.6.1 Accessing Data in Code Segments In some instances it may be desirable to access data structures that are contained in a code segment.
PROTECTION Program control transfers are carried out with the JMP, CALL, RET, SYSENTER, SYSEXIT, INT n, and IRET instructions, as well as by the exception and interrupt mechanisms. Exceptions, interrupts, and the IRET instruction are special cases discussed in Chapter 5, “Interrupt and Exception Handling.” This chapter discusses only the JMP, CALL, RET, SYSENTER, and SYSEXIT instructions.
[Figure: the privilege check compares the CPL (from the CS register), the RPL (from the segment selector for the code segment), and the DPL and conforming (C) flag (from the destination code-segment descriptor).]
Figure 4-6. Privilege Check for Control Transfer Without Using a Gate
• The DPL of the segment descriptor for the destination code segment that contains the called procedure.
• The RPL of the segment selector of the destination code segment.
[Figure: code segment B (CPL=3) with segment selectors D2 and C2 (both RPL=3); code segment A (CPL=2) with segment selectors C1 and D1 (both RPL=2). Code segment C (DPL=2) is a nonconforming code segment; code segment D (DPL=1) is a conforming code segment. Privilege levels run from 3 (lowest privilege) to 0 (highest privilege).]
Figure 4-7.
PROTECTION In the example in Figure 4-7, code segment D is a conforming code segment. Therefore, calling procedures in both code segment A and B can access code segment D (using either segment selector D1 or D2, respectively), because they both have CPLs that are greater than or equal to the DPL of the conforming code segment. For conforming code segments, the DPL represents the numerically lowest privilege level that a calling procedure may be at to successfully make a call to the code segment.
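The direct-transfer checks can be sketched as a small predicate. The conforming case follows the rule stated above (the CPL must be numerically greater than or equal to the DPL). The nonconforming case is an assumption based on the usual rule for transfers without a gate: the CPL must equal the DPL and the RPL must not exceed the CPL. The function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true if a CALL or JMP that does not go through a gate may
 * transfer control to the destination code segment. */
bool direct_transfer_allowed(unsigned cpl, unsigned rpl, unsigned dpl,
                             bool conforming)
{
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);
    if (conforming)
        return cpl >= dpl;            /* the RPL is not checked */
    return cpl == dpl && rpl <= cpl;  /* assumed nonconforming rule */
}
```

Using the values from Figure 4-7, code segment A (CPL=2) can reach both C and D, whereas code segment B (CPL=3) can reach only the conforming segment D.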
PROTECTION 4.8.3 Call Gates Call gates facilitate controlled transfers of program control between different privilege levels. They are typically used only in operating systems or executives that use the privilege-level protection mechanism. Call gates are also useful for transferring program control between 16-bit and 32-bit code segments, as described in Section 16.4, “Transferring Control Among Mixed-Size Code Segments.” Figure 4-8 shows the format of a call-gate descriptor.
PROTECTION Note that the P flag in a gate descriptor is normally always set to 1. If it is set to 0, a not present (#NP) exception is generated when a program attempts to access the descriptor. The operating system can use the P flag for special purposes. For example, it could be used to track the number of times the gate is used. Here, the P flag is initially set to 0 causing a trap to the not-present exception handler.
[Figure: 16-byte call-gate descriptor in IA-32e mode. Bytes 15:12: reserved, with bits 12:8 (type) set to 00000B. Bytes 11:8: Offset in Segment 63:32. Bytes 7:4: Offset in Segment 31:16 (bits 31:16), P flag (bit 15), DPL (bits 14:13), bit 12 clear, type 1100B (bits 11:8). Bytes 3:0: Segment Selector (bits 31:16) and Offset in Segment 15:00 (bits 15:0). DPL = Descriptor Privilege Level; P = Gate Valid.]
Figure 4-9. Call-Gate Descriptor in IA-32e Mode
• Target code segments referenced by a 64-bit call gate must be 64-bit code segments (CS.L = 1, CS.D = 0).
PROTECTION 4.8.4 Accessing a Code Segment Through a Call Gate To access a call gate, a far pointer to the gate is provided as a target operand in a CALL or JMP instruction. The segment selector from this pointer identifies the call gate (see Figure 4-10); the offset from the pointer is required, but not used or checked by the processor. (The offset can be set to any value.
[Figure 4-11. Privilege Check for Control Transfer with Call Gate: the privilege check takes as inputs the CPL (CS register), the RPL (call-gate selector), the DPL of the call gate (descriptor), and the DPL of the destination code-segment descriptor.]

The privilege checking rules are different depending on whether the control transfer was initiated with a CALL or a JMP instruction, as shown in Table 4-1.

Table 4-1.
segments B and C. The dotted line shows that a calling procedure in code segment A cannot access call gate B. The RPL of the segment selector to a call gate must satisfy the same test as the CPL of the calling procedure; that is, the RPL must be less than or equal to the DPL of the call gate. In the example in Figure 4-12, a calling procedure in code segment C can access call gate B using gate selector B2 or B1, but it could not use gate selector B3 to access call gate B.
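The access test described above can be sketched as a small predicate. This is a minimal illustration only: the function name `call_gate_accessible` is hypothetical, and the sketch covers just the gate-access test (CPL and RPL against the gate's DPL), not the later check against the destination code segment.

```c
#include <stdbool.h>

/* Hypothetical helper: true when a caller may access a call gate.
 * Privilege levels are 0 (most privileged) through 3 (least privileged).
 * Both the caller's CPL and the gate selector's RPL must be numerically
 * less than or equal to the DPL of the call gate. */
static bool call_gate_accessible(unsigned cpl, unsigned rpl, unsigned gate_dpl)
{
    return cpl <= gate_dpl && rpl <= gate_dpl;
}
```

With the values from Figure 4-12 (call gate B has DPL=2), a caller at CPL=1 passes with a selector of RPL=1 or RPL=2 and fails with RPL=3, while a caller at CPL=3 fails regardless of the selector.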
[Figure 4-12. Privilege levels run from 3 (lowest privilege) to 0 (highest privilege). Elements shown: Code Segment A (CPL=3), Gate Selector A (RPL=3), Gate Selector B3 (RPL=3), Call Gate A (DPL=3), Code Segment B (CPL=2), Gate Selector B1 (RPL=2), Call Gate B (DPL=2), Code Segment C (CPL=1), Gate Selector B2 (RPL=1), Code Segment D (DPL=0, conforming code segment; no stack switch occurs), and Code Segment E (DPL=0, nonconforming code segment; stack switch occurs).]
PROTECTION Each task must define up to 4 stacks: one for applications code (running at privilege level 3) and one for each of the privilege levels 2, 1, and 0 that are used. (If only two privilege levels are used [3 and 0], then only two stacks must be defined.) Each of these stacks is located in a separate segment and is identified with a segment selector and an offset into the stack segment (a stack pointer).
PROTECTION 3. Checks the stack-segment descriptor for the proper privileges and type and generates an invalid TSS (#TS) exception if violations are detected. 4. Temporarily saves the current values of the SS and ESP registers. 5. Loads the segment selector and stack pointer for the new stack in the SS and ESP registers. 6. Pushes the temporarily saved values for the SS and ESP registers (for the calling procedure) onto the new stack (see Figure 4-13). 7.
dure, one of the parameters can be a pointer to a data structure, or the saved contents of the SS and ESP registers may be used to access parameters in the old stack space. The size of the data items passed to the called procedure depends on the call gate size, as described in Section 4.8.3, “Call Gates.”

4.8.5.1 Stack Switching in 64-bit Mode

Although protection-check rules for call gates are unchanged from 32-bit mode, stack switching in 64-bit mode works differently.
4.8.6 Returning from a Called Procedure

The RET instruction can be used to perform a near return, a far return at the same privilege level, and a far return to a different privilege level. This instruction is intended to execute returns from procedures that were called with a CALL instruction. It does not support returns from a JMP instruction, because the JMP instruction does not save a return instruction pointer on the stack.
PROTECTION limit violations detected while loading the stack-segment selector or stack pointer cause a general-protection exception (#GP) to be generated. The new stack-segment descriptor is also checked for type and privilege violations. 5. (If the RET instruction includes a parameter count operand.) Adds the parameter count (in bytes obtained from the RET instruction) to the current ESP register value, to step past the parameters on the calling procedure’s stack.
For SYSEXIT, target fields are generated using the following sources:
• Target code segment — Computed by adding 16 to the value in IA32_SYSENTER_CS.
• Target instruction — Reads this from EDX.
• Stack segment — Computed by adding 24 to the value in IA32_SYSENTER_CS.
• Stack pointer — Reads this from ECX.
• Target instruction — Reads 64-bit canonical address in RDX.
• Stack segment — Computed by adding 40 to the value of IA32_SYSENTER_CS.
• Stack pointer — Update RSP using 64-bit canonical address in RCX.

When SYSEXIT transfers control to compatibility mode user code when the operand size attribute is 32 bits, the following fields are generated and bits set:
• Target code segment — Computed by adding 16 to the value in IA32_SYSENTER_CS.
When SYSRET transfers control to 64-bit mode user code using REX.W, the processor gets the privilege level 3 target instruction and stack pointer from:
• Target code segment — Reads a non-NULL selector from IA32_STAR[63:48] + 16.
• Target instruction — Copies the value in RCX into RIP.
• Stack segment — IA32_STAR[63:48] + 8.
• EFLAGS — Loaded from R11.
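The selector arithmetic listed above for a 64-bit SYSRET can be sketched as follows. The type `sysret_selectors` and the function `sysret64_selectors` are hypothetical names used for illustration; the sketch models only the two additions named in the list and makes no attempt to model RIP, RFLAGS, or any other processor behavior.

```c
#include <stdint.h>

/* Hypothetical container for the two selectors derived from IA32_STAR. */
typedef struct {
    uint16_t cs;  /* target code segment selector */
    uint16_t ss;  /* stack segment selector */
} sysret_selectors;

/* Derive the selectors used when SYSRET returns to 64-bit user code. */
static sysret_selectors sysret64_selectors(uint64_t ia32_star)
{
    uint16_t base = (uint16_t)(ia32_star >> 48);  /* IA32_STAR[63:48] */
    sysret_selectors s;
    s.cs = (uint16_t)(base + 16);  /* target code segment: base + 16 */
    s.ss = (uint16_t)(base + 8);   /* stack segment: base + 8 */
    return s;
}
```

For example, if IA32_STAR[63:48] holds 0x23, the sketch yields a code-segment selector of 0x33 and a stack-segment selector of 0x2B.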
4.9 PRIVILEGED INSTRUCTIONS

Some of the system instructions (called “privileged instructions”) are protected from use by application programs. The privileged instructions control system functions (such as the loading of system registers). They can be executed only when the CPL is 0 (most privileged). If one of these instructions is executed when the CPL is not 0, a general-protection exception (#GP) is generated.
PROTECTION 3. Checking if the pointer offset exceeds the segment limit. 4. Checking if the supplier of the pointer is allowed to access the segment. 5. Checking the offset alignment. The processor automatically performs first, second, and third checks during instruction execution. Software must explicitly request the fourth check by issuing an ARPL instruction. The fifth check (offset alignment) is performed automatically at privilege level 3 if alignment checking is turned on.
4.10.2 Checking Read/Write Rights (VERR and VERW Instructions)

When the processor accesses any code or data segment, it checks the read/write privileges assigned to the segment to verify that the intended read or write operation is allowed. Software can check read/write rights using the VERR (verify for reading) and VERW (verify for writing) instructions. Both these instructions specify the segment selector for the segment being checked.
PROTECTION 5. If the privilege level and type checks pass, loads the unscrambled limit (the limit scaled according to the setting of the G flag in the segment descriptor) into the destination register and sets the ZF flag in the EFLAGS register. If the segment selector is not visible at the current privilege level or is an invalid type for the LSL instruction, the instruction does not modify the destination register and clears the ZF flag.
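The scaling of the limit by the G flag mentioned in step 5 can be illustrated with a short sketch. The function name `scaled_limit` is hypothetical, and the sketch assumes the usual descriptor convention: a 20-bit raw limit that counts bytes when G = 0 and 4-KByte units (with the low 12 bits of the result set) when G = 1.

```c
#include <stdint.h>

/* Hypothetical helper: convert the 20-bit limit field of a segment
 * descriptor into a byte-granular limit, as LSL would report it. */
static uint32_t scaled_limit(uint32_t raw_limit20, int g_flag)
{
    raw_limit20 &= 0xFFFFFu;                   /* keep the 20-bit field */
    if (g_flag)
        return (raw_limit20 << 12) | 0xFFFu;   /* 4-KByte granularity */
    return raw_limit20;                        /* byte granularity */
}
```

Under these assumptions a raw limit of FFFFFH yields FFFFFH with G = 0 and FFFFFFFFH with G = 1.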
[Figure 4-15. Elements shown, from privilege level 3 (lowest privilege) to 0 (highest privilege): application program Code Segment A (CPL=3); Gate Selector B (RPL=3) and Call Gate B (DPL=3); Segment Selector D1 (RPL=3), passed as a parameter on the stack, for which access is not allowed; operating system Code Segment C (DPL=0); Segment Selector D2 (RPL=0), for which access is allowed; Data Segment D (DPL=0).]
PROTECTION The example in Figure 4-15 demonstrates how the ARPL instruction is intended to be used. When the operating-system receives segment selector D2 from the application program, it uses the ARPL instruction to compare the RPL of the segment selector with the privilege level of the application program (represented by the code-segment selector pushed onto the stack).
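The RPL adjustment that ARPL performs can be modeled in a few lines. This is a software sketch with a hypothetical function name (`arpl`), assuming the RPL occupies the low two bits of a selector; the return value stands in for the ZF flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Software model of ARPL. If the RPL of *dest is numerically lower
 * (more privileged) than the RPL of src, raise it to match src and
 * return true (ZF set). Otherwise leave *dest alone and return false. */
static bool arpl(uint16_t *dest, uint16_t src)
{
    unsigned dest_rpl = *dest & 3u;
    unsigned src_rpl = src & 3u;
    if (dest_rpl < src_rpl) {
        *dest = (uint16_t)((*dest & ~3u) | src_rpl);
        return true;
    }
    return false;
}
```

In the scenario of Figure 4-15, passing a selector with RPL=0 together with the caller's code-segment selector (RPL=3) raises the selector's RPL to 3, so a later access to a DPL=0 data segment fails its privilege check.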
tion of the page-fault exception mechanism. This chapter describes the protection violations which lead to page-fault exceptions.

4.11.1 Page-Protection Flags

Protection information for pages is contained in two flags in a page-directory or page-table entry (see Figure 3-14): the read/write flag (bit 1) and the user/supervisor flag (bit 2). The protection checks are applied to both first- and second-level page tables (that is, page directories and page tables).

4.11.
When the processor is in supervisor mode and the WP flag in register CR0 is clear (its state following reset initialization), all pages are both readable and writable (write-protection is ignored). When the processor is in user mode, it can write only to user-mode pages that are read/write accessible. User-mode pages which are read/write or read-only are readable; supervisor-mode pages are neither readable nor writable from user mode.
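The rules above can be condensed into a single predicate over one paging entry. The names `PG_RW`, `PG_US`, and `page_access_allowed` are hypothetical. The supervisor-mode branch for WP = 1 (writes to read-only pages are refused) is an assumption based on the general meaning of the WP flag and is not stated in the paragraph above; a real check would also combine the flags of the page-directory and page-table entries.

```c
#include <stdbool.h>
#include <stdint.h>

#define PG_RW (1u << 1)  /* read/write flag, bit 1 */
#define PG_US (1u << 2)  /* user/supervisor flag, bit 2 */

/* Hypothetical model of the page-level access check for one entry. */
static bool page_access_allowed(uint32_t flags, bool user_mode,
                                bool is_write, bool cr0_wp)
{
    bool user_page = (flags & PG_US) != 0;
    bool writable = (flags & PG_RW) != 0;

    if (user_mode) {
        if (!user_page)
            return false;             /* supervisor pages are off limits */
        return !is_write || writable; /* reads always, writes only if R/W */
    }
    /* Supervisor mode: with WP clear, write protection is ignored.
     * With WP set (assumption), read-only pages cannot be written. */
    if (is_write && cr0_wp)
        return writable;
    return true;
}
```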
PROTECTION Page-level protections cannot be used to override segment-level protection. For example, a code segment is by definition not writable. If a code segment is paged, setting the R/W flag for the pages to read-write does not make the pages writable. Attempts to write into the pages will be blocked by segment-level protection checks. Page-level protection can be used to enhance segment-level protection.
4.13 PAGE-LEVEL PROTECTION AND EXECUTE-DISABLE BIT

In addition to page-level protection offered by the U/S and R/W flags, enhanced PAE-enabled paging structures (see Section 3.10.3, “Enhanced Paging Data Structures”) provide the execute-disable bit. This bit offers additional protection for data pages. An Intel 64 or IA-32 processor with the execute disable bit capability can prevent data pages from being used by malicious software to execute code.
If the execute disable bit capability is not available, a write to IA32_EFER.NXE produces a #GP exception. See Table 4-5.

Table 4-5. Extended Feature Enable MSR (IA32_EFER)

Bits     Field
63:12    Reserved
11       Execute-disable bit enable (NXE)
10       IA-32e mode active (LMA)
9        Reserved
8        IA-32e mode enable (LME)
7:1      Reserved
0        SysCall enable (SCE)

4.13.2 Execute-Disable Bit Page Protection

The execute-disable bit in paging structures enhances page protection for data pages.
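The IA32_EFER layout in Table 4-5 can be captured as bit masks. The macro and function names below are hypothetical, chosen for illustration; the bit positions are the ones listed in the table.

```c
#include <stdbool.h>
#include <stdint.h>

#define EFER_SCE (1ull << 0)   /* SysCall enable */
#define EFER_LME (1ull << 8)   /* IA-32e mode enable */
#define EFER_LMA (1ull << 10)  /* IA-32e mode active */
#define EFER_NXE (1ull << 11)  /* Execute-disable bit enable */
#define EFER_RESERVED (~(EFER_SCE | EFER_LME | EFER_LMA | EFER_NXE))

/* True when the execute-disable feature is enabled in this EFER value. */
static bool efer_nx_enabled(uint64_t efer)
{
    return (efer & EFER_NXE) != 0;
}

/* True when any bit marked Reserved in Table 4-5 is set. */
static bool efer_has_reserved_bits(uint64_t efer)
{
    return (efer & EFER_RESERVED) != 0;
}
```

A value such as D01H (SCE, LME, LMA, and NXE all set) has no reserved bits set, while any value with bit 9 set does.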
In legacy PAE-enabled mode, Table 4-7 and Table 4-8 show the effect of setting the execute-disable bit for code and data pages.

Table 4-7. Legacy PAE-Enabled 4-KByte Page Level Protection Matrix with Execute-Disable Bit Capability

Execute Disable Bit Value (Bit 63)
PDE           PTE           Valid Usage
Bit 63 = 1    *             Data
*             Bit 63 = 1    Data
Bit 63 = 0    Bit 63 = 0    Data/Code

NOTE: * Value not checked.

Table 4-8.
Table 4-9.
Table 4-10.
CHAPTER 5
INTERRUPT AND EXCEPTION HANDLING

This chapter describes the interrupt and exception-handling mechanism when operating in protected mode on an Intel 64 or IA-32 processor. Most of the information provided here also applies to interrupt and exception mechanisms used in real-address, virtual-8086 mode, and 64-bit mode. Chapter 15, “8086 Emulation,” describes information specific to interrupt and exception mechanisms in real-address and virtual-8086 mode. Section 5.
5.2 EXCEPTION AND INTERRUPT VECTORS

To aid in handling exceptions and interrupts, each architecturally defined exception and each interrupt condition requiring special handling by the processor is assigned a unique identification number, called a vector. The processor uses the vector assigned to an exception or interrupt as an index into the interrupt descriptor table (IDT). The table provides the entry point to an exception or interrupt handler (see Section 5.
(see Section 5.2, “Exception and Interrupt Vectors”). Asserting the NMI pin signals a non-maskable interrupt (NMI), which is assigned to interrupt vector 2.

Table 5-1. Protected-Mode Exceptions and Interrupts

Vector No.  Mnemonic  Description    Type        Error Code  Source
0           #DE       Divide Error   Fault       No          DIV and IDIV instructions.
1           #DB       RESERVED       Fault/Trap  No          For Intel use only.
2           —         NMI Interrupt  Interrupt   No          Nonmaskable external interrupt.
Table 5-1. Protected-Mode Exceptions and Interrupts (Contd.)

Vector No.  Mnemonic  Description                            Type       Error Code  Source
19          #XF       SIMD Floating-Point Exception          Fault      No          SSE/SSE2/SSE3 floating-point instructions (see note 5)
20-31       —         Intel reserved. Do not use.
32-255      —         User Defined (Nonreserved) Interrupts  Interrupt              External interrupt or INT n instruction.

NOTES:
1. The UD2 instruction was introduced in the Pentium Pro processor.
2. Processors after the Intel386 processor do not generate this exception.
3.
The IF flag in the EFLAGS register permits all maskable hardware interrupts to be masked as a group (see Section 5.8.1, “Masking Maskable Hardware Interrupts”). Note that when interrupts 0 through 15 are delivered through the local APIC, the APIC indicates the receipt of an illegal vector.

5.3.3 Software-Generated Interrupts

The INT n instruction permits interrupts to be generated from within software by supplying an interrupt vector number as an operand.
INTERRUPT AND EXCEPTION HANDLING The INT n instruction can be used to emulate exceptions in software; but there is a limitation. If INT n provides a vector for one of the architecturally-defined exceptions, the processor generates an interrupt to the correct vector (to access the exception handler) but does not push an error code on the stack. This is true even if the associated hardware-generated exception normally produces an error code.
INTERRUPT AND EXCEPTION HANDLING NOTE One exception subset normally reported as a fault is not restartable. Such exceptions result in loss of some processor state. For example, executing a POPAD instruction where the stack frame crosses over the end of the stack segment causes a fault to be reported. In this situation, the exception handler sees that the instruction pointer (CS:EIP) has been restored as if the POPAD instruction had not been executed.
INTERRUPT AND EXCEPTION HANDLING processor when the abort exception occurred and then shut down the application and system as gracefully as possible. Interrupts rigorously support restarting of interrupted programs and tasks without loss of continuity. The return instruction pointer saved for an interrupt points to the next instruction to be executed at the instruction boundary where the processor took the interrupt.
5.7.1 Handling Multiple NMIs

While an NMI interrupt handler is executing, the processor disables additional calls to the NMI handler until the next IRET instruction is executed. This blocking of subsequent NMIs prevents stacking up calls to the NMI handler. It is recommended that the NMI interrupt handler be accessed through an interrupt gate to disable maskable hardware interrupts (see Section 5.8.1, “Masking Maskable Hardware Interrupts”).
INTERRUPT AND EXCEPTION HANDLING executed only if the CPL is equal to or less than the IOPL. A general-protection exception (#GP) is generated if they are executed when the CPL is greater than the IOPL. (The effect of the IOPL on these instructions is modified slightly when the virtual mode extension is enabled by setting the VME flag in control register CR4: see Section 15.3, “Interrupt and Exception Handling in Virtual-8086 Mode.” Behavior is also impacted by the PVI flag: see Section 15.
INTERRUPT AND EXCEPTION HANDLING To prevent this situation, the processor inhibits interrupts, debug exceptions, and single-step trap exceptions after either a MOV to SS instruction or a POP to SS instruction, until the instruction boundary following the next instruction is reached. All other faults may still be generated. If the LSS instruction is used to modify the contents of the SS register (which is the recommended method of modifying this register), this problem does not occur. 5.
Table 5-2. Priority Among Simultaneous Exceptions and Interrupts (Contd.)

Priority 10 (Lowest): Faults on Executing an Instruction
- Overflow
- Bound error
- Invalid TSS
- Segment Not Present
- Stack fault
- General Protection
- Data Page Fault
- Alignment Check
- x87 FPU Floating-point exception
- SIMD floating-point exception

NOTE:
1. The Intel486™ processor and earlier processors group nonmaskable and maskable interrupts in the same priority class.
INTERRUPT AND EXCEPTION HANDLING The LIDT (load IDT register) and SIDT (store IDT register) instructions load and store the contents of the IDTR register, respectively. The LIDT instruction loads the IDTR register with the base address and limit held in a memory operand. This instruction can be executed only when the CPL is 0. It normally is used by the initialization code of an operating system when creating an IDT. An operating system also may use it to change from one IDT to another.
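How the base address and limit held in the IDTR combine with a vector number can be sketched as follows. The type `idtr32` and function `idt_entry_address` are hypothetical names, and the sketch assumes the protected-mode layout in which each IDT descriptor is 8 bytes long and the limit is the offset of the last valid byte of the table.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the IDTR contents loaded by LIDT. */
typedef struct {
    uint32_t base;   /* linear base address of the IDT */
    uint16_t limit;  /* offset of the last valid byte in the IDT */
} idtr32;

/* Compute the address of the descriptor for a vector. Returns false
 * when the 8-byte descriptor would extend past the IDT limit. */
static bool idt_entry_address(idtr32 idtr, uint8_t vector, uint32_t *addr)
{
    uint32_t offset = (uint32_t)vector * 8u;  /* assumed 8-byte entries */
    if (offset + 7u > idtr.limit)
        return false;
    *addr = idtr.base + offset;
    return true;
}
```

With a base of 1000H and a full 256-entry table (limit 7FFH), vector 13 resolves to 1068H. With a limit of 67H the table holds only vectors 0 through 12, so a lookup for vector 13 is rejected.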
5.11 IDT DESCRIPTORS

The IDT may contain any of three kinds of gate descriptors:
• Task-gate descriptor
• Interrupt-gate descriptor
• Trap-gate descriptor

Figure 5-2 shows the formats for the task-gate, interrupt-gate, and trap-gate descriptors. The format of a task gate used in an IDT is the same as that of a task gate used in the GDT or an LDT (see Section 6.2.5, “Task-Gate Descriptor”).
[Figure 5-2: IDT gate descriptor formats. Task gate: TSS Segment Selector in bits 31:16 of the low doubleword; P, DPL, and type 00101B in bits 15:8 of the high doubleword. Interrupt gate: Segment Selector (bits 31:16) and Offset 15..0 (bits 15:0) in the low doubleword; Offset 31..16 (bits 31:16), P (bit 15), DPL (bits 14:13), and type 0D110B (bits 12:8) in the high doubleword. Trap gate: same layout as the interrupt gate, with type 0D111B. Legend entries: DPL, Offset, P, Selector, D.]
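The 32-bit interrupt-gate and trap-gate layout can be unpacked with a few shifts and masks. The structure `gate32` and function `decode_gate32` are hypothetical names; the field positions assumed here are Offset 15..0 and the Segment Selector in the low doubleword, and type (bits 12:8), DPL (bits 14:13), P (bit 15), and Offset 31..16 in the high doubleword.

```c
#include <stdint.h>

/* Hypothetical decoded view of a 32-bit interrupt or trap gate. */
typedef struct {
    uint32_t offset;    /* handler entry point within the segment */
    uint16_t selector;  /* segment selector for the handler's code */
    uint8_t type;       /* bits 12:8 of the high doubleword */
    uint8_t dpl;        /* descriptor privilege level */
    uint8_t present;    /* segment-present flag */
} gate32;

static gate32 decode_gate32(uint32_t low, uint32_t high)
{
    gate32 g;
    g.offset = (high & 0xFFFF0000u) | (low & 0x0000FFFFu);
    g.selector = (uint16_t)(low >> 16);
    g.type = (uint8_t)((high >> 8) & 0x1Fu);
    g.dpl = (uint8_t)((high >> 13) & 0x3u);
    g.present = (uint8_t)((high >> 15) & 0x1u);
    return g;
}
```

Under these assumptions a type value of 0EH corresponds to a 32-bit interrupt gate (0D110B with D = 1) and 0FH to a 32-bit trap gate.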
“Returning from a Called Procedure”). If the index points to a task gate, the processor executes a task switch to the exception- or interrupt-handler task in a manner similar to a CALL to a task gate (see Section 6.3, “Task Switching”).

5.12.1 Exception- or Interrupt-Handler Procedures

An interrupt gate or trap gate references an exception- or interrupt-handler procedure that runs in the context of the currently executing task (see Figure 5-3).
INTERRUPT AND EXCEPTION HANDLING When the processor performs a call to the exception- or interrupt-handler procedure: • If the handler procedure is going to be executed at a numerically lower privilege level, a stack switch occurs. When the stack switch occurs: a. The segment selector and stack pointer for the stack to be used by the handler are obtained from the TSS for the currently executing task.
[Figure 5-4. Stack usage on transfer to a handler. With no privilege-level change, EFLAGS, CS, EIP, and the error code are pushed onto the stack shared by the interrupted procedure and the handler. With a privilege-level change, SS, ESP, EFLAGS, CS, EIP, and the error code are pushed onto the handler's stack, separate from the interrupted procedure's stack. Each panel marks ESP before and after the transfer to the handler.]
INTERRUPT AND EXCEPTION HANDLING Section 4.8.4, “Accessing a Code Segment Through a Call Gate”). The processor does not permit transfer of execution to an exception- or interrupt-handler procedure in a less privileged code segment (numerically greater privilege level) than the CPL. An attempt to violate this rule results in a general-protection exception (#GP).
A subsequent IRET instruction restores the IF flag to its value in the saved contents of the EFLAGS register on the stack. Accessing a handler procedure through a trap gate does not affect the IF flag.

5.12.2 Interrupt Tasks

When an exception or interrupt handler is accessed through a task gate in the IDT, a task switch results.
[Figure 5-5. Interrupt Task Switch: the interrupt vector selects a task gate in the IDT; the TSS selector in the task gate selects a TSS descriptor in the GDT; the TSS base address in that descriptor locates the TSS for the interrupt-handling task.]
5.13 ERROR CODE

When an exception condition is related to a specific segment, the processor pushes an error code onto the stack of the exception handler (whether it is a procedure or task). The error code has the format shown in Figure 5-6.
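A handler typically splits the error code into its fields before acting on it. The sketch below assumes the conventional layout of the segment-related error code, which is not spelled out in the text above: EXT in bit 0, IDT in bit 1, TI in bit 2, and the segment selector index in bits 15:3. The names `seg_error_code` and `decode_error_code` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical decoded view of a segment-related error code. */
typedef struct {
    unsigned ext;    /* 1 = caused by an event external to the program */
    unsigned idt;    /* 1 = index refers to a gate descriptor in the IDT */
    unsigned ti;     /* when idt = 0: 0 = GDT, 1 = LDT */
    unsigned index;  /* selector index into the indicated table */
} seg_error_code;

static seg_error_code decode_error_code(uint32_t code)
{
    seg_error_code e;
    e.ext = code & 1u;
    e.idt = (code >> 1) & 1u;
    e.ti = (code >> 2) & 1u;
    e.index = (code >> 3) & 0x1FFFu;
    return e;
}
```

Under this assumed layout, an error code of 6AH decodes to IDT = 1 with index 13, and 2CH decodes to an LDT reference with index 5.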
5.14 EXCEPTION AND INTERRUPT HANDLING IN 64-BIT MODE

In 64-bit mode, interrupt and exception handling is similar to what has been described for non-64-bit modes. The following are the exceptions:
• All interrupt handlers pointed to by the IDT are in 64-bit code (this does not apply to the SMI handler).
• The size of interrupt-stack pushes is fixed at 64 bits, and the processor uses 8-byte, zero-extended stores.
[Figure 5-7. 64-bit interrupt/trap gate, 16 bytes: bytes 15:12 are reserved; bytes 11:8 hold Offset 63..32; bytes 7:4 hold Offset 31..16, P, DPL, TYPE, and IST; bytes 3:0 hold the Segment Selector and Offset 15..0. Legend: DPL = Descriptor Privilege Level; Offset = Offset to procedure entry point; P = Segment Present flag; Selector = Segment Selector for destination code segment; IST = Interrupt Stack Table.]
5.14.2 64-Bit Mode Stack Frame

In legacy mode, the size of an IDT entry (16 bits or 32 bits) determines the size of interrupt-stack-frame pushes. SS:ESP is pushed only on a CPL change. In 64-bit mode, the size of interrupt stack-frame pushes is fixed at eight bytes. This is because only 64-bit mode gates can be referenced. 64-bit mode also pushes SS:RSP unconditionally, rather than only on a CPL change.
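Because every push is eight bytes and SS:RSP is always saved, the 64-bit frame has a fixed shape that a handler can describe with a plain structure. The type name `interrupt_frame64` is hypothetical, and the layout shown is for an exception that pushes an error code; for events without an error code the frame would start at the saved RIP instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of the stack as seen by a 64-bit handler, lowest
 * address first (the stack pointer after the transfer points at
 * error_code). Each slot is one 8-byte push. */
typedef struct {
    uint64_t error_code;  /* +0  */
    uint64_t rip;         /* +8  */
    uint64_t cs;          /* +16 (selector, zero-extended) */
    uint64_t rflags;      /* +24 */
    uint64_t rsp;         /* +32 */
    uint64_t ss;          /* +40 (selector, zero-extended) */
} interrupt_frame64;
```

The offsets in the comments follow from the fixed eight-byte slot size: six slots, 48 bytes in total.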
INTERRUPT AND EXCEPTION HANDLING that exit with an IRET unconditionally pop SS:RSP off of the interrupt stack frame, even if the target code segment is running in 64-bit mode or at CPL = 0. This is because the original interrupt always pushes SS:RSP. In IA-32e mode, IRET is allowed to load a NULL SS under certain conditions. If the target mode is 64-bit mode and the target CPL <> 3, IRET allows SS to be loaded with a NULL selector.
[Figure 5-8. IA-32e Mode Stack Usage After Privilege Level Change. Legacy mode handler's stack, from offset +20 down to 0 relative to the stack pointer after transfer to the handler: SS, ESP, EFLAGS, CS, EIP, Error Code. IA-32e mode handler's stack, from offset +40 down to 0: SS, RSP, RFLAGS, CS, RIP, Error Code.]

5.14.
5.15 EXCEPTION AND INTERRUPT REFERENCE

The following sections describe conditions which generate exceptions and interrupts. They are arranged in the order of vector numbers. The information contained in these sections is as follows:
• Exception Class — Indicates whether the exception class is a fault, trap, or abort type. Some exceptions can be either a fault or trap type, depending on when the error condition is detected. (This section is not applicable to interrupts.
Interrupt 0—Divide Error Exception (#DE)

Exception Class
Fault.

Description
Indicates the divisor operand for a DIV or IDIV instruction is 0 or that the result cannot be represented in the number of bits specified for the destination operand.

Exception Error Code
None.

Saved Instruction Pointer
Saved contents of CS and EIP registers point to the instruction that generated the exception.
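Both conditions in the description can be checked in software before a division is attempted. The sketch below models a signed division of a 64-bit dividend by a 32-bit divisor that produces a 32-bit quotient; the function name `idiv32_raises_de` is hypothetical, and the sketch is a model of the two stated conditions rather than a cycle-accurate account of the instruction.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical predicate: would a signed 64-by-32-bit division with a
 * 32-bit quotient report a divide error? */
static bool idiv32_raises_de(int64_t dividend, int32_t divisor)
{
    if (divisor == 0)
        return true;  /* divisor operand is 0 */
    if (dividend == INT64_MIN && divisor == -1)
        return true;  /* quotient too large; also avoids C overflow */
    int64_t quotient = dividend / divisor;  /* truncates toward zero */
    return quotient > INT32_MAX || quotient < INT32_MIN;
}
```

For example, dividing 100 by 7 is fine, whereas dividing 2^40 by 1 would need more than 32 bits for the quotient and is flagged.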
INTERRUPT AND EXCEPTION HANDLING Interrupt 1—Debug Exception (#DB) Exception Class Trap or Fault. The exception handler can distinguish between traps or faults by examining the contents of DR6 and the other debug registers. Description Indicates that one or more of several debug-exception conditions has been detected. Whether the exception is a fault or a trap depends on the condition (see Table 5-3).
INTERRUPT AND EXCEPTION HANDLING Interrupt 2—NMI Interrupt Exception Class Not applicable. Description The nonmaskable interrupt (NMI) is generated externally by asserting the processor’s NMI pin or through an NMI request set by the I/O APIC to the local APIC. This interrupt causes the NMI interrupt handler to be called. Exception Error Code Not applicable. Saved Instruction Pointer The processor always takes an NMI interrupt on an instruction boundary.
INTERRUPT AND EXCEPTION HANDLING Interrupt 3—Breakpoint Exception (#BP) Exception Class Trap. Description Indicates that a breakpoint instruction (INT 3) was executed, causing a breakpoint trap to be generated. Typically, a debugger sets a breakpoint by replacing the first opcode byte of an instruction with the opcode for the INT 3 instruction. (The INT 3 instruction is one byte long, which makes it easy to replace an opcode in a code segment in RAM with the breakpoint opcode.
Interrupt 4—Overflow Exception (#OF)

Exception Class
Trap.

Description
Indicates that an overflow trap occurred when an INTO instruction was executed. The INTO instruction checks the state of the OF flag in the EFLAGS register. If the OF flag is set, an overflow trap is generated. Some arithmetic instructions (such as the ADD and SUB) perform both signed and unsigned arithmetic.
Interrupt 5—BOUND Range Exceeded Exception (#BR)

Exception Class
Fault.

Description
Indicates that a BOUND-range-exceeded fault occurred when a BOUND instruction was executed. The BOUND instruction checks that a signed array index is within the upper and lower bounds of an array located in memory. If the array index is not within the bounds of the array, a BOUND-range-exceeded fault is generated.

Exception Error Code
None.
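The comparison that BOUND performs can be modeled as follows. The names `bound_pair32` and `bound_raises_br` are hypothetical, and the sketch assumes that both bounds are inclusive and stored as a pair of signed 32-bit values in memory.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical in-memory operand: lower bound followed by upper bound. */
typedef struct {
    int32_t lower;
    int32_t upper;
} bound_pair32;

/* True when the signed index lies outside [lower, upper]. */
static bool bound_raises_br(int32_t index, const bound_pair32 *bounds)
{
    return index < bounds->lower || index > bounds->upper;
}
```

For an array with bounds 0 and 9, indexes 0 and 9 pass while -1 and 10 are flagged.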
Interrupt 6—Invalid Opcode Exception (#UD)

Exception Class
Fault.

Description
Indicates that the processor did one of the following things:
• Attempted to execute an invalid or reserved opcode.
• Attempted to execute an MMX or SSE/SSE2/SSE3 instruction on an Intel 64 or IA-32 processor that does not support the MMX technology or SSE/SSE2/SSE3/SSSE3 extensions, respectively.
INTERRUPT AND EXCEPTION HANDLING processor and earlier IA-32 processors, this exception is not generated as the result of prefetching and preliminary decoding of an invalid instruction. (See Section 5.5, “Exception Classifications,” for general rules for taking of interrupts and exceptions.) The opcodes D6 and F1 are undefined opcodes reserved by the Intel 64 and IA-32 architectures. These opcodes, even though undefined, do not generate an invalid opcode exception.
Interrupt 7—Device Not Available Exception (#NM)

Exception Class
Fault.

Description
The device-not-available exception is generated by any of three conditions:
• The processor executed an x87 FPU floating-point instruction while the EM flag in control register CR0 was set (1). See the paragraph below for the special case of the WAIT/FWAIT instruction.
INTERRUPT AND EXCEPTION HANDLING Saved Instruction Pointer The saved contents of CS and EIP registers point to the floating-point instruction or the WAIT/FWAIT instruction that generated the exception. Program State Change A program-state change does not accompany a device-not-available fault, because the instruction that generated the exception is not executed.
INTERRUPT AND EXCEPTION HANDLING Interrupt 8—Double Fault Exception (#DF) Exception Class Abort. Description Indicates that the processor detected a second exception while calling an exception handler for a prior exception. Normally, when the processor detects another exception while trying to call an exception handler, the two exceptions can be handled serially. If, however, the processor cannot handle them serially, it signals the double-fault exception.
INTERRUPT AND EXCEPTION HANDLING A segment or page fault may be encountered while prefetching instructions; however, this behavior is outside the domain of Table 5-5. Any further faults generated while the processor is attempting to transfer control to the appropriate fault handler could still lead to a double-fault sequence. Table 5-5.
Program State Change
The program state following a double-fault exception is undefined. The program or task cannot be resumed or restarted. The only available action of the double-fault exception handler is to collect all possible context information for use in diagnostics and then close the application and/or shut down or reset the processor.
Interrupt 9—Coprocessor Segment Overrun

Exception Class
Abort. (Intel reserved; do not use. Recent IA-32 processors do not generate this exception.)

Description
Indicates that an Intel386 CPU-based system with an Intel 387 math coprocessor detected a page or segment violation while transferring the middle portion of an Intel 387 math coprocessor operand.
INTERRUPT AND EXCEPTION HANDLING Interrupt 10—Invalid TSS Exception (#TS) Exception Class Fault. Description Indicates that there was an error related to a TSS. Such an error might be detected during a task switch or during the execution of instructions that use information from a TSS. Table 5-6 shows the conditions that cause an invalid TSS exception to be generated. Table 5-6.
Table 5-6. Invalid TSS Conditions (Contd.)

Error Code Index               Invalid Condition
Stack segment selector index   The stack segment selector RPL != CPL.
Code segment selector index    The code segment selector exceeds descriptor table limit.
Code segment selector index    The code segment selector is NULL.
Code segment selector index    The code segment descriptor is not a code segment type.
Code segment selector index    The nonconforming code segment DPL != CPL.
INTERRUPT AND EXCEPTION HANDLING Exception Error Code An error code containing the segment selector index for the segment descriptor that caused the violation is pushed onto the stack of the exception handler. If the EXT flag is set, it indicates that the exception was caused by an event external to the currently running program (for example, if an external interrupt handler using a task gate attempted a task switch to an invalid TSS).
INTERRUPT AND EXCEPTION HANDLING Interrupt 11—Segment Not Present (#NP) Exception Class Fault. Description Indicates that the present flag of a segment or gate descriptor is clear. The processor can generate this exception during any of the following operations: • While attempting to load CS, DS, ES, FS, or GS registers. [Detection of a notpresent segment while loading the SS register causes a stack fault exception (#SS) to be generated.] This situation can occur while performing a task switch.
INTERRUPT AND EXCEPTION HANDLING tors for the segment selectors in a new TSS, the CS and EIP registers point to the first instruction in the new task. If the exception occurred while accessing a gate descriptor, the CS and EIP registers point to the instruction that invoked the access (for example a CALL instruction that references a call gate).
INTERRUPT AND EXCEPTION HANDLING Interrupt 12—Stack Fault Exception (#SS) Exception Class Fault. Description Indicates that one of the following stack related conditions was detected: • A limit violation is detected during an operation that refers to the SS register.
INTERRUPT AND EXCEPTION HANDLING not rely on being able to use the segment selectors found in the CS, SS, DS, ES, FS, and GS registers without causing another exception. The exception handler should check all segment registers before trying to resume the new task; otherwise, general protection faults may result later under conditions that are more difficult to diagnose.
INTERRUPT AND EXCEPTION HANDLING Interrupt 13—General Protection Exception (#GP) Exception Class Fault. Description Indicates that the processor detected one of a class of protection violations called “general-protection violations.” The conditions that cause this exception to be generated comprise all the protection violations that do not cause other exceptions to be generated (such as, invalid-TSS, segment-not-present, stack-fault, or page-fault exceptions).
• Loading the CR0 register with a set NW flag and a clear CD flag.
• Attempting to access an interrupt or exception handler through an interrupt or trap gate from virtual-8086 mode when the handler’s code segment DPL is greater than 0.
• Attempting to write a 1 into a reserved bit of CR4.
• Writing to a reserved bit in an MSR.
• Referencing an entry in the IDT (following an interrupt or exception) that is not an interrupt, trap, or task gate.
INTERRUPT AND EXCEPTION HANDLING • • A selector from a TSS involved in a task switch. IDT vector number. Saved Instruction Pointer The saved contents of CS and EIP registers point to the instruction that generated the exception. Program State Change In general, a program-state change does not accompany a general-protection exception, because the invalid instruction or operation is not executed.
• If the segment descriptor pointed to by the segment selector in the destination operand is a code segment and it has both the D-bit and the L-bit set.
• If the segment descriptor from a 64-bit call gate is in non-canonical space.
• If the upper type field of a 64-bit call gate is not 0x0.
• If an attempt is made to load a null selector in the SS register in CPL3 and 64-bit mode.
INTERRUPT AND EXCEPTION HANDLING Interrupt 14—Page-Fault Exception (#PF) Exception Class Fault.
INTERRUPT AND EXCEPTION HANDLING — The RSVD flag indicates that the processor detected 1s in reserved bits of the page directory, when the PSE or PAE flags in control register CR4 are set to 1. Note: • The PSE flag is only available in recent Intel 64 and IA-32 processors including the Pentium 4, Intel Xeon, P6 family, and Pentium processors. • The PAE flag is only available on recent Intel 64 and IA-32 processors including the Pentium 4, Intel Xeon, and P6 family processors.
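A page-fault handler usually starts by decoding the error code that the processor pushes on the stack. The following is a minimal sketch in C, assuming the architectural bit positions of the P, W/R, U/S, and RSVD flags (bits 0 through 3). The names `pf_cause` and `pf_decode` are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions of the page-fault error code flags. */
#define PF_ERR_P    (1u << 0)  /* 0: page not present, 1: protection violation */
#define PF_ERR_WR   (1u << 1)  /* 0: read access, 1: write access */
#define PF_ERR_US   (1u << 2)  /* 0: supervisor mode, 1: user mode */
#define PF_ERR_RSVD (1u << 3)  /* 1: reserved bit set in a paging entry */

/* Hypothetical decoded view; this is not an architectural structure. */
struct pf_cause {
    bool protection_violation;
    bool write_access;
    bool user_mode;
    bool reserved_bit;
};

struct pf_cause pf_decode(uint32_t error_code)
{
    struct pf_cause c;
    c.protection_violation = (error_code & PF_ERR_P) != 0;
    c.write_access         = (error_code & PF_ERR_WR) != 0;
    c.user_mode            = (error_code & PF_ERR_US) != 0;
    c.reserved_bit         = (error_code & PF_ERR_RSVD) != 0;
    return c;
}

/* Usage example: error code 9H has the P and RSVD flags set. */
void pf_decode_example(void)
{
    struct pf_cause c = pf_decode(0x9);
    assert(c.protection_violation && c.reserved_bit);
    assert(!c.write_access && !c.user_mode);
}
```

Keeping the decoded flags in a separate structure lets the rest of a handler test named fields instead of repeating bit masks.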
INTERRUPT AND EXCEPTION HANDLING violation, the access flag in the page-directory entry is set when the fault occurs. The behavior of IA-32 processors regarding the access flag in the corresponding page-table entry is model specific and not architecturally defined. Saved Instruction Pointer The saved contents of CS and EIP registers generally point to the instruction that generated the exception.
INTERRUPT AND EXCEPTION HANDLING Additional Exception-Handling Information Special care should be taken to ensure that an exception that occurs during an explicit stack switch does not cause the processor to use an invalid stack pointer (SS:ESP).
INTERRUPT AND EXCEPTION HANDLING Interrupt 16—x87 FPU Floating-Point Error (#MF) Exception Class Fault. Description Indicates that the x87 FPU has detected a floating-point error. The NE flag in the register CR0 must be set for an interrupt 16 (floating-point error exception) to be generated. (See Section 2.5, “Control Registers,” for a detailed description of the NE flag.) NOTE SIMD floating-point exceptions (#XF) are signaled through interrupt 19.
INTERRUPT AND EXCEPTION HANDLING Prior to executing a waiting x87 FPU instruction or the WAIT/FWAIT instruction, the x87 FPU checks for pending x87 FPU floating-point exceptions (as described in step 2 above). Pending x87 FPU floating-point exceptions are ignored for “non-waiting” x87 FPU instructions, which include the FNINIT, FNCLEX, FNSTSW, FNSTSW AX, FNSTCW, FNSTENV, and FNSAVE instructions. Pending x87 FPU exceptions are also ignored when executing the state management instructions FXSAVE and FXRSTOR.
INTERRUPT AND EXCEPTION HANDLING Interrupt 17—Alignment Check Exception (#AC) Exception Class Fault. Description Indicates that the processor detected an unaligned memory operand when alignment checking was enabled. Alignment checks are only carried out in data (or stack) accesses (not in code fetches or system segment accesses). An example of an alignment-check violation is a word stored at an odd byte address, or a doubleword stored at an address that is not an integer multiple of 4.
To enable alignment checking, the following conditions must be true:
• AM flag in CR0 register is set.
• AC flag in the EFLAGS register is set.
• The CPL is 3 (protected mode or virtual-8086 mode).
Alignment-check exceptions (#AC) are generated only when operating at privilege level 3 (user mode).
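The three enabling conditions can be expressed as a single predicate. The sketch below is illustrative only: it assumes the AM flag is bit 18 of CR0 and the AC flag is bit 18 of EFLAGS, it takes register values as plain arguments rather than reading them from hardware, and the function names are hypothetical. The alignment test assumes operands are checked against their own size, which matches the word and doubleword examples given for this exception.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CR0_AM    (1u << 18)  /* alignment mask flag in CR0 */
#define EFLAGS_AC (1u << 18)  /* alignment check flag in EFLAGS */

/* All three conditions must hold for alignment checking to be active. */
bool alignment_checking_enabled(uint32_t cr0, uint32_t eflags, unsigned cpl)
{
    return (cr0 & CR0_AM) != 0 && (eflags & EFLAGS_AC) != 0 && cpl == 3;
}

/* True if an operand of `size` bytes (a power of two) at `addr`
   is not aligned on a multiple of its own size. */
bool is_unaligned(uint32_t addr, uint32_t size)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    return (addr & (size - 1)) != 0;
}
```

For example, `is_unaligned(0x1001, 2)` is true (a word at an odd address) and `is_unaligned(0x1004, 4)` is false.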
INTERRUPT AND EXCEPTION HANDLING Interrupt 18—Machine-Check Exception (#MC) Exception Class Abort. Description Indicates that the processor detected an internal machine error or a bus error, or that an external agent detected a bus error. The machine-check exception is modelspecific, available only on the Pentium 4, Intel Xeon, P6 family, and Pentium processors.
INTERRUPT AND EXCEPTION HANDLING For the Pentium 4, Intel Xeon, P6 family, and Pentium processors, a program-state change always accompanies a machine-check exception, and an abort class exception is generated. For abort exceptions, information about the exception can be collected from the machine-check MSRs, but the program cannot generally be restarted.
INTERRUPT AND EXCEPTION HANDLING Interrupt 19—SIMD Floating-Point Exception (#XF) Exception Class Fault. Description Indicates the processor has detected an SSE/SSE2/SSE3 SIMD floating-point exception. The appropriate status flag in the MXCSR register must be set and the particular exception unmasked for this interrupt to be generated.
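The relationship between status flags and mask bits can be checked with a few bit operations. This sketch assumes the MXCSR layout in which the six exception flags occupy bits 0 through 5 and the corresponding mask bits occupy bits 7 through 12. It operates on a value passed in by the caller, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MXCSR_FLAGS_MASK 0x3Fu  /* IE, DE, ZE, OE, UE, PE in bits 0..5 */
#define MXCSR_MASK_SHIFT 7      /* IM, DM, ZM, OM, UM, PM in bits 7..12 */

/* Returns the exception flags that are set and not masked. A nonzero
   result means the condition for signaling #XF is met for those flags. */
uint32_t mxcsr_unmasked_exceptions(uint32_t mxcsr)
{
    uint32_t flags = mxcsr & MXCSR_FLAGS_MASK;
    uint32_t masks = (mxcsr >> MXCSR_MASK_SHIFT) & MXCSR_FLAGS_MASK;
    return flags & ~masks;
}

/* Usage example: 1F80H has every exception masked. Clearing ZM (bit 9)
   and setting ZE (bit 2) leaves the divide-by-zero exception unmasked. */
void mxcsr_example(void)
{
    assert(mxcsr_unmasked_exceptions(0x1F80u | 0x04u) == 0);
    assert(mxcsr_unmasked_exceptions((0x1F80u & ~0x200u) | 0x04u) == 0x04u);
}
```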
INTERRUPT AND EXCEPTION HANDLING Note that because SIMD floating-point exceptions are precise and occur immediately, the situation does not arise where an x87 FPU instruction, a WAIT/FWAIT instruction, or another SSE/SSE2/SSE3 instruction will catch a pending unmasked SIMD floatingpoint exception.
INTERRUPT AND EXCEPTION HANDLING Exception Error Code None. Saved Instruction Pointer The saved contents of CS and EIP registers point to the SSE/SSE2/SSE3 instruction that was executed when the SIMD floating-point exception was generated. This is the faulting instruction in which the error condition was detected. Program State Change A program-state change does not accompany a SIMD floating-point exception because the handling of the exception is immediate unless the particular exception is masked.
INTERRUPT AND EXCEPTION HANDLING Interrupts 32 to 255—User Defined Interrupts Exception Class Not applicable. Description Indicates that the processor did one of the following things: • Executed an INT n instruction where the instruction operand is one of the vector numbers from 32 through 255. • Responded to an interrupt request at the INTR pin or from the local APIC when the interrupt vector number associated with the request is from 32 through 255. Exception Error Code Not applicable.
CHAPTER 6 TASK MANAGEMENT This chapter describes the IA-32 architecture’s task management facilities. These facilities are only available when the processor is running in protected mode. This chapter focuses on 32-bit tasks and the 32-bit TSS structure. For information on 16-bit tasks and the 16-bit TSS structure, see Section 6.6, “16-Bit Task-State Segment (TSS).” For information specific to task management in 64-bit mode, see Section 6.7, “Task Management in 64-bit Mode.” 6.
[Figure: the task's code segment, data segment, and stack segment (current privilege level); the stack segments for privilege levels 0, 1, and 2; the task-state segment (TSS); the task register; and CR3.]
Figure 6-1. Structure of a Task
6.1.2 Task State
The following items define the state of the currently executing task:
• The task’s current execution space, defined by the segment selectors in the segment registers (CS, DS, SS, ES, FS, and GS).
6.1.3 Executing a Task
Software or the processor can dispatch a task for execution in one of the following ways:
• An explicit call to a task with the CALL instruction.
• An explicit jump to a task with the JMP instruction.
• An implicit call (by the processor) to an interrupt-handler task.
• An implicit call to an exception-handler task.
• A return (initiated with an IRET instruction) when the NT flag in the EFLAGS register is set.
TASK MANAGEMENT page tables as other privilege-level-3 tasks can access code and corrupt data and the stack of other tasks. Use of task management facilities for handling multitasking applications is optional. Multitasking can be handled in software, with each software defined task executed in the context of a single IA-32 architecture task. 6.2 TASK MANAGEMENT DATA STRUCTURES The processor defines five data structures for handling task-related activities: • • • • • Task-state segment (TSS).
Offset  Bits 31:16             Bits 15:0
100     I/O Map Base Address   Reserved (bit 0 = T)
96      Reserved               LDT Segment Selector
92      Reserved               GS
88      Reserved               FS
84      Reserved               DS
80      Reserved               SS
76      Reserved               CS
72      Reserved               ES
68      EDI
64      ESI
60      EBP
56      ESP
52      EBX
48      EDX
44      ECX
40      EAX
36      EFLAGS
32      EIP
28      CR3 (PDBR)
24      Reserved               SS2
20      ESP2
16      Reserved               SS1
12      ESP1
8       Reserved               SS0
4       ESP0
0       Reserved               Previous Task Link
Reserved bits. Set to 0.
Figure 6-2.
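The 32-bit TSS layout maps naturally onto a C structure. The sketch below is one possible rendering, assuming a compiler that inserts no padding between naturally aligned members; the compile-time assertions check that assumption. The structure and member names are hypothetical, and the reserved members must be set to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative C view of the 32-bit TSS; offsets are in bytes. */
struct tss32 {
    uint16_t prev_task_link, reserved0;    /*   0 */
    uint32_t esp0;                         /*   4 */
    uint16_t ss0, reserved1;               /*   8 */
    uint32_t esp1;                         /*  12 */
    uint16_t ss1, reserved2;               /*  16 */
    uint32_t esp2;                         /*  20 */
    uint16_t ss2, reserved3;               /*  24 */
    uint32_t cr3;                          /*  28 (PDBR) */
    uint32_t eip;                          /*  32 */
    uint32_t eflags;                       /*  36 */
    uint32_t eax, ecx, edx, ebx;           /*  40, 44, 48, 52 */
    uint32_t esp, ebp, esi, edi;           /*  56, 60, 64, 68 */
    uint16_t es, reserved4;                /*  72 */
    uint16_t cs, reserved5;                /*  76 */
    uint16_t ss, reserved6;                /*  80 */
    uint16_t ds, reserved7;                /*  84 */
    uint16_t fs, reserved8;                /*  88 */
    uint16_t gs, reserved9;                /*  92 */
    uint16_t ldt_selector, reserved10;     /*  96 */
    uint16_t debug_trap;                   /* 100, bit 0 is the T flag */
    uint16_t io_map_base;                  /* 102 */
};

/* Verify that the compiler laid the structure out as the TSS requires. */
static_assert(sizeof(struct tss32) == 104, "32-bit TSS is 104 bytes");
static_assert(offsetof(struct tss32, cr3) == 28, "CR3 at offset 28");
static_assert(offsetof(struct tss32, es) == 72, "ES at offset 72");
static_assert(offsetof(struct tss32, ldt_selector) == 96, "LDT at offset 96");
```

The 104-byte size also explains the minimum descriptor limit of 67H, which is one less than 68H (104).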
TASK MANAGEMENT • EIP (instruction pointer) field — State of the EIP register prior to the task switch. • Previous task link field — Contains the segment selector for the TSS of the previous task (updated on a task switch that was initiated by a call, interrupt, or exception). This field (which is sometimes called the back link field) permits a task switch back to the previous task by using the IRET instruction. The processor reads the static fields, but does not normally change them.
TASK MANAGEMENT • Task switches are carried out faster if the pages containing these structures are present in memory before the task switch is initiated. 6.2.2 TSS Descriptor The TSS, like all other segments, is defined by a segment descriptor. Figure 6-3 shows the format of a TSS descriptor. TSS descriptors may only be placed in the GDT; they cannot be placed in an LDT or the IDT.
TASK MANAGEMENT The base, limit, and DPL fields and the granularity and present flags have functions similar to their use in data-segment descriptors (see Section 3.4.5, “Segment Descriptors”). When the G flag is 0 in a TSS descriptor for a 32-bit TSS, the limit field must have a value equal to or greater than 67H, one byte less than the minimum size of a TSS. Attempting to switch to a task whose TSS descriptor has a limit less than 67H generates an invalid-TSS exception (#TS).
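System software that builds TSS descriptors can validate the limit before loading the task register. The following is a sketch only. It assumes the usual granularity rule for segment limits (with G set, the 20-bit limit field counts 4-KByte units and the low 12 bits of the effective limit are all 1s), and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TSS32_MIN_LIMIT 0x67u  /* one byte less than the 104-byte minimum TSS */

/* Expand the 20-bit limit field of a descriptor to a limit in bytes. */
uint32_t effective_limit(uint32_t limit_field, bool g_flag)
{
    limit_field &= 0xFFFFFu;
    return g_flag ? (limit_field << 12) | 0xFFFu : limit_field;
}

/* True if a 32-bit TSS descriptor with this limit can be switched to
   without raising an invalid-TSS exception for a short limit. */
bool tss32_limit_ok(uint32_t limit_field, bool g_flag)
{
    return effective_limit(limit_field, g_flag) >= TSS32_MIN_LIMIT;
}

/* Usage example: 67H is the smallest acceptable limit when G is 0. */
void tss32_limit_example(void)
{
    assert(tss32_limit_ok(0x67, false));
    assert(!tss32_limit_ok(0x66, false));
}
```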
TSS (or LDT) Descriptor
[Figure: descriptor layout. Fields shown: Base Address 63:32, Base 31:24, G, AVL, Limit 19:16, P, DPL, Type, Base 23:16, Base Address 15:00, Segment Limit 15:00, and reserved bits.]
AVL    Available for use by system software
B      Busy flag
BASE   Segment Base Address
DPL    Descriptor Privilege Level
G      Granularity
LIMIT  Segment Limit
P      Segment Present
TYPE   Segment Type
Figure 6-4.
TASK MANAGEMENT The LTR instruction loads a segment selector (source operand) into the task register that points to a TSS descriptor in the GDT. It then loads the invisible portion of the task register with information from the TSS descriptor. LTR is a privileged instruction that may be executed only when the CPL is 0. It’s used during system initialization to put an initial value in the task register. Afterwards, the contents of the task register are changed implicitly when a task switch occurs.
TASK MANAGEMENT 6.2.5 Task-Gate Descriptor A task-gate descriptor provides an indirect, protected reference to a task (see Figure 6-6). It can be placed in the GDT, an LDT, or the IDT. The TSS segment selector field in a task-gate descriptor points to a TSS descriptor in the GDT. The RPL in this segment selector is not used. The DPL of a task-gate descriptor controls access to the TSS descriptor during a task switch.
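As an illustration, a task-gate descriptor can be packed into and unpacked from a 64-bit value. This sketch assumes the standard gate layout: the TSS segment selector in bits 31:16 of the low doubleword, and in the high doubleword the type field 00101B in bits 12:8, the DPL in bits 14:13, and the P flag in bit 15. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GATE_TYPE_TASK 0x05u  /* 00101B in descriptor bits 44:40 */

/* Build a task-gate descriptor as a single 64-bit value. */
uint64_t task_gate_make(uint16_t tss_selector, unsigned dpl, bool present)
{
    uint64_t d = 0;
    assert(dpl <= 3);
    d |= (uint64_t)tss_selector << 16;       /* TSS segment selector */
    d |= (uint64_t)GATE_TYPE_TASK << 40;     /* type */
    d |= (uint64_t)dpl << 45;                /* descriptor privilege level */
    d |= (uint64_t)(present ? 1u : 0u) << 47; /* present flag */
    return d;
}

bool is_task_gate(uint64_t d)
{
    return ((d >> 40) & 0x1Fu) == GATE_TYPE_TASK;
}

uint16_t task_gate_selector(uint64_t d)
{
    return (uint16_t)(d >> 16);
}

unsigned task_gate_dpl(uint64_t d)
{
    return (unsigned)((d >> 45) & 0x3u);
}
```

The RPL bits of the stored selector are carried along unchanged here, since the processor does not use them.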
to be handled by handler tasks. When an interrupt or exception vector points to a task gate, the processor switches to the specified task. Figure 6-7 illustrates how a task gate in an LDT, a task gate in the GDT, and a task gate in the IDT can all point to the same task.
[Figure: task gates in the LDT, the GDT, and the IDT all reference one TSS descriptor in the GDT, which in turn references the TSS.]
Figure 6-7. Task Gates Referencing the Same Task
6.
TASK MANAGEMENT • • An interrupt or exception vector points to a task-gate descriptor in the IDT. The current task executes an IRET when the NT flag in the EFLAGS register is set. JMP, CALL, and IRET instructions, as well as interrupts and exceptions, are all mechanisms for redirecting a program. The referencing of a TSS descriptor or a task gate (when calling or jumping to a task) or the state of the NT flag (when executing an IRET instruction) determines whether a task switch occurs.
TASK MANAGEMENT 10. If the task switch was initiated with a CALL instruction, JMP instruction, an exception, or an interrupt, the processor sets the busy (B) flag in the new task’s TSS descriptor; if initiated with an IRET instruction, the busy (B) flag is left set. 11. Loads the task register with the segment selector and descriptor for the new task's TSS. 12. The TSS state is loaded into the processor.
TASK MANAGEMENT tasks are isolated by their separate address spaces and TSSs and because privilege rules control access to a TSS, software does not need to perform explicit privilege checks on a task switch. Table 6-1 shows the exception conditions that the processor checks for when switching tasks. It also shows the exception that is generated for each check if an error is detected and the segment that the error code references.
Table 6-1. Exception Conditions Checked During a Task Switch (Contd.)

Condition Checked                                      Exception (1)   Error Code Reference (2)
DS, ES, FS, and GS segments are present in memory.     #NP             New Data Segment
DS, ES, FS, and GS segment DPL greater than or         #TS             New Data Segment
equal to CPL (unless these are conforming segments).

NOTES:
1. #NP is segment-not-present exception, #GP is general-protection exception, #TS is invalid-TSS exception, and #SF is stack-fault exception.
2.
[Figure: a chain of three TSSs. The top level task has NT=0; the nested task and the more deeply nested task have NT=1, and the previous task link field of each points back to the preceding TSS. For the currently executing task, the EFLAGS register has NT=1 and the task register points to its TSS.]
Figure 6-8. Nested Tasks
Table 6-2 shows the busy flag (in the TSS segment descriptor), the NT flag, the previous task link field, and TS flag (in control register CR0) during a task switch. The NT flag may be modified by software executing at any privilege level.
TASK MANAGEMENT 6.4.1 Use of Busy Flag To Prevent Recursive Task Switching A TSS allows only one context to be saved for a task; therefore, once a task is called (dispatched), a recursive (or re-entrant) call to the task would cause the current state of the task to be lost. The busy flag in the TSS segment descriptor is provided to prevent re-entrant task switching and a subsequent loss of task state information. The processor manages the busy flag as follows: 1.
3. Clear the busy (B) flag in the TSS segment descriptor for the task being removed from the chain. If more than one task is being removed from the chain, the busy flag for each task being removed must be cleared.
4. Enable interrupts.
In a multiprocessing system, additional synchronization and serialization operations must be added to this procedure to ensure that the TSS and its segment descriptor are both locked when the previous task link field is changed and the busy flag is cleared.
TASK MANAGEMENT page directory for each task. Because the PDBR (control register CR3) is loaded on task switches, each task may have a different page directory. The linear address spaces of different tasks may map to completely distinct physical addresses. If the entries of different page directories point to different page tables and the page tables point to different pages of physical memory, then the tasks do not share physical addresses.
TASK MANAGEMENT physical-address space common to all tasks, then all tasks can share the data and code in those segments. • Through a shared LDT — Two or more tasks can use the same LDT if the LDT fields in their TSSs point to the same LDT. If some segment descriptors in a shared LDT point to segments that are mapped to a common area of the physical address space, the data and code in those segments can be shared among the tasks that share the LDT.
Offset  Field (bits 15:0)
42      Task LDT Selector
40      DS Selector
38      SS Selector
36      CS Selector
34      ES Selector
32      DI
30      SI
28      BP
26      SP
24      BX
22      DX
20      CX
18      AX
16      FLAG Word
14      IP (Entry Point)
12      SS2
10      SP2
8       SS1
6       SP1
4       SS0
2       SP0
0       Previous Task Link
Figure 6-10. 16-Bit TSS Format
TASK MANAGEMENT 6.7 TASK MANAGEMENT IN 64-BIT MODE In 64-bit mode, task structure and task state are similar to those in protected mode. However, the task switching mechanism available in protected mode is not supported in 64-bit mode. Task management and switching must be performed by software. The processor issues a general-protection exception (#GP) if the following is attempted in 64-bit mode: • • Control transfer to a TSS or a task gate using JMP, CALL, INTn, or interrupt. An IRET with EFLAGS.
TASK MANAGEMENT 31 0 15 Reserved I/O Map Base Address Reserved 96 Reserved 92 IST7 (upper 32 bits) 88 IST7 (lower 32 bits) 84 IST6 (upper 32 bits) 80 IST6 (lower 32 bits) 76 IST5 (upper 32 bits) 72 IST5 (lower 32 bits) 68 IST4 (upper 32 bits) 64 IST4 (lower 32 bits) 60 IST3 (upper 32 bits) 56 IST3 (lower 32 bits) 52 IST2 (upper 32 bits) 48 IST2 (lower 32 bits) 44 IST1 (upper 32 bits) 40 IST1 (lower 32 bits) 36 Reserved 32 Reserved 28 RSP2 (upper 32 bits) 24 RSP2 (lo
CHAPTER 7 MULTIPLE-PROCESSOR MANAGEMENT The Intel 64 and IA-32 architectures provide mechanisms for managing and improving the performance of multiple processors connected to the same system bus. These include: • Bus locking and/or cache coherency management for performing atomic operations on system memory. • Serializing instructions. These instructions apply only to the Pentium 4, Intel Xeon, P6 family, and Pentium processors.
MULTIPLE-PROCESSOR MANAGEMENT mechanism for receiving interrupts and distributing them to available processors for servicing. • To increase system performance by exploiting the multi-threaded and multiprocess nature of contemporary operating systems and applications. The caching mechanism and cache consistency of Intel 64 and IA-32 processors are discussed in Chapter 10. The APIC architecture is described in Chapter 8.
MULTIPLE-PROCESSOR MANAGEMENT The mechanisms for handling locked atomic operations have evolved with the complexity of IA-32 processors. More recent IA-32 processors (such as the Pentium 4, Intel Xeon, and P6 family processors) and Intel 64 provide a more refined locking mechanism than earlier processors. These mechanisms are described in the following sections. 7.1.
MULTIPLE-PROCESSOR MANAGEMENT For the P6 and more recent processor families, if the memory area being accessed is cached internally in the processor, the LOCK# signal is generally not asserted; instead, locking is only applied to the processor’s caches (see Section 7.1.4, “Effects of a LOCK Operation on Internal Processor Caches”). 7.1.2.
MULTIPLE-PROCESSOR MANAGEMENT 7.1.2.2 Software Controlled Bus Locking To explicitly force the LOCK semantics, software can use the LOCK prefix with the following instructions when they are used to modify a memory location. An invalidopcode exception (#UD) is generated when the LOCK prefix is used with any other instruction or when no write operation is made to memory (that is, when the destination operand is in a register). • • • • The bit test and modify instructions (BTS, BTR, and BTC).
reference weakly ordered memory types (such as the WC memory type) may not be serialized. Locked instructions should not be used to ensure that data written can be fetched as instructions. NOTE The locked instructions for the current versions of the Pentium 4, Intel Xeon, P6 family, Pentium, and Intel486 processors allow data written to be fetched as instructions.
The act of one processor writing data into the currently executing code segment of a second processor with the intent of having the second processor execute that data as code is called cross-modifying code. As with self-modifying code, IA-32 processors exhibit model-specific behavior when executing cross-modifying code, depending upon how far ahead of the executing processor's current execution pointer the code has been modified.
MULTIPLE-PROCESSOR MANAGEMENT have cached the same area of memory from simultaneously modifying data in that area. 7.2 MEMORY ORDERING The term memory ordering refers to the order in which the processor issues reads (loads) and writes (stores) through the system bus to system memory. The Intel 64 and IA-32 architectures support several memory ordering models depending on the implementation of the architecture.
MULTIPLE-PROCESSOR MANAGEMENT 7.2.2 Memory Ordering in P6 and More Recent Processor Families The Intel Core 2 Duo, Intel Core Duo, Pentium 4, and P6 family processors also use a processor-ordered memory ordering model that can be further defined as “write ordered with store-buffer forwarding.” This model can be characterized as follows. In a single-processor system for memory regions defined as write-back cacheable, the following ordering rules apply: 1.
MULTIPLE-PROCESSOR MANAGEMENT respective code sequences are executed on the processors. The final values in location A, B, and C would possibly vary on each execution of the write sequence. The processor-ordering model described in this section is virtually identical to that used by the Pentium and Intel486 processors. The only enhancements in the Pentium 4, Intel Xeon, and P6 family processors are: • • Added support for speculative reads.
MULTIPLE-PROCESSOR MANAGEMENT from an external perspective, the string in a cache line by cache line mode. This results in the processor looping on issuing a cache-line read for the source address and an invalidation on the external bus for the destination address, knowing that all bytes in the destination cache line will be modified, for the length of the string. In this mode interrupts will only be accepted by the processor on cache line boundaries.
Memory mapped devices and other I/O devices on the bus are often sensitive to the order of writes to their I/O buffers. I/O instructions (the IN and OUT instructions) can be used to impose strong write ordering on such accesses as follows. Prior to executing an I/O instruction, the processor waits for all previous instructions in the program to complete and for all buffered writes to drain to memory. Only instruction fetches and page table walks can pass I/O instructions.
MULTIPLE-PROCESSOR MANAGEMENT • For areas of memory where weak ordering is acceptable, the write back (WB) memory type can be chosen. Here, reads can be performed speculatively and writes can be buffered and combined.
MULTIPLE-PROCESSOR MANAGEMENT 3. Let all processors invalidate the PTEs and PDEs modified in their TLBs. 4. End barrier — Resume all processors; resume general processing. Alternate, performance-optimized, TLB shootdown algorithms may be developed; however, care must be taken by the developers to ensure that either of the following conditions are met: • Different TLB mappings are not used on different processors during the update process.
The following instructions are memory ordering instructions, not serializing instructions. These drain the data memory subsystem. They do not affect the instruction execution stream:
• Non-privileged memory ordering instructions — SFENCE, LFENCE, and MFENCE.
The SFENCE, LFENCE, and MFENCE instructions provide more granularity in controlling the serialization of memory loads and stores (see Section 7.2.4, “Strengthening or Weakening the Memory Ordering Model”).
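Portable C has no direct spelling for SFENCE, LFENCE, or MFENCE, but the C11 fence functions express the same kind of ordering request and leave instruction selection to the compiler. The sketch below is an analogy rather than a one-to-one mapping: it shows a producer that publishes a value behind a release fence and a consumer that reads it behind an acquire fence. The names `publish` and `consume_or` are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* ordinary data guarded by the flag below */
static atomic_bool ready;  /* zero-initialized, so it starts out false */

/* Producer: write the data, then make the flag visible. The release
   fence keeps the payload store ordered before the flag store. */
void publish(int value)
{
    payload = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, true, memory_order_relaxed);
}

/* Consumer: return the payload if it has been published, otherwise
   return `fallback`. The acquire fence keeps the payload load ordered
   after the flag load. */
int consume_or(int fallback)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return fallback;
    atomic_thread_fence(memory_order_acquire);
    return payload;
}

/* Usage example, run on a single thread for simplicity. */
void publish_example(void)
{
    publish(42);
    assert(consume_or(-1) == 42);
}
```

Which machine instruction, if any, a compiler emits for each fence depends on the target and the requested memory order, so code that needs a specific fence instruction has to use compiler intrinsics or assembly instead.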
MULTIPLE-PROCESSOR MANAGEMENT multiple-processor systems. (Here, multiple processors is defined as two or more processors.) The MP initialization protocol has the following important features: • It supports controlled booting of multiple processors without requiring dedicated system hardware. • It allows hardware to initiate the booting of a system without the need for a dedicated signal or a predefined boot processor.
MULTIPLE-PROCESSOR MANAGEMENT and APs are initialized, the BSP then begins executing the operating-system initialization code. Following a power-up or reset, the APs complete a minimal self-configuration, then wait for a startup signal (a SIPI message) from the BSP processor. Upon receiving a SIPI message, an AP executes the BIOS AP configuration code, which ends with the AP being placed in halt state.
MULTIPLE-PROCESSOR MANAGEMENT 3. Each logical processor executes its internal BIST simultaneously with the other logical processors on the system bus. 4. Upon completion of the BIST, the logical processors use a hardware-defined selection mechanism to select the BSP and the APs from the available logical processors on the system bus.
MULTIPLE-PROCESSOR MANAGEMENT increments the processor counter by 1. At the completion of the initialization procedure, the AP executes a CLI instruction and halts itself. 8. When each of the APs has gained access to the semaphore and executed the AP initialization code, the BSP establishes a count for the number of processors connected to the system bus, completes executing the BIOS boot-strap code, and then begins executing operating-system boot-strap and start-up code. 9.
MULTIPLE-PROCESSOR MANAGEMENT 2. Loads the microcode update into the processor. 3. Initializes the MTRRs. 4. Enables the caches. 5. Executes the CPUID instruction with a value of 0H in the EAX register, then reads the EBX, ECX, and EDX registers to determine if the BSP is “GenuineIntel.” 6. Executes the CPUID instruction with a value of 1H in the EAX register, then saves the values in the EAX, ECX, and EDX registers in a system configuration space in RAM for use later. 7.
MULTIPLE-PROCESSOR MANAGEMENT 14. Performs the following operation to set up the BSP to detect the presence of APs in the system and the number of processors: — Sets the value of the COUNT variable to 1. — Starts a timer (set for an approximate interval of 100 milliseconds). In the AP BIOS initialization code, the AP will increment the COUNT variable to indicate its presence. When the timer expires, the BSP checks the value of the COUNT variable.
5. Executes the CPUID instruction with a value of 0H in the EAX register, then reads the EBX, ECX, and EDX registers to determine if the AP is “GenuineIntel.”
6. Executes the CPUID instruction with a value of 1H in the EAX register, then saves the values in the EAX, ECX, and EDX registers in a system configuration space in RAM for use later.
7. Switches to protected mode and ensures that the APIC address space is mapped to the strong uncacheable (UC) memory type.
8.
MULTIPLE-PROCESSOR MANAGEMENT multi-threading resource topology in an MP system (See Section 7.10.1, “Hierarchical Mapping of Shared Resources”). The initial APIC ID may consist of up to four bit-fields. In a non-clustered MP system, the field consists of up to three bit fields. Figure 7-2 shows two examples of APIC ID bit fields in earlier single-core processors. In single-core Intel Xeon processors, the APIC ID assigned to a logical processor during power-up and initialization is 8 bits.
MULTIPLE-PROCESSOR MANAGEMENT For P6 family processors, the APIC ID that is assigned to a processor during powerup and initialization is 4 bits (see Figure 7-2). Here, bits 0 and 1 form a 2-bit processor (or socket) identifier and bits 2 and 3 form a 2-bit cluster ID. 7.
The CPUID feature flag may indicate support for hardware multi-threading when only one logical processor is available in the package. In this case, the decimal value represented by bits 16 through 23 in the EBX register will have a value of 1. Software should note that the number of logical processors enabled by system software may be less than the value of “logical processors per package”.
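The field extraction described here can be sketched as a pure function of the CPUID.1 register values. The code below does not execute CPUID itself; the caller is assumed to supply the EBX and EDX values returned for EAX = 1. The function name is hypothetical, and returning 1 when the feature flag is clear is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define CPUID1_EDX_HWMT     (1u << 28)   /* hardware multi-threading flag */
#define CPUID1_EBX_NUM_LOG  0x00FF0000u  /* EBX[23:16] */

/* Logical processors per package, as reported by CPUID.1. */
unsigned logical_processors_per_package(uint32_t ebx, uint32_t edx)
{
    if ((edx & CPUID1_EDX_HWMT) == 0)
        return 1;  /* flag clear: treat the package as one logical processor */
    return (ebx & CPUID1_EBX_NUM_LOG) >> 16;
}

/* Usage example: the flag may be set while the reported count is 1. */
void logical_count_example(void)
{
    assert(logical_processors_per_package(0x00010000u, CPUID1_EDX_HWMT) == 1);
    assert(logical_processors_per_package(0x00020000u, CPUID1_EDX_HWMT) == 2);
}
```

The returned value is an upper bound for the package, not a count of the logical processors that system software has enabled.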
MULTIPLE-PROCESSOR MANAGEMENT 7.7.3 Executing Multiple Threads on an Intel® 64 or IA-32 Processor Supporting Hardware Multi-Threading Upon completing the operating system boot-up procedure, the bootstrap processor (BSP) executes operating system code. Other logical processors are placed in the halt state. To execute a code stream (thread) on a halted logical processor, the operating system issues an interprocessor interrupt (IPI) addressed to the halted logical processor.
[Figure: two Intel processors with Hyper-Threading Technology. Each contains one processor core with logical processor 0 and logical processor 1, a local APIC for each logical processor, and a bus interface. Labels on the connections: IPIs and interrupt messages. The system chip set contains a bridge, PCI, and the I/O APIC, which receives external interrupts.]
Figure 7-3.
[Figure: logical processor 0 and logical processor 1, each with its own architectural state and local APIC, above a single execution engine and a bus interface connected to the system bus.]
Figure 7-4. IA-32 Processor with Two Logical Processors Supporting HT Technology
7.8.1 State of the Logical Processors
The following features are part of the architectural state of logical processors within Intel 64 or IA-32 processors supporting Hyper-Threading Technology.
• Machine check global status (IA32_MCG_STATUS) and machine check capability (IA32_MCG_CAP) MSRs
• Thermal clock modulation and ACPI Power management control MSRs
• Local APIC registers.
• Time stamp counter MSRs
• Most of the other MSR registers, including the page attribute table (PAT). See the exceptions below.
• Additional general purpose registers (R8-R15), XMM registers (XMM8-XMM15), control register, IA32_EFER on Intel 64 processors.
MULTIPLE-PROCESSOR MANAGEMENT it is running. See Section 10.11, “Memory Type Range Registers (MTRRs),” for information on setting up MTRRs. 7.8.4 Page Attribute Table (PAT) Each logical processor has its own PAT MSR (IA32_CR_PAT). However, as described in Section 10.12, “Page Attribute Table (PAT),” the PAT MSR settings must be the same for all processors in a system, including the logical processors. 7.8.
MULTIPLE-PROCESSOR MANAGEMENT 7.8.7 Performance Monitoring Counters Performance counters and their companion control MSRs are shared between the logical processors within the physical processor. As a result, software must manage the use of these resources. The performance counter interrupts, events, and precise event monitoring support can be set up and allocated on a per thread (per logical processor) basis. See Section 18.
MULTIPLE-PROCESSOR MANAGEMENT update simultaneously, the processor core provides the necessary synchronization needed to insure that only one update is performed at a time. Operating system microcode update drivers that adhere to Intel’s guidelines do not need to be modified to run on processors supporting Hyper-Threading Technology. 7.8.
MULTIPLE-PROCESSOR MANAGEMENT • CD flag in control register CR0 — Each logical processor has its own CR0 control register, and thus its own CD flag in CR0. The CD flags for the two logical processors are ORed together, such that when any logical processor sets its CD flag, the entire cache is nominally disabled. 7.8.13.2 Processor Translation Lookaside Buffers (TLBs) In processors supporting Hyper-Threading Technology, data cache TLBs are shared.
MULTIPLE-PROCESSOR MANAGEMENT management within the system. When the STPCLK# signal is asserted, the processor core transitions to the stop-grant state, where instruction execution is halted but the processor core continues to respond to snoop transactions. Regardless of whether the logical processors are active or halted when the STPCLK# signal is asserted, execution is stopped on both logical processors and neither will respond to interrupts.
7.9.1 Logical Processor Support
The topological composition of processor cores and logical processors in a multi-core processor can be discovered using CPUID. Within each processor core, one or more logical processors may be available. System software must follow the required MP initialization sequences (see Section 7.5, “Multiple-Processor (MP) Initialization”) to recognize and enable logical processors.
MULTIPLE-PROCESSOR MANAGEMENT 7.9.5 MICROCODE UPDATE Resources Microcode update facilities are shared between two logical processors sharing a processor core if the physical package supports Hyper-Threading Technology. They are not shared between logical processors in different cores or different physical packages. Either logical processor that has access to the microcode update facility can initiate an update. Each logical processor has its own BIOS signature MSR (IA32_BIOS_SIGN_ID at MSR address 8BH).
The decomposition of an initial APIC_ID may consist of 4 sub fields, matching 4 levels of hierarchy:
• Cluster — Some multi-threading environments consist of multiple clusters of multi-processor systems. The CLUSTER_ID sub-field distinguishes different clusters. For non-clustered systems, CLUSTER_ID is usually 0.
• Package — A multi-processor system consists of two or more sockets, each of which mates with a physical processor package.
context (see Example 7-3) is limited to the number of logical processors enabled at runtime by the OS boot process. Table 7-1 shows the APIC IDs that are initially reported for logical processors in a system with four Intel Xeon MP processors that support Hyper-Threading Technology (a total of 8 logical processors; each physical package has two logical processors).
Table 7-2 shows the initial APIC IDs for a hypothetical dual-processor system in which each physical package provides two processor cores and each processor core also supports Hyper-Threading Technology.
Table 7-2.
The extraction algorithm (for three-level mappings of an initial APIC_ID) uses the following support routines (Example 7-1):
1. Detect capability for hardware multi-threading support in the processor.
2. Identify the maximum number of logical processors in a physical processor package. This is used to determine the topological relationship between logical processors and the physical package.
3. Identify the maximum number of processor cores in a physical processor package.
    if (vendor string EQ GenuineIntel) {
        return (feature_flag_edx & HWMT_BIT); // bit 28
    }
    return 0;
}
2. Find the Max number of logical processors per physical processor package.
#define NUM_LOGICAL_BITS 0x00FF0000
// Use the mask above and CPUID.1.
4. Extract the initial APIC ID of a logical processor.
#define INITIAL_APIC_ID_BITS 0xFF000000 // CPUID.1.EBX[31:24] initial APIC ID
// Returns the 8-bit unique initial APIC ID for the processor running the code.
// Software can use OS services to affinitize the current thread to each logical processor
// available under the OS to gather the initial APIC_IDs for each logical processor.
    SubID = Full_ID & MaskBits;
    Return SubID;
}
Software must not assume local APIC_ID values in an MP system are consecutive. Non-consecutive local APIC_IDs may be the result of hardware configurations or debug features implemented in the BIOS or OS. An identifier for each hierarchical level can be extracted from an 8-bit APIC_ID using the support routines illustrated in Example 7-1.
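The sub-field extraction described above can be sketched in C as follows. This is a minimal illustration rather than the text of Example 7-1: the function names mask_width, get_sub_id, and get_package_id are hypothetical, and the sketch assumes a three-level (SMT, core, package) mapping of an 8-bit initial APIC ID. As in the fragment above, each sub-ID is left in place and not shifted down.

```c
#include <stdint.h>

/* Smallest number of bits able to hold `count` distinct IDs (count >= 1). */
static unsigned mask_width(unsigned count)
{
    unsigned width = 0;
    while ((1u << width) < count)
        width++;
    return width;
}

/* Isolate the sub-field that starts at bit `shift` and is wide enough for
 * `max_count` IDs. The result keeps its original bit position. */
static uint8_t get_sub_id(uint8_t full_id, unsigned max_count, unsigned shift)
{
    unsigned width = mask_width(max_count);
    uint8_t mask = (uint8_t)((0xFFu << shift) ^ (0xFFu << (shift + width)));
    return (uint8_t)(full_id & mask);
}

/* Everything above the SMT and core fields is treated as the package ID. */
static uint8_t get_package_id(uint8_t full_id, unsigned shift)
{
    return (uint8_t)(full_id & (uint8_t)(0xFFu << shift));
}
```

For a package with two cores and two logical processors per core, the SMT field would occupy bit 0, the core field bit 1, and the package field bits 7:2 under these assumptions.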
• To detect the number of physical packages: use PACKAGE_ID to identify those logical processors that reside in the same physical package. This is shown in Example 7-3b. This example also depicts a technique to construct a mask to represent the logical processors that reside in the same package.
• To detect the number of processor cores: use CORE_ID to identify those logical processors that reside in the same core. This is shown in Example 7-3.
Example 7-3. Compute the Number of Packages, Cores, and Processor Relationships in a MP System
a) Assemble lists of PACKAGE_ID, CORE_ID, and SMT_ID of each enabled logical processor
// The BIOS and/or OS may limit the number of logical processors available to applications
// after system boot. The algorithm below will compute topology for the processors visible
// to the thread that is computing it.
PackageIDBucket is an array of unique PACKAGE_ID values. Allocate NumStartedLPs entries for this array. PackageProcessorMask is a corresponding array of bit masks of the processors belonging to the same package, that is, processors with the same PACKAGE_ID. The algorithm below assumes symmetry across package boundaries if more than one socket is populated in an MP system.
// Bucket Package IDs and compute processor mask for every package.
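The bucketing step described above can be sketched in C as follows. This is an illustrative sketch under stated assumptions, not the manual's listing: the function name bucket_packages is hypothetical, the PACKAGE_ID of each logical processor is assumed to have been gathered already, and at most 64 logical processors are handled so that each mask fits in one 64-bit word.

```c
#include <stddef.h>
#include <stdint.h>

/* Groups logical processors by PACKAGE_ID.
 * package_id[lp] is the PACKAGE_ID of logical processor lp.
 * bucket[] receives the distinct PACKAGE_ID values, mask[] the matching
 * processor bit masks; both need room for n entries.
 * Returns the number of distinct packages found. */
static size_t bucket_packages(const uint8_t *package_id, size_t n,
                              uint8_t *bucket, uint64_t *mask)
{
    size_t count = 0;
    for (size_t lp = 0; lp < n && lp < 64; lp++) {
        size_t i;
        for (i = 0; i < count; i++) {
            if (bucket[i] == package_id[lp])
                break;                      /* found in an existing bucket */
        }
        if (i == count) {                   /* first time this ID is seen */
            bucket[count] = package_id[lp];
            mask[count] = 0;
            count++;
        }
        mask[i] |= (uint64_t)1 << lp;       /* record lp in its package mask */
    }
    return count;
}
```

The same loop shape applies to core bucketing when the key is the combination of PACKAGE_ID and CORE_ID.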
ProcessorMask = 1;
CoreProcessorMask[0] = ProcessorMask;
For (ProcessorNum = 1; ProcessorNum < NumStartedLPs; ProcessorNum++) {
    ProcessorMask <<= 1;
    For (i = 0; i < CoreNum; i++) {
        // we may be comparing bit-fields of logical processors residing in different
        // packages, the code below assumes package symmetry
        If ((PackageID[ProcessorNum] | CoreID[ProcessorNum]) == CoreIDBucket[i]) {
            CoreProcessorMask[i] |= ProcessorMask;
            Break; // found in existing bucket, skip to next iteratio
MULTIPLE-PROCESSOR MANAGEMENT 7.11.1 HLT Instruction The HLT instruction stops the execution of the logical processor on which it is executed and places it in a halted state until further notice (see the description of the HLT instruction in Chapter 3 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2A). When a logical processor is halted, active logical processors continue to have full access to the shared resources within the physical package.
Example 7-4. Verifying MONITOR/MWAIT Support
boolean MONITOR_MWAIT_works = TRUE;
try {
    _asm {
        xor ecx, ecx
        xor edx, edx
        mov eax, MemArea
        monitor
    }
    // Use monitor
} except (UNWIND) {
    // if we get here, MONITOR/MWAIT is not supported
    MONITOR_MWAIT_works = FALSE;
}
7.11.4 MONITOR/MWAIT Instruction
Operating systems usually implement idle loops to handle thread synchronization.
MULTIPLE-PROCESSOR MANAGEMENT (due to a variety of events, including a store to the monitored memory region). If upon execution of MWAIT, monitor hardware is in a triggered state: MWAIT behaves as a NOP and execution continues at the next instruction in the execution stream. The state of monitor hardware is not architecturally visible except through the behavior of MWAIT. Multiple events other than a write to the triggering address range can cause a processor that executed MWAIT to wake up.
7.11.5 Monitor/Mwait Address Range Determination
To use the MONITOR/MWAIT instructions, software should know the length of the region monitored by the MONITOR/MWAIT instructions and the coherence line size for cache-snoop traffic in a multiprocessor system. This information can be queried using the CPUID monitor leaf function (EAX = 05H).
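As an illustration of how the monitor leaf might be consumed, the following C sketch decodes the register values returned by CPUID with EAX = 05H. It assumes the smallest monitor-line size is reported in EAX[15:0] and the largest in EBX[15:0], both in bytes. The struct and function names are hypothetical, and obtaining the raw register values (for example with a compiler-provided CPUID helper) is left outside the sketch.

```c
#include <stdint.h>

/* Monitor-line sizes, in bytes, as reported by CPUID leaf 05H. */
struct monitor_line_sizes {
    uint16_t smallest;  /* assumed to come from EAX[15:0] */
    uint16_t largest;   /* assumed to come from EBX[15:0] */
};

/* Decode the EAX and EBX values returned by CPUID.05H. */
static struct monitor_line_sizes decode_monitor_leaf(uint32_t eax, uint32_t ebx)
{
    struct monitor_line_sizes sizes;
    sizes.smallest = (uint16_t)(eax & 0xFFFFu);
    sizes.largest  = (uint16_t)(ebx & 0xFFFFu);
    return sizes;
}
```

Software would typically pad and align the monitored data structure to the largest reported size so that unrelated stores do not wake the waiting processor.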
MULTIPLE-PROCESSOR MANAGEMENT 7.11.6.1 Use the PAUSE Instruction in Spin-Wait Loops Intel recommends that a PAUSE instruction be placed in all spin-wait loops that run on Intel processors supporting Hyper-Threading Technology and multi-core processors. Software routines that use spin-wait loops include multiprocessor synchronization primitives (spin-locks, semaphores, and mutex variables) and idle loops.
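A spin-wait loop that follows this recommendation might look like the C sketch below. It is a minimal illustration, not a production lock: the type and function names are hypothetical, C11 atomics are assumed, and PAUSE is emitted through the GCC/Clang builtin __builtin_ia32_pause when compiling for x86 (on other targets the macro expands to nothing).

```c
#include <stdatomic.h>

/* Emit PAUSE on x86 targets; compile to nothing elsewhere. */
#if defined(__x86_64__) || defined(__i386__)
#define cpu_relax() __builtin_ia32_pause()
#else
#define cpu_relax() ((void)0)
#endif

typedef struct {
    atomic_flag locked;
} spinlock_t;

/* Spin until the lock is acquired, executing PAUSE on every failed attempt. */
static void spin_lock(spinlock_t *lock)
{
    while (atomic_flag_test_and_set_explicit(&lock->locked, memory_order_acquire))
        cpu_relax();
}

/* Try once; returns 1 if the lock was acquired, 0 if it was already held. */
static int spin_trylock(spinlock_t *lock)
{
    return !atomic_flag_test_and_set_explicit(&lock->locked, memory_order_acquire);
}

static void spin_unlock(spinlock_t *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

A lock would be declared as `spinlock_t lock = { ATOMIC_FLAG_INIT };` before first use.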
    IF (WorkQueue) THEN {
        // Schedule work at WorkQueue.
    } ELSE {
        // No work to do - wait in appropriate C-state handler depending
        // on Idle time accumulated
        IF (IdleTime >= IdleTimeThreshhold) THEN {
            // Call appropriate C1, C2, C3 state handler, C1 handler
            // shown below
        }
    }
}
// C1 handler uses a Halt instruction
VOID C1Handler()
{
    STI
    HLT
}
The MONITOR and MWAIT instructions may be considered for use in the C0 idle state loops, if MONITOR and MWAIT are supported.
Example 7-6.
MULTIPLE-PROCESSOR MANAGEMENT MWAIT } } } } // C1 handler uses a Halt instruction. VOID C1Handler() { STI HLT } 7.11.6.3 Halt Idle Logical Processors If one of two logical processors is idle or in a spin-wait loop of long duration, explicitly halt that processor by means of a HLT instruction. In an MP system, operating systems can place idle processors into a loop that continuously checks the run queue for runnable software tasks.
    } ELSE {
        // No work to do - wait in appropriate C-state handler depending
        // on Idle time accumulated
        IF (IdleTime >= IdleTimeThreshhold) THEN {
            // Call appropriate C1, C2, C3 state handler, C1
            // handler shown below
        }
    }
}
// C1 handler uses MONITOR/MWAIT in place of a Halt instruction
VOID C1Handler()
{
    MONITOR WorkQueue // Setup of eax with WorkQueue LinearAddress,
                      // ECX, EDX = 0
    IF (WorkQueue != 0) THEN {
        STI
        MWAIT // EAX, ECX = 0
    }
}
7.11.6.
• Timing loops cause problems when they are calibrated on an IA-32 processor running at one clock speed and then executed on a processor running at another clock speed.
• Routines for calibrating execution-based timing loops produce unpredictable results when run on an IA-32 processor supporting Hyper-Threading Technology. This is due to the sharing of execution resources between the logical processors within a physical package.
CHAPTER 8 ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) The Advanced Programmable Interrupt Controller (APIC), referred to in the following sections as the local APIC, was introduced into the IA-32 processors with the Pentium processor (see Section 17.26, “Advanced Programmable Interrupt Controller (APIC)”) and is included in the P6 family, Pentium 4, Intel Xeon processors, and other more recent Intel 64 and IA-32 processor families (see Section 8.4.2, “Presence of the Local APIC”).
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) Local APICs can receive interrupts from the following sources: • Locally connected I/O devices — These interrupts originate as an edge or level asserted by an I/O device that is connected directly to the processor’s local interrupt pins (LINT0 and LINT1). The I/O devices may also be connected to an 8259-type interrupt controller that is in turn connected to the processor through one of the local interrupt pins.
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) writing to the ICR causes an IPI message to be generated and issued on the system bus (for Pentium 4 and Intel Xeon processors) or on the APIC bus (for Pentium and P6 family processors). See Section 8.2, “System Bus Vs. APIC Bus.” IPIs can be sent to other processors in the system or to the originating processor (self-interrupts).
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) Both the local APIC and the I/O APIC are designed to operate in MP systems (see Figures 8-2 and 8-3). Each local APIC handles interrupts from the I/O APIC, IPIs from processors on the system bus, and self-generated interrupts. Interrupts can also be delivered to the individual processors through the local interrupt pins; however, this mechanism is commonly not used in MP systems.
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) The IPI mechanism is typically used in MP systems to send fixed interrupts (interrupts for a specific vector number) and special-purpose interrupts to processors on the system bus. For example, a local APIC can use an IPI to forward a fixed interrupt to another processor for servicing. Special-purpose IPIs (including NMI, INIT, SMI and SIPI IPIs) allow one or more processors on the system bus to perform systemwide boot-up and control functions.
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 8.4 LOCAL APIC The following sections describe the architecture of the local APIC and how to detect it, identify it, and determine its status. Descriptions of how to program the local APIC are given in Section 8.5.1, “Local Vector Table,” and Section 8.6.1, “Interrupt Command Register (ICR).” 8.4.1 The Local APIC Block Diagram Figure 8-4 gives a functional block diagram for the local APIC.
[Figure 8-4: functional block diagram of the local APIC. Labels in the diagram: DATA/ADDR; Version Register; EOI Register; Task Priority Register; Processor Priority Register; Timer (Current Count Register, Initial Count Register, Divide Configuration Register); Prioritizer; INTA (from CPU core); INTR (to CPU core); EXTINT; Local Vector Table (Timer, LINT0/1, Perf. Mon.)]
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) Table 8-1 shows how the APIC registers are mapped into the 4-KByte APIC register space. Registers are 32 bits, 64 bits, or 256 bits in width; all are aligned on 128-bit boundaries. All 32-bit registers should be accessed using 128-bit aligned 32-bit loads or stores. Some processors may support loads and stores of less than 32 bits to some of the APIC registers. This is model specific behavior and is not guaranteed to work on all processors.
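The addressing rules above can be illustrated with a short C sketch. It assumes the register space starts at FEE0 0000H (the base implied by the addresses in Table 8-1) and that the 256-bit IRR is laid out as eight 32-bit registers at 16-byte intervals starting at offset 200H, with vector n held in bit n mod 32 of register n / 32. The macro and function names are hypothetical.

```c
#include <stdint.h>

#define APIC_DEFAULT_BASE 0xFEE00000u  /* base implied by Table 8-1 */
#define APIC_IRR_OFFSET   0x200u       /* first IRR register */
#define APIC_REG_STRIDE   0x10u        /* registers sit on 128-bit boundaries */

/* Non-zero if the offset names a 128-bit aligned register location. */
static int apic_offset_is_aligned(uint32_t offset)
{
    return (offset & 0xFu) == 0;
}

/* Address of the 32-bit IRR register that holds the bit for `vector`. */
static uint32_t apic_irr_addr(uint32_t base, uint8_t vector)
{
    return base + APIC_IRR_OFFSET + (vector / 32u) * APIC_REG_STRIDE;
}

/* Bit position of `vector` within that 32-bit register. */
static unsigned apic_irr_bit(uint8_t vector)
{
    return vector % 32u;
}
```

Under these assumptions, vector 41H maps to bit 1 of the register at FEE0 0220H, and vector FFH maps to bit 31 of the register at FEE0 0270H.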
Table 8-1. Local APIC Register Address Map (Contd.)
Address                          Register Name                             Software Read/Write
FEE0 0200H through FEE0 0270H    Interrupt Request Register (IRR)          Read Only.
FEE0 0280H                       Error Status Register                     Read Only.
FEE0 0290H through FEE0 02F0H    Reserved
FEE0 0300H                       Interrupt Command Register (ICR) [0-31]   Read/Write.
FEE0 0310H                       Interrupt Command Register (ICR) [32-63]  Read/Write.
FEE0 0320H                       LVT Timer Register                        Read/Write.
2 Read/Write.
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) flags returned in the EDX register indicates the presence (set) or absence (clear) of a local APIC. 8.4.3 Enabling or Disabling the Local APIC The local APIC can be enabled or disabled in either of two ways: 1. Using the APIC global enable/disable flag in the IA32_APIC_BASE MSR (MSR address 1BH; see Figure 8-5): — When IA32_APIC_BASE[11] is 0, the processor is functionally equivalent to an IA-32 processor without an on-chip APIC.
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 8.4.4 Local APIC Status and Location The status and location of the local APIC are contained in the IA32_APIC_BASE MSR (see Figure 8-5). MSR bit functions are described below: • BSP flag, bit 8 ⎯ Indicates if the processor is the bootstrap processor (BSP). See Section 7.5, “Multiple-Processor (MP) Initialization.” Following a power-up or RESET, this flag is set to 1 for the processor selected as the BSP and set to 0 for the remaining processors (APs).
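The fields of the IA32_APIC_BASE MSR can be decoded with a few masks, as in the C sketch below. It takes the 64-bit MSR value as an argument (reading the MSR itself requires RDMSR at privilege level 0 and is not shown). The function names are hypothetical, and the base-address field is assumed to occupy bits 12 through 35; processors with a wider physical address may extend it.

```c
#include <stdint.h>

/* Bit 8: set for the bootstrap processor (BSP), clear for APs. */
static int apic_base_is_bsp(uint64_t msr)
{
    return (int)((msr >> 8) & 1u);
}

/* Bit 11: APIC global enable/disable flag. */
static int apic_base_global_enable(uint64_t msr)
{
    return (int)((msr >> 11) & 1u);
}

/* Bits 12..35 (assumed width): physical base of the APIC register space,
 * returned as a 4-KByte aligned address. */
static uint64_t apic_base_address(uint64_t msr)
{
    return msr & 0xFFFFFF000ull;
}
```

For example, a value of FEE0 0900H would decode as the BSP with the APIC globally enabled and the register space at FEE0 0000H.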
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 8.4.6 Local APIC ID At power up, system hardware assigns a unique APIC ID to each local APIC on the system bus (for Pentium 4 and Intel Xeon processors) or on the APIC bus (for P6 family and Pentium processors). The hardware assigned APIC ID is based on system topology and includes encoding for socket position and cluster information (see Figure 7-2). In MP systems, the local APIC ID is also used as a processor ID by the BIOS and the operating system.
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 8.4.7 Local APIC State The following sections describe the state of the local APIC and its registers following a power-up or RESET, after the local APIC has been software disabled, following an INIT reset, and following an INIT-deassert message. 8.4.7.
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) • The reception or transmission of any IPIs that are in progress when the local APIC is disabled are completed before the local APIC enters the software-disabled state. • The mask bits for all the LVT entries are set. Attempts to reset these bits will be ignored. • (For Pentium and P6 family processors) The local APIC continues to listen to all bus messages in order to keep its arbitration ID synchronized with the rest of the system. 8.4.7.
returned in the Max LVT field is 5; for the P6 family processors (which have 5 LVT entries), the value returned is 4; for the Pentium processor (which has 4 LVT entries), the value returned is 3.
Bits 31:24   Reserved
Bits 23:16   Max. LVT Entry
Bits 15:8    Reserved
Bits 7:0     Version
Address: FEE0 0030H
Value after reset: 000N 00VVH (V = Version, N = # of LVT entries minus 1)
Figure 8-7. Local APIC Version Register
8.
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) • LVT LINT1 Register (FEE0 0360H) — Specifies interrupt delivery when an interrupt is signaled at the LINT1 pin. • LVT Error Register (FEE0 0370H) — Specifies interrupt delivery when the APIC detects an internal error (see Section 8.5.3, “Error Handling”). The LVT performance counter register and its associated interrupt were introduced in the P6 processors and are also present in the Pentium 4 and Intel Xeon processors.
LVT Timer Register (Address: FEE0 0320H; Value after Reset: 0001 0000H)
Bits 31:18   Reserved
Bit 17       Timer Mode (0: One-shot, 1: Periodic)
Bit 16       Mask† (0: Not Masked, 1: Masked)
Bit 12       Delivery Status (0: Idle, 1: Send Pending)
Bits 7:0     Timer Vector
Fields of the remaining LVT registers
Bit 15       Trigger Mode (0: Edge, 1: Level)
Bit 14       Remote IRR
Bit 13       Interrupt Input Pin Polarity
Bits 10:8    Delivery Mode (000: Fixed, 010: SMI, 100: NMI, 111: ExtINT, 101: INIT; all other combinations are Reserved)
Bits 7:0     LINT0 Vector, LINT1 Vector, Error Vector, Performance
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) The setup information that can be specified in the registers of the LVT table is as follows: Vector Interrupt vector number. Delivery Mode Specifies the type of interrupt to be sent to the processor. Some delivery modes will only operate as intended when used in conjunction with a specific trigger mode. The allowable delivery modes are as follows: 000 (Fixed) Delivers the interrupt specified in the vector field.
Remote IRR Flag (Read Only) For fixed mode, level-triggered interrupts, this flag is set when the local APIC accepts the interrupt for servicing and is reset when an EOI command is received from the processor. The meaning of this flag is undefined for edge-triggered interrupts and other delivery modes. Trigger Mode Selects the trigger mode for the local LINT0 and LINT1 pins: (0) edge sensitive and (1) level sensitive.
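The LVT fields described above can be combined into a register value as in the following C sketch. It assumes the bit positions of the LVT figure (vector in bits 7:0, delivery mode in bits 10:8, trigger mode in bit 15, mask in bit 16, timer mode in bit 17). The enum and function names are hypothetical, and no check is made that a given delivery mode is legal for a particular LVT register.

```c
#include <stdint.h>

/* Delivery mode encodings listed for the LVT registers. */
enum lvt_delivery {
    LVT_FIXED  = 0,  /* 000 */
    LVT_SMI    = 2,  /* 010 */
    LVT_NMI    = 4,  /* 100 */
    LVT_INIT   = 5,  /* 101 */
    LVT_EXTINT = 7   /* 111 */
};

/* Value for a LINT0/LINT1-style LVT entry. */
static uint32_t lvt_entry(uint8_t vector, enum lvt_delivery mode,
                          int level_triggered, int masked)
{
    return (uint32_t)vector
         | ((uint32_t)mode << 8)
         | ((uint32_t)(level_triggered != 0) << 15)
         | ((uint32_t)(masked != 0) << 16);
}

/* Value for the LVT Timer Register. */
static uint32_t lvt_timer_entry(uint8_t vector, int periodic, int masked)
{
    return (uint32_t)vector
         | ((uint32_t)(masked != 0) << 16)
         | ((uint32_t)(periodic != 0) << 17);
}
```

A masked entry with vector 0 and fixed delivery reproduces the documented reset value of 0001 0000H.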
8.5.3 Error Handling
The local APIC provides an error status register (ESR) that it uses to record errors that it detects when handling interrupts (see Figure 8-9). An APIC error interrupt is generated when the local APIC sets one of the error bits in the ESR. The LVT error register allows selection of the interrupt vector to be delivered to the processor core when an APIC error is detected.
Bits 31:8   Reserved
Bit 7       Illegal Register Address (1)
Bit 6       Received Illegal Vector
Bit 5       Send Illegal Vector
Bit 4       Reserved
Bit 3       Receive Accept Error (2)
Bit 2       Send Accept Error (2)
Bit 1       Receive Checksum Error (2)
Bit 0       Send Checksum Error (2)
Address: FEE0 0280H
Value after reset: 0H
NOTES:
1. Only used in the Pentium 4, Intel Xeon, and P6 family processors; reserved in the Pentium processor.
2. Only used in the P6 family and Pentium processors; reserved in the Pentium 4 and Intel Xeon processors.
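An error handler might test the ESR bits with masks such as those in the C sketch below. The bit assignments follow the ESR layout above (bit 0 Send Checksum Error through bit 7 Illegal Register Address, with bit 4 reserved). The enumerator and function names are hypothetical.

```c
#include <stdint.h>

/* Error bits of the ESR; bit 4 and bits 31:8 are reserved. */
enum esr_bits {
    ESR_SEND_CHECKSUM    = 1u << 0,
    ESR_RECV_CHECKSUM    = 1u << 1,
    ESR_SEND_ACCEPT      = 1u << 2,
    ESR_RECV_ACCEPT      = 1u << 3,
    ESR_SEND_ILLEGAL_VEC = 1u << 5,
    ESR_RECV_ILLEGAL_VEC = 1u << 6,
    ESR_ILLEGAL_REG_ADDR = 1u << 7
};

/* Mask of every defined error bit (bits 0-3 and 5-7). */
#define ESR_ALL_ERRORS 0xEFu

/* Non-zero if any defined error bit is set; reserved bits are ignored. */
static int esr_has_error(uint32_t esr)
{
    return (esr & ESR_ALL_ERRORS) != 0;
}

/* Non-zero if the specific error `bit` is recorded in `esr`. */
static int esr_test(uint32_t esr, enum esr_bits bit)
{
    return (esr & (uint32_t)bit) != 0;
}
```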
Bits 31:4        Reserved
Bit 2            0
Bits 0, 1 and 3  Divide Value
    000: Divide by 2
    001: Divide by 4
    010: Divide by 8
    011: Divide by 16
    100: Divide by 32
    101: Divide by 64
    110: Divide by 128
    111: Divide by 1
Address: FEE0 03E0H
Value after reset: 0H
Figure 8-10. Divide Configuration Register
Bits 31:0   Initial Count / Current Count
Address: Initial Count FEE0 0380H, Current Count FEE0 0390H
Value after reset: 0H
Figure 8-11.
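Because the divide value is split across bits 0, 1 and 3, encoding a divisor takes one extra step. The C sketch below is a minimal illustration with a hypothetical function name; it assumes the most significant bit of the 3-bit code is placed in bit 3 and that bit 2 is written as 0.

```c
#include <stdint.h>

/* Returns the Divide Configuration Register value for `divisor`,
 * or -1 if the divisor is not one of 1, 2, 4, 8, 16, 32, 64, 128. */
static int dcr_encode(unsigned divisor)
{
    unsigned code;
    switch (divisor) {
    case 2:   code = 0; break;  /* 000 */
    case 4:   code = 1; break;  /* 001 */
    case 8:   code = 2; break;  /* 010 */
    case 16:  code = 3; break;  /* 011 */
    case 32:  code = 4; break;  /* 100 */
    case 64:  code = 5; break;  /* 101 */
    case 128: code = 6; break;  /* 110 */
    case 1:   code = 7; break;  /* 111 */
    default:  return -1;
    }
    /* Low two code bits go to register bits 1:0, the high bit to bit 3. */
    return (int)((code & 3u) | ((code & 4u) << 1));
}
```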
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 8.5.5 Local Interrupt Acceptance When a local interrupt is sent to the processor core, it is subject to the acceptance criteria specified in the interrupt acceptance flow chart in Figure 8-17. If the interrupt is accepted, it is logged into the IRR register and handled by the processor according to its priority (see Section 8.8.4, “Interrupt Acceptance for Fixed Interrupts”). If the interrupt is not accepted, it is sent back to the local APIC and retried.
Bits 63:56   Destination Field
Bits 55:32   Reserved
Bits 31:20   Reserved
Bits 19:18   Destination Shorthand (00: No Shorthand, 01: Self, 10: All Including Self, 11: All Excluding Self)
Bits 17:16   Reserved
Bit 12       Delivery Status (0: Idle, 1: Send Pending)
Bit 11       Destination Mode (0: Physical, 1: Logical)
Bits 10:8    Delivery Mode (000: Fixed, 001: Lowest Priority (1), 010: SMI, 011: Reserved, 100: NMI, 101: INIT, 110: Start Up, 111: Reserved)
Bits 7:0     Vector
Address: FEE0 0300H (0 - 31) FEE0 0310H (32 - 63)
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) ability for a processor to send a lowest priority IPI is model specific and should be avoided by BIOS and operating system software. 010 (SMI) Delivers an SMI interrupt to the target processor or processors. The vector field must be programmed to 00H for future compatibility. 011 (Reserved) 100 (NMI) Delivers an NMI interrupt to the target processor or processors. The vector information is ignored.
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) Destination Mode Selects either physical (0) or logical (1) destination mode (see Section 8.6.2, “Determining IPI Destination”). Delivery Status (Read Only) Indicates the IPI delivery status, as follows: 0 (Idle) There is currently no IPI activity for this local APIC, or the previous IPI sent from this local APIC was delivered and accepted by the target processor or processors.
destination field set to FH for Pentium and P6 family processors and to FFH for Pentium 4 and Intel Xeon processors. 11: (All Excluding Self) The IPI is sent to all processors in a system with the exception of the processor sending the IPI. The APIC broadcasts a message with the physical destination mode and destination field set to FH for Pentium and P6 family processors and to FFH for Pentium 4 and Intel Xeon processors.
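The ICR fields discussed above can be assembled into a 64-bit value as in this C sketch. It assumes the ICR layout of the figure (destination in bits 63:56, shorthand in bits 19:18, destination mode in bit 11, delivery mode in bits 10:8, vector in bits 7:0) and additionally sets the level flag in bit 14 to "assert"; the INIT level de-assert message and the trigger mode flag are not covered. The enum and function names are hypothetical.

```c
#include <stdint.h>

enum icr_delivery {
    ICR_FIXED    = 0,  /* 000 */
    ICR_LOWEST   = 1,  /* 001 */
    ICR_SMI      = 2,  /* 010 */
    ICR_NMI      = 4,  /* 100 */
    ICR_INIT     = 5,  /* 101 */
    ICR_START_UP = 6   /* 110 */
};

enum icr_shorthand {
    ICR_NO_SHORTHAND  = 0,  /* 00 */
    ICR_SELF          = 1,  /* 01 */
    ICR_ALL_INCL_SELF = 2,  /* 10 */
    ICR_ALL_EXCL_SELF = 3   /* 11 */
};

#define ICR_LEVEL_ASSERT (1u << 14)  /* assumed: level flag, 1 = assert */

/* Compose the 64-bit ICR value for an IPI. */
static uint64_t icr_value(uint8_t dest, enum icr_shorthand shorthand,
                          int logical_mode, enum icr_delivery mode,
                          uint8_t vector)
{
    return ((uint64_t)dest << 56)
         | ((uint64_t)shorthand << 18)
         | ICR_LEVEL_ASSERT
         | ((uint64_t)(logical_mode != 0) << 11)
         | ((uint64_t)mode << 8)
         | vector;
}

/* Halves as they would be written to FEE0 0310H and FEE0 0300H. */
static uint32_t icr_high(uint64_t icr) { return (uint32_t)(icr >> 32); }
static uint32_t icr_low(uint64_t icr)  { return (uint32_t)icr; }
```

Software would normally write the high doubleword first, since the write to the low doubleword is what causes the IPI to be sent.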
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) Table 8-3. Valid Combinations for the Pentium 4 and Intel Xeon Processors’ Local xAPIC Interrupt Command Register (Contd.
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) Table 8-4. Valid Combinations for the P6 Family Processors’ Local APIC Interrupt Command Register (Contd.) Destination Shorthand Valid/ Invalid Trigger Mode Delivery Mode Destination Mode All excluding Self Valid2 Level Fixed, Lowest Priority1, NMI X 5 All excluding Self Invalid Level SMI, Start-Up X All excluding Self 3 Level INIT X Level SMI, Start-Up X Valid 5 X Invalid NOTES: 1.
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) How the ICR, LDR, and DFR are used to select an IPI destination depends on the destination mode used: physical, logical, broadcast/self, or lowest-priority delivery mode. These destination modes are described in the following sections. 8.6.2.1 Physical Destination Mode In physical destination mode, the destination processor is specified by its local APIC ID (see Section 8.4.6, “Local APIC ID”).
Bits 31:24   Logical APIC ID
Bits 23:0    Reserved
Address: 0FEE0 00D0H
Value after reset: 0000 0000H
Figure 8-13. Logical Destination Register (LDR)
Figure 8-14 shows the layout of the destination format register (DFR). The 4-bit model field in this register selects one of two models (flat or cluster) that can be used to interpret the MDA when using logical destination mode.
MDA are compared with bits 28 through 31 of the LDR to determine if a local APIC is part of the cluster. Bits 24 through 27 of the MDA are compared with bits 24 through 27 of the LDR to identify local APICs within the cluster.
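The cluster-model comparison can be sketched in C as follows. The sketch assumes the 8-bit MDA is compared against the logical APIC ID held in LDR[31:24], that the upper nibble must match exactly (cluster address), and that the lower nibble is treated as a bit mask in which any common set bit selects the APIC. Broadcast addressing is not modeled, and the function name is hypothetical.

```c
#include <stdint.h>

/* Returns non-zero if a local APIC whose LDR register value is `ldr`
 * is addressed by the 8-bit message destination address `mda`
 * under the cluster model. */
static int cluster_model_match(uint8_t mda, uint32_t ldr)
{
    uint8_t logical_id = (uint8_t)(ldr >> 24);   /* LDR[31:24] */
    uint8_t mda_cluster = (uint8_t)(mda >> 4);
    uint8_t ldr_cluster = (uint8_t)(logical_id >> 4);

    if (mda_cluster != ldr_cluster)
        return 0;                                /* different cluster */
    return (mda & logical_id & 0x0Fu) != 0;      /* member bit selected? */
}
```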
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 8.6.2.4 Lowest Priority Delivery Mode With lowest priority delivery mode, the ICR is programmed to send an IPI to several processors on the system bus, using the logical or shorthand destination mechanism for selecting the processor. The selected processors then arbitrate with one another over the system bus or the APIC bus, with the lowest-priority processor accepting the IPI.
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) the focus of an interrupt if it is currently servicing that interrupt or if it has a pending request for that interrupt. For Intel Xeon processors, the concept of a focus processor is not supported. In operating systems that use the lowest priority delivery mode but do not update the TPR, the TPR information saved in the chipset will potentially cause the interrupt to be always delivered to the same processor from the logical set.
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) Section 8.10, “APIC Bus Message Passing Mechanism and Protocol (P6 Family, Pentium Processors),” describes the APIC bus arbitration protocols and bus message formats, while Section 8.6.1, “Interrupt Command Register (ICR),” describes the INIT level de-assert IPI message. Note that except for the SIPI IPI (see Section 8.6.
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 4. When interrupts are pending in the IRR and ISR register, the local APIC dispatches them to the processor one at a time, based on their priority and the current task and processor priorities in the TPR and PPR (see Section 8.8.3.1, “Task and Processor Priorities”). 5.
[Figure 8-17. Interrupt Acceptance Flow Chart for the Local APIC (P6 Family and Pentium Processors)]
1.
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) interrupt, or one of the MP protocol IPI messages (BIPI, FIPI, and SIPI), the interrupt is sent directly to the processor core for handling. 3. If the local APIC determines that it is the designated destination for the interrupt but the interrupt request is not one of the interrupts given in step 2, the local APIC looks for an open slot in one of its two pending interrupt queues contained in the IRR and ISR registers (see Figure 8-20).
of vectors within a priority group, the vector number is often divided into two parts, with the high 4 bits of the vector indicating its priority and the low 4 bits indicating its ranking within the priority group.
8.8.3.1 Task and Processor Priorities
The local APIC also defines a task priority and a processor priority that it uses in determining the order in which interrupts should be handled.
Bits 31:8   Reserved
Bits 7:4    Processor Priority
Bits 3:0    Processor Priority Sub-Class
Address: FEE0 00A0H
Value after reset: 0H
Figure 8-19. Processor Priority Register (PPR)
Its value in the PPR is computed as follows:
IF TPR[7:4] ≥ ISRV[7:4]
THEN
    PPR[7:0] ← TPR[7:0]
ELSE
    PPR[7:4] ← ISRV[7:4]
    PPR[3:0] ← 0
Here, the ISRV value is the vector number of the highest priority ISR bit that is set, or 00H if no ISR bit is set.
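The PPR rule above translates directly into C. The sketch below is illustrative only; the function name is hypothetical, and the caller is assumed to pass 00H for ISRV when no ISR bit is set.

```c
#include <stdint.h>

/* Compute PPR[7:0] from TPR[7:0] and ISRV, the vector of the highest
 * priority in-service interrupt (00H if none). */
static uint8_t compute_ppr(uint8_t tpr, uint8_t isrv)
{
    if ((tpr >> 4) >= (isrv >> 4))
        return tpr;                    /* PPR[7:0] <- TPR[7:0] */
    return (uint8_t)(isrv & 0xF0u);    /* PPR[7:4] <- ISRV[7:4], PPR[3:0] <- 0 */
}
```

With TPR = 20H and an in-service vector of 35H, the in-service priority class (3) is higher, so the result is 30H; with TPR = 45H and the same vector, the task priority wins and the result is 45H.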
Bits 255:16   IRR / ISR / TMR
Bits 15:0     Reserved
Addresses: IRR FEE0 0200H - FEE0 0270H; ISR FEE0 0100H - FEE0 0170H; TMR FEE0 0180H - FEE0 01F0H
Value after reset: 0H
Figure 8-20. IRR, ISR and TMR Registers
The IRR contains the active interrupt requests that have been accepted, but not yet dispatched to the processor for servicing.
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) bit is cleared for edge-triggered interrupts and set for level-triggered interrupts. If a TMR bit is set when an EOI cycle for its corresponding interrupt vector is generated, an EOI message is sent to all I/O APICs. 8.8.
The TPR (shown in Figure 8-18) is cleared to 0 on reset. In 64-bit mode, software can read and write the TPR using an alternate interface, the MOV CR8 instruction. The new priority level is established when the MOV CR8 instruction completes execution. Software does not need to force serialization after loading the TPR using MOV CR8. Use of the MOV CRn instruction requires a privilege level of 0.
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) 8.9 SPURIOUS INTERRUPT A special situation may occur when a processor raises its task priority to be greater than or equal to the level of the interrupt for which the processor INTR signal is currently being asserted. If at the time the INTA cycle is issued, the interrupt that was to be dispensed has become masked (programmed by software), the local APIC will deliver a spurious-interrupt vector.
Bits 31:10   Reserved
Bit 9        Focus Processor Checking (1) (0: Enabled, 1: Disabled)
Bit 8        APIC Software Enable/Disable (0: APIC Disabled, 1: APIC Enabled)
Bits 7:0     Spurious Vector (2)
Address: FEE0 00F0H
Value after reset: 0000 00FFH
1. Not supported in Pentium 4 and Intel Xeon processors.
2. For the P6 family and Pentium processors, bits 0 through 3 of the spurious vector are hardwired to 1.
Figure 8-23. Spurious-Interrupt Vector Register (SVR)
8.
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) issues an EOI message simultaneously. In the latter case, the APICs sending the EOI messages arbitrate using their arbitration priorities. If the APICs are set up to use “lowest priority” arbitration (see Section 8.6.2.4, “Lowest Priority Delivery Mode”) and multiple APICs are currently executing at the lowest priority (the value in the APR register), the arbitration priorities (unique values in the Arb ID register) are used to break ties.
8.11.1 Message Address Register Format
The format of the Message Address Register (lower 32-bits) is shown in Figure 8-24.
Bits 31:20   0FEEH
Bits 19:12   Destination ID
Bits 11:4    Reserved
Bit 3        RH
Bit 2        DM
Bits 1:0     XX
Figure 8-24. Layout of the MSI Message Address Register
Fields in the Message Address Register are as follows:
1. Bits 31-20 — These bits contain a fixed value for interrupt messages (0FEEH).
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) APIC ID is considered for delivery of that interrupt (this means no re-direction). If RH is 1 and DM is 1, the Destination ID Field is interpreted as in logical destination mode and the redirection is limited to only those processors that are part of the logical group of processors based on the processor’s logical APIC ID and the Destination ID field in the message.
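The Message Address Register layout can be illustrated with a small C sketch that composes the lower 32 bits of an MSI address. It assumes the layout of Figure 8-24 (0FEEH in bits 31:20, Destination ID in bits 19:12, RH in bit 3, DM in bit 2) and writes the reserved and don't-care bits as 0. The function name is hypothetical.

```c
#include <stdint.h>

/* Compose the lower 32 bits of the MSI Message Address Register.
 * dest_id: 8-bit Destination ID
 * rh:      redirection hint (0 or 1)
 * dm:      destination mode (0 or 1), interpreted together with RH */
static uint32_t msi_address(uint8_t dest_id, int rh, int dm)
{
    return 0xFEE00000u
         | ((uint32_t)dest_id << 12)
         | ((uint32_t)(rh != 0) << 3)
         | ((uint32_t)(dm != 0) << 2);
}
```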
ADVANCED PROGRAMMABLE INTERRUPT CONTROLLER (APIC) Reserved fields are not assumed to be any value. Software must preserve their contents on writes. Other fields in the Message Data Register are described below. 1. Vector — This 8-bit field contains the interrupt vector associated with the message. Values range from 010H to 0FEH. Software must guarantee that the field is not programmed with vector 00H to 0FH. 2. Delivery Mode — This 3-bit field specifies how the interrupt receipt is handled.
CHAPTER 9 PROCESSOR MANAGEMENT AND INITIALIZATION This chapter describes the facilities provided for managing processor wide functions and for initializing the processor. The subjects covered include: processor initialization, x87 FPU initialization, processor configuration, feature determination, mode switching, the MSRs (in the Pentium, P6 family, Pentium 4, and Intel Xeon processors), and the MTRRs (in the P6 family, Pentium 4, and Intel Xeon processors). 9.
PROCESSOR MANAGEMENT AND INITIALIZATION The software-initialization code performs all system-specific initialization of the BSP or primary processor and the system logic. At this point, for MP (or DP) systems, the BSP (or primary) processor wakes up each AP (or secondary) processor to enable those processors to execute self-configuration code. When all processors are initialized, configured, and synchronized, the BSP or primary processor begins executing an initial operating-system or executive task.
PROCESSOR MANAGEMENT AND INITIALIZATION Table 9-1.
PROCESSOR MANAGEMENT AND INITIALIZATION Table 9-1. IA-32 Processor States Following Power-up, Reset, or INIT (Contd.
Bit 31 (PG)   Paging disabled: 0
Bit 30 (CD)   Caching disabled: 1
Bit 29 (NW)   Not write-through disabled: 1
Bit 18 (AM)   Alignment check disabled: 0
Bit 16 (WP)   Write-protect disabled: 0
Bit 5 (NE)    External x87 FPU error reporting: 0
Bit 4 (ET)    (Not used): 1
Bit 3 (TS)    No task switch: 0
Bit 2 (EM)    x87 FPU instructions not trapped: 0
Bit 1 (MP)    WAIT/FWAIT instructions not trapped: 0
Bit 0 (PE)    Real-address mode: 0
All other bits are reserved.
Figure 9-1. Contents of CR0 Register after Reset
9.1.
9.1.4 First Instruction Executed
The first instruction that is fetched and executed following a hardware reset is located at physical address FFFFFFF0H. This address is 16 bytes below the processor’s uppermost physical address. The EPROM containing the software-initialization code must be located at this address. The address FFFFFFF0H is beyond the 1-MByte addressable range of the processor while in real-address mode.
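The arithmetic behind the reset address can be checked with a short C sketch. It assumes the reset state described elsewhere in this chapter, in which the hidden CS base is FFFF0000H and EIP is 0000FFF0H; those two values are assumptions of the sketch rather than statements from this section, and the names are hypothetical.

```c
#include <stdint.h>

#define REAL_MODE_LIMIT 0x00100000u  /* 1 MByte */

/* Physical address of an instruction fetch: segment base plus EIP. */
static uint32_t fetch_address(uint32_t cs_base, uint32_t eip)
{
    return cs_base + eip;
}

/* Non-zero if `addr` lies beyond the 1-MByte real-address-mode range. */
static int beyond_real_mode_range(uint32_t addr)
{
    return addr >= REAL_MODE_LIMIT;
}
```

With the assumed reset values, the fetch address is FFFF0000H + FFF0H = FFFFFFF0H, which is 16 bytes below the 4-GByte boundary.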
Table 9-2. Recommended Settings of EM and MP Flags on IA-32 Processors
EM   MP   NE         IA-32 processor
1    0    1          Intel486™ SX, Intel386™ DX, and Intel386™ SX processors only, without the presence of a math coprocessor.
0    1    1 or 0*    Pentium 4, Intel Xeon, P6 family, Pentium, Intel486™ DX, and Intel 487 SX processors, and Intel386 DX and Intel386 SX processors when a companion math coprocessor is present.
0    1    1 or 0*    More recent Intel 64 or IA-32 processors
floating-point instruction. (Table 9-2 shows when it is appropriate to use this flag.) Setting this flag has two functions:
• It allows x87 FPU code to run on an IA-32 processor that neither has an integrated x87 FPU nor is connected to an external math coprocessor, by using a floating-point emulator.
9.4 MODEL-SPECIFIC REGISTERS (MSRS)
Most IA-32 processors (starting from Pentium processors) and Intel 64 processors contain model-specific registers (MSRs). A given MSR may not be supported across all families and models for Intel 64 and IA-32 processors. Some MSRs are designated as architectural to simplify software programming; a feature introduced by an architectural MSR is expected to be supported in future processors.
PROCESSOR MANAGEMENT AND INITIALIZATION See Section 10.11, “Memory Type Range Registers (MTRRs),” for detailed information on the MTRRs. 9.6 INITIALIZING SSE/SSE2/SSE3/SSSE3 EXTENSIONS For processors that contain SSE/SSE2/SSE3/SSSE3 extensions, steps must be taken when initializing the processor to allow execution of these instructions. 1.
PROCESSOR MANAGEMENT AND INITIALIZATION mode. The protected-mode data structures that must be loaded are described in Section 9.8, “Software Initialization for Protected-Mode Operation.” 9.7.1 Real-Address Mode IDT In real-address mode, the only system data structure that must be loaded into memory is the IDT (also called the “interrupt vector table”). By default, the address of the base of the IDT is physical address 0H.
modules into memory to support reliable operation of the processor in protected mode. These data structures include the following:
• An IDT.
• A GDT.
• A TSS.
• (Optional) An LDT.
• If paging is to be used, at least one page directory and one page table.
• A code segment that contains the code to be executed when the processor switches to protected mode.
• One or more code modules that contain the necessary interrupt and exception handlers.
PROCESSOR MANAGEMENT AND INITIALIZATION descriptors in the GDT. Some operating systems allocate new segments and LDTs as they are needed. This provides maximum flexibility for handling a dynamic programming environment. However, many operating systems use a single LDT for all tasks, allocating GDT entries in advance. An embedded system, such as a process controller, might pre-allocate a fixed number of segments and LDTs for a fixed number of application programs.
9.8.4 Initializing Multitasking
If the multitasking mechanism is not going to be used and changes between privilege levels are not allowed, it is not necessary to load a TSS into memory or to initialize the task register. If the multitasking mechanism is going to be used and/or changes between privilege levels are allowed, software initialization code must load at least one TSS and an accompanying TSS descriptor.
PROCESSOR MANAGEMENT AND INITIALIZATION following instructions must be located in an identity-mapped page (until such time that a branch to non-identity mapped pages can be effected). 64-bit mode paging tables must be located in the first 4 GBytes of physical-address space prior to activating IA-32e mode. This is necessary because the MOV CR3 instruction used to initialize the page-directory base must be executed in legacy mode prior to activating IA-32e mode (setting CR0.PG = 1 to enable paging).
PROCESSOR MANAGEMENT AND INITIALIZATION Non-maskable interrupts (NMI) must be disabled using external hardware. 9.8.5.3 64-bit Mode and Compatibility Mode Operation IA-32e mode uses two code segment-descriptor bits (CS.L and CS.D, see Figure 3-8) to control the operating modes after IA-32e mode is initialized. If CS.L = 1 and CS.D = 0, the processor is running in 64-bit mode. With this encoding, the default operand size is 32 bits and default address size is 64 bits.
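The CS.L/CS.D encoding can be summarized as a small decision function. This is a sketch, not the manual's wording: only the CS.L = 1, CS.D = 0 case is taken from the text above. The other cases (CS.L = 0 selects compatibility mode with a 16-bit or 32-bit default chosen by CS.D, and CS.L = 1 with CS.D = 1 is reserved) are stated here as assumptions drawn from the wider architecture description. The function name is hypothetical.

```python
def ia32e_submode(cs_l, cs_d):
    """Return the IA-32e sub-mode and default sizes for a code segment.

    cs_l and cs_d are the L and D bits of the code-segment descriptor.
    """
    if cs_l == 1 and cs_d == 0:
        # Case described in the text: 64-bit mode.
        return {"mode": "64-bit", "operand_bits": 32, "address_bits": 64}
    if cs_l == 1 and cs_d == 1:
        # Assumption: this combination is a reserved encoding.
        raise ValueError("CS.L = 1 with CS.D = 1 is reserved")
    # Assumption: CS.L = 0 selects compatibility mode; CS.D picks the default size.
    default = 32 if cs_d else 16
    return {"mode": "compatibility", "operand_bits": default,
            "address_bits": default}


print(ia32e_submode(1, 0)["mode"])  # prints 64-bit
```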
PROCESSOR MANAGEMENT AND INITIALIZATION 9.9 MODE SWITCHING To use the processor in protected mode after hardware or software reset, a mode switch must be performed from real-address mode. Once in protected mode, software generally does not need to return to real-address mode. To run software written to run in real-address mode (8086 mode), it is generally more convenient to run the software in virtual-8086 mode, than to switch back to real-address mode. 9.9.
PROCESSOR MANAGEMENT AND INITIALIZATION 6. Execute the LTR instruction to load the task register with a segment selector to the initial protected-mode task or to a writable area of memory that can be used to store TSS information on a task switch. 7. After entering protected mode, the segment registers continue to hold the contents they had in real-address mode. The JMP or CALL instruction in step 4 resets the CS register.
— Byte granular (G = 0)
— Expand up (E = 0)
— Writable (W = 1)
— Present (P = 1)
— Base = any value
The segment registers must be loaded with non-null segment selectors or the segment registers will be unusable in real-address mode. Note that if the segment registers are not reloaded, execution continues using the descriptor attributes loaded during protected mode.
5.
PROCESSOR MANAGEMENT AND INITIALIZATION Figure 9-3 shows the physical memory layout for the processor following a hardware reset and the starting point of this example. The EPROM that contains the initialization code resides at the upper end of the processor’s physical memory address range, starting at address FFFFFFFFH and going down from there. The address of the first instruction to be executed is at FFFFFFF0H, the default starting address for the processor following a hardware reset.
Figure 9-3. Processor State After Reset
[Figure: physical memory map after reset. A 64K EPROM occupies FFFF 0000H through FFFF FFFFH; the first instruction is fetched from [CS.BASE+EIP] = FFFF FFF0H. Register state: EIP = 0000 FFF0H, CS.BASE = FFFF 0000H, DS.BASE = 0H, ES.BASE = 0H, SS.BASE = 0H, ESP = 0H. SP, DS, SS, and ES reference address 0.]
Table 9-4. Main Initialization Steps in STARTUP.ASM Source Listing STARTUP.
PROCESSOR MANAGEMENT AND INITIALIZATION Table 9-4. Main Initialization Steps in STARTUP.ASM Source Listing (Contd.) STARTUP.
PROCESSOR MANAGEMENT AND INITIALIZATION 9.10.2 STARTUP.ASM Listing Example 9-1 provides high-level sample code designed to move the processor into protected mode. This listing does not include any opcode and offset information. Example 9-1. STARTUP.ASM MS-DOS* 5.0(045-N) 386(TM) MACRO ASSEMBLER STARTUP PAGE 1 09:44:51 08/19/92 MS-DOS 5.0(045-N) 386(TM) MACRO ASSEMBLER V4.0, ASSEMBLY OF MODULE STARTUP OBJECT MODULE PLACED IN startup.obj ASSEMBLER INVOKED BY: f:\386tools\ASM386.EXE startup.
PROCESSOR MANAGEMENT AND INITIALIZATION 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 ; RAM_START will contain the linear address of the first ; free byte above the copied tables - this may be useful if ; a memory manager is used.
SS_reg          DW ?
SS_h            DW ?
DS_reg          DW ?
DS_h            DW ?
FS_reg          DW ?
FS_h            DW ?
GS_reg          DW ?
GS_h            DW ?
LDT_reg         DW ?
LDT_h           DW ?
TRAP_reg        DW ?
IO_map_base     DW ?
TASK_STATE ENDS

; basic structure of a descriptor
DESC STRUC
lim_0_15        DW ?
bas_0_15        DW ?
bas_16_23       DB ?
access          DB ?
gran            DB ?
bas_24_31       DB ?
DESC ENDS

; structure for use with LGDT and
PROCESSOR MANAGEMENT AND INITIALIZATION 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 ; ------------------------- DATA SEGMENT---------------------; Initially, this data segment starts at linear 0, according ; to the processor’s power-up state.
159 ; DS,ES address the bottom 64K of flat linear memory
160 ASSUME DS:STARTUP_DATA, ES:STARTUP_DATA
161 ; See Figure 9-4
162 ; load GDTR with temporary GDT
163 LEA EBX,TEMP_GDT ; build the TEMP_GDT in low ram,
164 MOV DWORD PTR [EBX],0 ; where we can address
165 MOV DWORD PTR [EBX]+4,0
166 MOV DWORD PTR [EBX]+8, LINEAR_PROTO_LO
167 MOV DWORD PTR [EBX]+12, LINEAR_PROTO_HI
168 MOV TEMP_GDT_scratch.
PROCESSOR MANAGEMENT AND INITIALIZATION 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 9-28 Vol. 3A MOV ADD MOV MOV MOVZX MOV INC MOV MOV ADD REP MOVS ; fixup MOV MOV ROR MOV MOV ECX, CS_BASE ECX, OFFSET (GDT_EPROM) ESI, [ECX].table_linear EDI,EAX ECX, [ECX].table_lim APP_GDT_ram[EBX].table_lim,CX ECX EDX,EAX APP_GDT_ram[EBX].
PROCESSOR MANAGEMENT AND INITIALIZATION 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 MOV MOV MOV MOV MOV MOV ROL MOV MOV LSL INC MOV ADD REP MOVS ; move the TSS EDI,EAX EBX,TSS_INDEX*SIZE(DESC) ECX,GDT_DESC_OFF ;build linear address for TSS GS,CX DH,GS:[EBX].bas_24_31 DL,GS:[EBX].bas_16_23 EDX,16 DX,GS:[EBX].
289 PUSH DWORD PTR [EDX].EIP_reg
290 MOV AX,[EDX].DS_reg
291 MOV BX,[EDX].ES_reg
292 MOV DS,AX ; DS and ES no longer linear memory
293 MOV ES,BX
294
295 ; simulate far jump to initial task
296 IRETD
297
298 STARTUP_CODE ENDS
*** WARNING #377 IN 298, (PASS 2) SEGMENT CONTAINS PRIVILEGED INSTRUCTION(S)
299
300 END STARTUP, DS:STARTUP_DATA, SS:STARTUP_DATA
301
302 ASSEMBLY COMPLETE, 1 WARNING, NO ERRORS.
Figure 9-4. Constructing Temporary GDT and Switching to Protected Mode (Lines 162-172 of List File)
[Figure: 4-GByte physical memory map. Execution begins at START: [CS.BASE+EIP] in the EPROM region (FFFF 0000H to FFFF FFFFH) and performs these steps: jump near start; construct TEMP_GDT; LGDT; move to protected mode. TEMP_GDT (entries GDT [0] and GDT [1]) and GDT_SCRATCH reside in low memory. DS and ES are loaded from GDT[1], which has Base = 0, Limit = 4G.]
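To see how the two DWORDs written into GDT[1] yield "Base = 0, Limit = 4G", the following sketch decodes a segment descriptor using the field layout of the DESC structure in the listing (lim_0_15, bas_0_15, bas_16_23, access, gran, bas_24_31). The values used for LINEAR_PROTO_LO and LINEAR_PROTO_HI are not shown in this excerpt; 0000FFFFH and 00CF9200H are assumed here purely as illustrative values consistent with the figure.

```python
# Assumed, illustrative values; the EQU definitions are not in this excerpt.
LINEAR_PROTO_LO = 0x0000FFFF
LINEAR_PROTO_HI = 0x00CF9200


def decode_descriptor(lo, hi):
    """Split the two DWORDs of a segment descriptor into its fields."""
    limit = (lo & 0xFFFF) | (hi & 0x000F0000)           # limit 15:0 and 19:16
    base = (lo >> 16) | ((hi & 0xFF) << 16) | (hi & 0xFF000000)
    access = (hi >> 8) & 0xFF                           # P, DPL, S, type
    page_granular = bool(hi & (1 << 23))                # G flag
    # With G = 1 the limit counts 4-KByte pages; otherwise it counts bytes.
    size = (limit + 1) * 4096 if page_granular else limit + 1
    return {"base": base, "limit": limit, "access": access,
            "page_granular": page_granular, "size_bytes": size}


flat = decode_descriptor(LINEAR_PROTO_LO, LINEAR_PROTO_HI)
print(hex(flat["base"]), flat["size_bytes"] == 1 << 32)  # prints 0x0 True
```

With these assumed values the access byte is 92H (present, DPL 0, writable data segment) and the segment spans the full 4 GBytes from base 0.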
Figure 9-5. Moving the GDT, IDT, and TSS from ROM to RAM (Lines 196-261 of List File)
[Figure: physical memory map. The TSS, IDT, and GDT images in ROM near FFFF FFFFH are copied to TSS RAM, IDT RAM, and GDT RAM above RAM_START in low memory. Steps: move the GDT, IDT, TSS from ROM to RAM; fix aliases; LTR.]
Figure 9-6. Task Switching (Lines 282-296 of List File)
[Figure: physical memory map showing TSS RAM (fields EIP, EFLAGS, ESP, ES, CS, SS, DS), IDT RAM, and GDT RAM (including the GDT alias and IDT alias entries) above RAM_START. Sequence performed: SS = TSS.SS; ESP = TSS.ESP; PUSH TSS.EFLAG; PUSH TSS.CS; PUSH TSS.EIP; ES = TSS.ES; DS = TSS.DS; IRET.]
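The sequence in Figure 9-6 builds an interrupt-return frame by hand so that IRETD "returns" into the initial task. The toy model below mimics that with a Python list as the stack. It is a sketch only: the TSS field values are made up for the example, and privilege checks and descriptor loading are not modeled.

```python
def simulate_task_entry(tss):
    """Model the PUSH/PUSH/PUSH/IRETD sequence from Figure 9-6."""
    state = {"SS": tss["SS"], "ESP": tss["ESP"]}   # SS = TSS.SS, ESP = TSS.ESP
    stack = []
    stack.append(tss["EFLAGS"])                    # PUSH TSS.EFLAG
    stack.append(tss["CS"])                        # PUSH TSS.CS
    stack.append(tss["EIP"])                       # PUSH TSS.EIP
    state["ES"] = tss["ES"]                        # ES = TSS.ES
    state["DS"] = tss["DS"]                        # DS = TSS.DS
    # IRETD pops in the reverse order of the pushes: EIP, then CS, then EFLAGS.
    state["EIP"] = stack.pop()
    state["CS"] = stack.pop()
    state["EFLAGS"] = stack.pop()
    return state


# Hypothetical TSS contents, chosen only for illustration.
example_tss = {"SS": 0x18, "ESP": 0x8000, "EFLAGS": 0x2, "CS": 0x10,
               "EIP": 0x1000, "ES": 0x20, "DS": 0x20}
entered = simulate_task_entry(example_tss)
```

Pushing EFLAGS first and EIP last matters, because IRETD expects EIP at the top of the stack.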
9.10.3 MAIN.ASM Source Code
The file MAIN.ASM shown in Example 9-2 defines the data and stack segments for this application. It can be replaced with a main module task written in a high-level language, which is invoked by the IRET instruction executed by STARTUP.ASM.
Example 9-2. MAIN.ASM
    NAME main_module
    data SEGMENT RW
        dw 1000 dup(?)
    DATA ENDS
    stack stackseg 800
    CODE SEGMENT ER use32 PUBLIC
    main_start:
        nop
        nop
        nop
    CODE ENDS
    END main_start, ds:data, ss:stack
9.
PROCESSOR MANAGEMENT AND INITIALIZATION Example 9-4. Build File INIT_BLD_EXAMPLE; SEGMENT , ; *SEGMENTS(DPL = 0) startup.startup_code(BASE = 0FFFF0000H) TASK BOOT_TASK(OBJECT = startup, INITIAL,DPL = 0, NOT INTENABLED) PROTECTED_MODE_TASK(OBJECT = main_module,DPL = 0, NOT INTENABLED) , ; TABLE GDT ( LOCATION = GDT_EPROM , ENTRY = ( 10: PROTECTED_MODE_TASK , startup.startup_code , startup.startup_data , main_module.data , main_module.code , main_module.
Table 9-5 shows the relationship of each build item with an ASM source file.
Table 9-5. Relationship Between BLD Item and ASM Source File
Item | ASM386 and Startup.A58 | BLD386 Controls and BLD file | Effect
Bootstrap | public startup; startup: | bootstrap start(startup) | Near jump at 0FFFFFFF0H to start.
GDT location | public GDT_EPROM; GDT_EPROM TABLE_REG <> | TABLE GDT(location = GDT_EPROM) | The location of the GDT will be programmed into the GDT_EPROM location.
Figure 9-7. Applying Microcode Updates
[Figure: block diagram with the elements Update Loader, New Update, Update Blocks, CPU, and BIOS.]
9.11.1 Microcode Update
A microcode update consists of an Intel-supplied binary that contains a descriptive header and data. No executable code resides within the update. Each microcode update is tailored for a specific list of processor signatures. A mismatch of the processor’s signature with the signature contained in the update will result in a failure to load.
NOTE
The optional extended signature table is supported starting with processor family 0FH, model 03H.
Table 9-6. Microcode Update Field Definitions
Field Name | Offset (bytes) | Length (bytes) | Description
Header Version | 0 | 4 | Version number of the update header.
Update Revision | 4 | 4 | Unique version number for the update, the basis for the update signature provided by the processor to indicate the current update functioning within the processor.
Table 9-6. Microcode Update Field Definitions (Contd.)
Field Name | Offset (bytes) | Length (bytes) | Description
Reserved | 36 | 12 | Reserved fields for future expansion
Update Data | 48 | Data Size or 2000 | Update data
Extended Signature Count | Data Size + 48 | 4 | Specifies the number of extended signature structures (Processor Signature[n], processor flags[n] and checksum[n]) that exist in this microcode update.
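The 48-byte header layout can be exercised with a short parsing sketch. The rows visible in this excerpt give Header Version (offset 0), Update Revision (offset 4), Reserved (offset 36), and Update Data (offset 48). The remaining DWORD fields at offsets 8 through 32 (date, processor signature, checksum, loader revision, processor flags, data size, total size) are filled in here as assumptions from the full field table, and the function names are hypothetical.

```python
import struct

# Nine little-endian DWORDs followed by 12 reserved bytes = 48 bytes.
HEADER_FORMAT = "<9I12s"
HEADER_FIELDS = ("header_version", "update_revision", "date",
                 "processor_signature", "checksum", "loader_revision",
                 "processor_flags", "data_size", "total_size")


def parse_header(blob):
    """Unpack the fixed 48-byte update header into a dict of integers."""
    values = struct.unpack_from(HEADER_FORMAT, blob, 0)
    return dict(zip(HEADER_FIELDS, values[:9]))


def update_data_length(header):
    """Length of the update data: Data Size, or 2000 bytes when it is zero."""
    return header["data_size"] or 2000


def extended_table_offset(header):
    """Offset of Extended Signature Count: Data Size + 48."""
    return update_data_length(header) + 48


# Hypothetical header, built only to demonstrate the parser.
sample = struct.pack(HEADER_FORMAT, 1, 0x17, 0x04152004, 0x0F34, 0, 1,
                     0x4, 0, 0, bytes(12)) + bytes(2000)
header = parse_header(sample)
```

`struct.calcsize(HEADER_FORMAT)` evaluates to 48, which matches the offset of the Update Data field in Table 9-6.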
Table 9-7.
PROCESSOR MANAGEMENT AND INITIALIZATION 9.11.2 Optional Extended Signature Table The extended signature table is a structure that may be appended to the end of the encrypted data when the encrypted data only supports a single processor signature (optional case). The extended signature table will always be present when the encrypted data supports multiple processor steppings and/or models (required case).
a processor signature embedded in the microcode update with the processor signature returned by CPUID will cause the BIOS to reject the update.
Example 9-5 shows how to check for a valid processor signature match between the processor and microcode update.
Example 9-5. Pseudo Code to Validate the Processor Signature
    ProcessorSignature ← CPUID(1):EAX
    If (Update.HeaderVersion == 00000001h)
    {
        // first check the ProcessorSignature field
        If (ProcessorSignature == Update.
PROCESSOR MANAGEMENT AND INITIALIZATION The three platform ID bits, when read as a binary coded decimal (BCD) number, indicate the bit position in the microcode update header’s processor flags field associated with the installed processor. The processor flags in the 48-byte header and the processor flags field associated with the extended processor signature structures may have multiple bits set. Each set bit represents a different platform ID that the update supports.
Example 9-6. Pseudo Code Example of Processor Flags Test
    Flag ← 1 << IA32_PLATFORM_ID[52:50]
    If (Update.HeaderVersion == 00000001h)
    {
        If (Update.ProcessorFlags & Flag)
        {
            Load Update
        }
        Else
        {
            //
            // Assume the Data Size has been used to calculate the
            // location of Update.ProcessorSignature[N] and a match
            // on Update.ProcessorSignature[N] has already succeeded
            //
            If (Update.ProcessorFlags[n] & Flag)
            {
                Load Update
            }
        }
    }
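The signature test of Example 9-5 and the flags test of Example 9-6 can be combined into one applicability check. The following is a minimal sketch under stated assumptions: each signature entry is modeled as a (processor signature, processor flags) pair, the header entry comes first and the extended entries follow, and the function names are hypothetical. Reading CPUID and the IA32_PLATFORM_ID MSR is outside the sketch; their values are passed in as integers.

```python
def platform_flag(ia32_platform_id):
    """Compute Flag = 1 << IA32_PLATFORM_ID[52:50]."""
    platform_id = (ia32_platform_id >> 50) & 0x7
    return 1 << platform_id


def update_applies(cpu_signature, ia32_platform_id, primary, extended=()):
    """Return True if any signature entry matches this processor.

    primary is the (signature, flags) pair from the 48-byte header;
    extended holds the pairs from the optional extended signature table.
    """
    flag = platform_flag(ia32_platform_id)
    candidates = [primary, *extended]
    return any(sig == cpu_signature and (flags & flag)
               for sig, flags in candidates)


# Hypothetical values: platform ID 2 selects bit 2 of the flags field.
msr_value = 2 << 50
print(update_applies(0x0F34, msr_value, (0x0F34, 0b100)))  # prints True
```

Because the flags field may have several bits set, one update can serve several platform IDs; the check only requires that the bit for the installed platform is among them.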
9.11.5 Microcode Update Checksum
Each microcode update contains a DWORD checksum located in the update header. It is software’s responsibility to ensure that a microcode update is not corrupt. To check for a corrupt microcode update, software must perform an unsigned DWORD (32-bit) checksum of the microcode update. Even though some fields are signed, the checksum procedure treats all DWORDs as unsigned.
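A sketch of the unsigned DWORD checksum follows. It assumes the update is stored little-endian, that its length is a multiple of four bytes, and that a valid update sums to zero modulo 2^32; that last point comes from the wider description of the checksum rather than from the sentences above. The function names are hypothetical.

```python
import struct


def dword_sum(blob):
    """Sum every little-endian DWORD in blob, truncated to 32 bits."""
    if len(blob) % 4:
        raise ValueError("update length must be a multiple of 4 bytes")
    count = len(blob) // 4
    return sum(struct.unpack(f"<{count}I", blob)) & 0xFFFFFFFF


def checksum_ok(blob):
    """Assumption: an uncorrupted update sums to 00000000H."""
    return dword_sum(blob) == 0


# Three DWORDs chosen so that the 32-bit sum wraps around to zero.
good = struct.pack("<3I", 1, 2, 0xFFFFFFFD)
bad = struct.pack("<3I", 1, 2, 0xFFFFFFFE)
```

Unpacking with the unsigned format code `I` is what treats every DWORD as unsigned, including fields that are nominally signed.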
PROCESSOR MANAGEMENT AND INITIALIZATION Example 9-8.
all processors in the system. If a system design permits multiple steppings of Pentium 4, Intel Xeon, and P6 family processors to exist concurrently, then the BIOS must verify individual processors against the update header information to ensure appropriate loading. Given these considerations, it is most practical to load the update during MP initialization.
9.11.6.
PROCESSOR MANAGEMENT AND INITIALIZATION 9.11.7 Update Signature and Verification The Pentium 4, Intel Xeon, and P6 family processors provide capabilities to verify the authenticity of a particular update and to identify the current update revision. This section describes the model-specific extensions of processors that support this feature. The update verification method below assumes that the BIOS will only verify an update that is more recent than the revision currently loaded in the processor.
IA32_BIOS_SIGN_ID Microcode Update Signature Register
MSR Address: 08BH, accessed as a Qword
Default Value: XXXX XXXX XXXX XXXXh
Access: Read/Write
The IA32_BIOS_SIGN_ID register is used to report the microcode update signature when CPUID executes. The signature is returned in the upper DWORD (Table 9-11).
Table 9-11. Microcode Update Signature
Bit | Description
63:32 | Microcode update signature.
31:0 |
9.11.7.2
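Extracting the signature from the 64-bit MSR value is a single shift. In this sketch the MSR value is passed in as a plain integer, because reading MSR 08BH requires privileged code that is outside the example; the function name and the sample value are hypothetical.

```python
IA32_BIOS_SIGN_ID = 0x8B  # MSR address given above


def microcode_signature(msr_value):
    """Return bits 63:32 of the IA32_BIOS_SIGN_ID value."""
    return (msr_value >> 32) & 0xFFFFFFFF


# Hypothetical MSR contents: signature 12H in the upper DWORD.
print(hex(microcode_signature(0x0000001200000000)))  # prints 0x12
```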
PROCESSOR MANAGEMENT AND INITIALIZATION This authentication procedure relies upon the decoding provided by the processor to verify an update from a potentially hostile source. As an example, this mechanism in conjunction with other safeguards provides security for dynamically incorporating field updates into the BIOS. 9.11.
may be implemented as a setup option to clear all NVRAM slots or as BIOS code that searches and eliminates unused entries during boot.
NOTES
For IA-32 processors starting with family 0FH and model 03H and Intel 64 processors, the microcode update may be as large as 16 KBytes. Thus, BIOS must allocate 8 update blocks for each microcode update. In an MP system, a common microcode update may be sufficient for each socket in the system.
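The figure of 8 update blocks follows from dividing the maximum update size by the block size. The sketch below assumes 2048-byte update blocks, as implied by 16 KBytes / 8; the function name is hypothetical.

```python
UPDATE_BLOCK_SIZE = 2048  # bytes per update block (16 KBytes / 8 blocks)


def blocks_needed(update_size):
    """Number of update blocks required, rounding up to whole blocks."""
    return -(-update_size // UPDATE_BLOCK_SIZE)


print(blocks_needed(16 * 1024))  # prints 8
```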
    {
        Load Update.UpdateData into the Processor;
        Verify update was correctly loaded into the processor
        Go on to next processor
        Break;
    }
    Else If (Update.TotalSize > (Update.DataSize + 48))
    {
        N ← 0
        While (N < Update.ExtendedSignatureCount)
        {
            If ((Update.ProcessorSignature[N] == Processor Signature) &&
                (Update.ProcessorFlags[N] & Platform Bits))
            {
                Load Update.
PROCESSOR MANAGEMENT AND INITIALIZATION 9.11.8.2 Responsibilities of the Calling Program This section of the document lists the responsibilities of a calling program using the interface specifications to load microcode update(s) into BIOS NVRAM. • The calling program should call the INT 15H, 0D042H functions from a pure real mode program and should be executing on a system that is running in pure real mode.
PROCESSOR MANAGEMENT AND INITIALIZATION Send Broadcast Message to all processors except self via APIC Have all processors execute CPUID and record the Processor Signature (i.e.
        }
        If (INVALID_REVISION) returned
        {
            Display Message: More recent update already loaded in NVRAM for this stepping
            continue
        }
        If any other error returned
        {
            Display Diagnostic
            exit
        }
        //
        // Verify the update was loaded correctly
        //
        Issue the ReadUpdate function
        If an error occurred
        {
            Display Diagnostic
            exit
        }
        //
        // Compare the Update read to that written
        //
        If (Update read != Update written)
        {
            Display Diagnostic
            exit
        }
        I ← I + (size of microcode update / 2048)
    }
    //
    // En
9.11.8.3 Microcode Update Functions
Table 9-12 defines current Pentium 4, Intel Xeon, and P6 family processor microcode update functions.
Table 9-12. Microcode Update Functions
Microcode Update Function | Function Number | Description | Required/Optional
Presence test | 00H | Returns information about the supported functions. | Required
Write update data | 01H | Writes one of the update data areas (slots).
PROCESSOR MANAGEMENT AND INITIALIZATION 9.11.8.5 Function 00H—Presence Test This function verifies that the BIOS has implemented required microcode update functions. Table 9-13 lists the parameters and return codes for the function. Table 9-13. Parameters for the Presence Test Input AX Function Code 0D042H BL Sub-function 00H - Presence test CF Carry Flag Carry Set - Failure - AH contains status Carry Clear - All return values valid AH Return Code AL OEM Error Additional OEM information.
PROCESSOR MANAGEMENT AND INITIALIZATION 9.11.8.6 Function 01H—Write Microcode Update Data This function integrates a new microcode update into the BIOS storage device. Table 9-14 lists the parameters and return codes for the function. Table 9-14. Parameters for the Write Update Data Function Input AX Function Code 0D042H BL Sub-function 01H - Write update ES:DI Update Address Real Mode pointer to the Intel Update structure.
PROCESSOR MANAGEMENT AND INITIALIZATION Table 9-14. Parameters for the Write Update Data Function (Contd.) Input CPU_NOT_PRESENT The processor stepping does not currently exist in the system. INVALID_HEADER The update header contains a header or loader version that is not recognized by the BIOS. INVALID_HEADER_CS The update does not checksum correctly. SECURITY_FAILURE The processor rejected the update. INVALID_REVISION The same or more recent revision of the update exists in the storage device.
PROCESSOR MANAGEMENT AND INITIALIZATION Finally, before storing the proposed update in NVRAM, the BIOS must verify the authenticity of the update via the mechanism described in Section 9.11.6, “Microcode Update Loader.” This includes loading the update into the current processor, executing the CPUID instruction, reading MSR 08Bh, and comparing a calculated value with the update revision in the proposed update header for equality.
Figure 9-8. Microcode Update Write Operation Flow [1]
[Flowchart: Write Microcode Update.
1. Does update match a CPU in the system? No: return CPU_NOT_PRESENT.
2. Valid update header version? No: return INVALID_HEADER.
3. Loader revision match BIOS’s loader? No: return INVALID_HEADER.
4. Does update checksum correctly? No: return INVALID_HEADER_CS. Yes: continue at connector 1.]
Figure 9-9. Microcode Update Write Operation Flow [2]
[Flowchart, continued from connector 1.
1. Update matching CPU already in NVRAM? Yes: go to step 2. No: go to step 3.
2. Update revision newer than NVRAM update? No: return INVALID_REVISION. Yes: go to step 4.
3. Space available in NVRAM? Yes: go to step 4. No: replacement policy implemented? No: return STORAGE_FULL. Yes: go to step 4.
4. Update pass authenticity test? No: return SECURITY_FAILURE. Yes: update NVRAM record and return SUCCESS.]
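The decision order in Figures 9-8 and 9-9 can be written as a chain of early returns. This is a sketch of the flowcharts only, not BIOS code: each check is reduced to a boolean argument supplied by the caller, and the function and parameter names are hypothetical.

```python
def write_update_status(matches_cpu, header_version_valid,
                        loader_revision_matches, checksum_correct,
                        already_in_nvram, newer_than_nvram,
                        space_available, replacement_policy, authentic):
    """Return the status code that the write flow would produce."""
    # Figure 9-8: validate the proposed update.
    if not matches_cpu:
        return "CPU_NOT_PRESENT"
    if not header_version_valid:
        return "INVALID_HEADER"
    if not loader_revision_matches:
        return "INVALID_HEADER"
    if not checksum_correct:
        return "INVALID_HEADER_CS"
    # Figure 9-9: decide whether and where the update can be stored.
    if already_in_nvram:
        if not newer_than_nvram:
            return "INVALID_REVISION"
    elif not space_available and not replacement_policy:
        return "STORAGE_FULL"
    if not authentic:
        return "SECURITY_FAILURE"
    return "SUCCESS"  # the NVRAM record is updated at this point


# All checks pass for an update that is not yet in NVRAM.
status = write_update_status(True, True, True, True,
                             False, False, True, False, True)
```

The authenticity test comes last, so an update that fails an earlier structural check is never loaded into the processor for verification.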
PROCESSOR MANAGEMENT AND INITIALIZATION 9.11.8.7 Function 02H—Microcode Update Control This function enables loading of binary updates into the processor. Table 9-15 lists the parameters and return codes for the function. Table 9-15. Parameters for the Control Update Sub-function Input AX Function Code 0D042H BL Sub-function 02H - Control update BH Task See the description below.
PROCESSOR MANAGEMENT AND INITIALIZATION Table 9-16. Mnemonic Values Mnemonic Value Meaning Enable 1 Enable the Update loading at initialization time. Query 2 Determine the current state of the update control without changing its status. The READ_FAILURE error code returned by this function has meaning only if the control function is implemented in the BIOS NVRAM. The state of this feature (enabled/disabled) can also be implemented using CMOS RAM bits where READ failure errors cannot occur. 9.11.
PROCESSOR MANAGEMENT AND INITIALIZATION Table 9-17. Parameters for the Read Microcode Update Data Function (Contd.) Carry Clear - All return values are valid. AH Return Code Status of the Call AL OEM Error Additional OEM Information Return Codes (see Table 9-18 for code definitions) SUCCESS The function completed successfully. READ_FAILURE There was a failure because of the inability to read the storage device.
Table 9-18. Return Code Definitions
Return Code | Value | Description
SUCCESS | 00H | The function completed successfully.
NOT_IMPLEMENTED | 86H | The function is not implemented.
ERASE_FAILURE | 90H | A failure because of the inability to erase the storage device.
WRITE_FAILURE | 91H | A failure because of the inability to write the storage device.
READ_FAILURE | 92H | A failure because of the inability to read the storage device.
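A calling program typically maps the value returned in AH to a mnemonic. The sketch below covers only the five codes listed in this part of Table 9-18; other codes are reported as unknown rather than guessed. The class and function names are hypothetical.

```python
from enum import IntEnum


class UpdateStatus(IntEnum):
    """Return codes listed in this part of Table 9-18."""
    SUCCESS = 0x00
    NOT_IMPLEMENTED = 0x86
    ERASE_FAILURE = 0x90
    WRITE_FAILURE = 0x91
    READ_FAILURE = 0x92


def decode_status(ah):
    """Translate the AH return code into its mnemonic."""
    try:
        return UpdateStatus(ah).name
    except ValueError:
        # Codes outside the rows shown here are not interpreted.
        return f"UNKNOWN_{ah:02X}H"


print(decode_status(0x92))  # prints READ_FAILURE
```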
CHAPTER 10 MEMORY CACHE CONTROL This chapter describes the memory cache and cache control mechanisms, the TLBs, and the store buffer in Intel 64 and IA-32 processors. It also describes the memory type range registers (MTRRs) introduced in the P6 family processors and how they are used to control caching of physical memory locations. 10.
MEMORY CACHE CONTROL Table 10-1. Characteristics of the Caches, TLBs, Store Buffer, and Write Combining Buffer in Intel 64 and IA-32 Processors Cache or Buffer 1 Trace Cache Characteristics - Pentium 4 and Intel Xeon processors: 12 Kμops, 8-way set associative. - Intel Core 2 Duo, Intel Core Duo, Intel Core Solo, Pentium M processor: not implemented. - P6 family and Pentium processors: not implemented. L1 Instruction Cache - Pentium 4 and Intel Xeon processors: not implemented.
MEMORY CACHE CONTROL Table 10-1. Characteristics of the Caches, TLBs, Store Buffer, and Write Combining Buffer in Intel 64 and IA-32 Processors (Contd.) Cache or Buffer Characteristics Instruction TLB (Large Pages) - Intel Core 2 Duo processors: 4 entries, 4 ways. - Pentium 4 and Intel Xeon processors: large pages are fragmented. - Intel Core Duo, Intel Core Solo, Pentium M processor: 2 entries, fully associative. - P6 family processors: 2 entries, fully associative.
MEMORY CACHE CONTROL The L2 and L3 caches are unified data and instruction caches located on the processor chip. Note that the L3 cache is only implemented on some Intel Xeon processors. • P6 family processors — The L1 cache is divided into two sections: one dedicated to caching instructions (pre-decoded instructions) and the other to caching data. The L2 cache is a unified data and instruction cache located on the processor chip. P6 family processors do not implement a trace cache.
The processor’s caches are for the most part transparent to software. When enabled, instructions and data flow through these caches without the need for explicit software control. However, knowledge of the behavior of these caches may be useful in optimizing software performance. For example, knowledge of cache dimensions and replacement algorithms gives an indication of how large a data structure can be operated on at once without causing cache thrashing.
MEMORY CACHE CONTROL accesses to system memory and to their internal caches. They use this snooping ability to keep their internal caches consistent both with system memory and with the caches in other processors on the bus.
Table 10-2. Memory Types and Their Properties
Memory Type and Mnemonic | Cacheable | Writeback Cacheable | Allows Speculative Reads | Memory Ordering Model
Strong Uncacheable (UC) | No | No | No | Strong Ordering
Uncacheable (UC-) | No | No | No | Strong Ordering. Can only be selected through the PAT. Can be overridden by WC in MTRRs.
Write Combining (WC) | No | No | Yes | Weak Ordering. Available by programming MTRRs or by selecting it through the PAT.
MEMORY CACHE CONTROL idated. Write combining is allowed. This type of cache-control is appropriate for frame buffers or when there are devices on the system bus that access system memory, but do not perform snooping of memory accesses. It enforces coherency between caches in the processors and system memory. • Write-back (WB) — Writes and reads to and from system memory are cached. Reads come from cache lines on cache hits; read misses cause cache fills. Speculative reads are allowed.
MEMORY CACHE CONTROL 10.3.1 Buffering of Write Combining Memory Locations Writes to the WC memory type are not cached in the typical sense of the word cached. They are retained in an internal write combining buffer (WC buffer) that is separate from the internal L1, L2, and L3 caches and the store buffer. The WC buffer is not snooped and thus does not provide data coherency.
completely full WC buffer will always be propagated as a single 32-bit burst transaction using any chunk order. In a WC buffer eviction where data will be evicted as partials, all data contained in the same chunk (0 mod 8 aligned) will be propagated simultaneously. Likewise, for more recent processors starting with those based on Intel NetBurst microarchitecture, a full WC buffer will always be propagated as a single burst transaction, using any chunk order within a transaction.
write-back characteristics of data. These instructions allow software to use weakly ordered or processor ordered memory types to improve processor performance and, when necessary, to force strong ordering on memory reads and/or writes. They also allow software greater control over the caching of data. For a description of these instructions and their intended use, see Section 10.5.5, “Cache Management Instructions.”
10.3.
Table 10-4. MESI Cache Line States
Cache Line State | M (Modified) | E (Exclusive) | S (Shared) | I (Invalid)
This cache line is valid? | Yes | Yes | Yes | No
The memory copy is… | Out of date | Valid | Valid | —
Copies exist in caches of other processors? | No | No | Maybe | Maybe
A write to this line … | Does not go to the system bus. | Does not go to the system bus. | Causes the processor to gain exclusive ownership of the line. | Goes directly to the system bus.
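Table 10-4 can be held as a small lookup structure when reasoning about which writes generate bus traffic. The sketch below transcribes the table as data; the dictionary keys and the helper name are hypothetical.

```python
# One entry per MESI state, transcribed from Table 10-4.
MESI_STATES = {
    "M": {"line_valid": True, "memory_copy": "out of date",
          "copies_elsewhere": "no",
          "write_effect": "does not go to the system bus"},
    "E": {"line_valid": True, "memory_copy": "valid",
          "copies_elsewhere": "no",
          "write_effect": "does not go to the system bus"},
    "S": {"line_valid": True, "memory_copy": "valid",
          "copies_elsewhere": "maybe",
          "write_effect": "processor gains exclusive ownership of the line"},
    "I": {"line_valid": False, "memory_copy": None,
          "copies_elsewhere": "maybe",
          "write_effect": "goes directly to the system bus"},
}


def write_stays_off_bus(state):
    """True for the states whose writes do not go to the system bus."""
    return MESI_STATES[state]["write_effect"] == "does not go to the system bus"


quiet_states = [s for s in MESI_STATES if write_stays_off_bus(s)]
```

Only the M and E states satisfy the helper, which matches the table: those are the states in which no other processor can hold a copy of the line.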
MEMORY CACHE CONTROL 10.5.1 Cache Control Registers and Bits Figure 10-2 depicts cache-control mechanisms in IA-32 processors. Other than for the matter of memory address space, these work the same in Intel 64 processors. The Intel 64 and IA-32 architectures provide the following cache-control registers and bits for use in enabling or restricting caching to various pages or regions in memory: • CD flag, bit 30 of control register CR0 — Controls caching of system memory locations (see Section 2.
[Figure 10-2: cache-control registers and bits. Labels recovered from the figure:]
• CR4: the PGE flag enables global pages designated with the G flag.
• CR3: the PCD and PWT flags control caching of the page directory.
• CR0: the CD and NW flags control overall caching of system memory.
• Page-directory or page-table entry: the PCD and PWT flags control page-level caching; the G flag controls page-level flushing of TLBs; the PAT controls caching of virtual memory pages.
• MTRRs: control caching of selected regions of physical memory (0 through FFFFFFFFH).
• IA32_MIS
MEMORY CACHE CONTROL Table 10-5. Cache Operating Modes CD NW Caching and Read/Write Policy L1 L2/L31 0 0 Normal Cache Mode. Highest performance cache operation. • Read hits access the cache; read misses may cause replacement. • Write hits update the cache. • Only writes to shared lines and write misses update system memory. Yes Yes Yes Yes Yes Yes • Write misses cause cache line fills.
Table 10-5. Cache Operating Modes
CD | NW | Caching and Read/Write Policy
1 | 1 | Memory coherency is not maintained.2
• (P6 family and Pentium processors.) State of the processor after a power up or reset.
• Read hits access the cache; read misses do not cause replacement.
• Write hits update the cache and change exclusive lines to modified.
• Shared lines remain shared after write hit.
• Write misses access memory.
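The CD and NW flags together select one of four cache operating modes. The sketch below classifies a CR0 value accordingly. The CD = 0, NW = 0 and CD = 1, NW = 1 rows come from Table 10-5 as shown here; the descriptions of the other two combinations (CD = 1, NW = 0 as the no-fill mode, and CD = 0, NW = 1 as an invalid setting) are assumptions taken from the rest of the chapter. Function names are hypothetical.

```python
CR0_CD = 1 << 30  # cache disable flag, bit 30 of CR0
CR0_NW = 1 << 29  # not write-through flag, bit 29 of CR0


def cache_mode(cr0):
    """Classify a CR0 value by its CD and NW flags."""
    cd = bool(cr0 & CR0_CD)
    nw = bool(cr0 & CR0_NW)
    if not cd and not nw:
        return "normal cache mode"
    if cd and nw:
        return "memory coherency not maintained"
    if cd:
        return "no-fill cache mode"   # assumption: CD = 1, NW = 0
    return "invalid setting"          # assumption: CD = 0, NW = 1


def enter_no_fill(cr0):
    """Set CD and clear NW, leaving all other CR0 bits unchanged."""
    return (cr0 | CR0_CD) & ~CR0_NW & 0xFFFFFFFF


reset_cr0 = 0x60000010  # CD = 1 and NW = 1, the state after reset
no_fill_cr0 = enter_no_fill(reset_cr0)
```

Starting from the reset value, `enter_no_fill` yields 40000010H, which the classifier reports as the no-fill mode.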
MEMORY CACHE CONTROL enabled and the CD flag in control register CR0 is clear. The PCD flag enables caching of the page table or page when clear and prevents caching when set. • PWT flag in the page-directory and page-table entries — Controls the write policy for individual page tables and pages, respectively (see Section 3.7.6, “Page-Directory and Page-Table Entries”). This flag only has effect when paging is enabled and the NW flag in control register CR0 is clear.
MEMORY CACHE CONTROL 10.5.2 Precedence of Cache Controls The cache control flags and MTRRs operate hierarchically for restricting caching. That is, if the CD flag is set, caching is prevented globally (see Table 10-5). If the CD flag is clear, the page-level cache control flags and/or the MTRRs can be used to restrict caching. If there is an overlap of page-level and MTRR caching controls, the mechanism that prevents caching has precedence.
Table 10-6. Effective Page-Level Memory Type for Pentium Pro and Pentium II Processors

MTRR Memory Type (1)   PCD Value   PWT Value   Effective Memory Type
UC                     X           X           UC
WC                     0           0           WC
                       0           1           WC
                       1           0           WC
                       1           1           UC
WT                     0           X           WT
                       1           X           UC
WP                     0           0           WP
                       0           1           WP
                       1           0           WC
                       1           1           UC
WB                     0           0           WB
                       0           1           WT
                       1           X           UC

NOTE: 1.
10.5.2.2 Selecting Memory Types for Pentium III and More Recent Processor Families
The Intel Core 2 Duo, Intel Core Duo, Intel Core Solo, Pentium M, Pentium 4, Intel Xeon, and Pentium III processors use the PAT to select effective page-level memory types. Here, a memory type for a page is selected by the MTRRs and the value in a PAT entry that is selected with the PAT, PCD and PWT bits in a page-table or page-directory entry (see Section 10.12.
Table 10-7. Effective Page-Level Memory Types for Pentium III and More Recent Processor Families (Contd.)

MTRR Memory Type   PAT Entry Value   Effective Memory Type
WB                 UC                UC (2)
                   UC-               UC (2)
                   WC                WC
                   WT                WT
                   WB                WB
                   WP                WP
WP                 UC                UC (2)
                   UC-               WC (3)
                   WC                WC
                   WT                WT (3)
                   WB                WP
                   WP                WP

NOTES: 1. The UC attribute comes from the MTRRs and the processors are not required to snoop their caches since the data could never have been cached. This attribute is preferred for performance reasons. 2.
10.5.3 Preventing Caching
To disable the L1, L2, and L3 caches after they have been enabled and have received cache fills, perform the following steps:
1. Enter the no-fill cache mode. (Set the CD flag in control register CR0 to 1 and the NW flag to 0.)
2. Flush all caches using the WBINVD instruction.
3.
MEMORY CACHE CONTROL and CLFLUSH instructions and the non-temporal move instructions (MOVNTI, MOVNTQ, MOVNTDQ, MOVNTPS, and MOVNTPD), which were introduced in SSE/SSE2 extensions, offer more granular control over caching. The INVD and WBINVD instructions are used to invalidate the contents of the L1, L2, and L3 caches. The INVD instruction invalidates all internal cache entries, then generates a special-function bus cycle that indicates that external caches also should be invalidated.
10.5.6.1 Adaptive Mode
Adaptive mode facilitates L1 data cache sharing between logical processors. When running in adaptive mode, the L1 data cache is shared across logical processors in the same core if:
• CR3 control registers for logical processors sharing the cache are identical.
• The same paging mode is used by logical processors sharing the cache.
In this situation, the entire L1 data cache is available to each logical processor (instead of being competitively shared).
MEMORY CACHE CONTROL CPUID instruction, before the modified instruction is executed, which will automatically resynchronize the instruction cache and prefetch queue. (See Section 7.1.3, “Handling Self- and Cross-Modifying Code,” for more information about the use of self-modifying code.
MEMORY CACHE CONTROL 10.8 EXPLICIT CACHING The Pentium III processor introduced four new instructions, the PREFETCHh instructions, that provide software with explicit control over the caching of data. These instructions provide “hints” to the processor that the data requested by a PREFETCHh instruction should be read into cache hierarchy now or as soon as possible, in anticipation of its use.
MEMORY CACHE CONTROL See Section 3.12, “Translation Lookaside Buffers (TLBs),” for additional information about the TLBs. 10.10 STORE BUFFER Intel 64 and IA-32 processors temporarily store each write (store) to memory in a store buffer. The store buffer improves processor performance by allowing the processor to continue executing instructions without having to wait until a write to memory and/or to a cache is complete.
MEMORY CACHE CONTROL type of memory that is contained in each range. Table 10-8 shows the memory types that can be specified and their properties; Figure 10-3 shows the mapping of physical memory with MTRRs. See Section 10.3, “Methods of Caching Available,” for a more detailed description of each memory type. Following a hardware reset, the P6 and more recent processor families disable all the fixed and variable MTRRs, which in effect makes all of physical memory uncachable.
[Figure: physical memory map from 0 to FFFFFFFFH. 0 through 7FFFFH (512 KBytes): 8 fixed ranges (64 KBytes each). 80000H through BFFFFH (256 KBytes): 16 fixed ranges (16 KBytes each). C0000H through FFFFFH (256 KBytes): 64 fixed ranges (4 KBytes each). 100000H and above: 8 variable ranges (from 4 KBytes to maximum size of physical memory). Address ranges not mapped by an MTRR are set to a default type.]
Figure 10-3. Mapping Physical Memory With MTRRs
10.11.1 MTRR Feature Identification
The availability of the MTRR feature is model-specific.
MEMORY CACHE CONTROL • FIX (fixed range registers supported) flag, bit 8 — Fixed range MTRRs (IA32_MTRR_FIX64K_00000 through IA32_MTRR_FIX4K_0F8000) are supported when set; no fixed range registers are supported when clear. • WC (write combining) flag, bit 10 — The write-combining (WC) memory type is supported when set; the WC type is not supported when clear. Bit 9 and bits 11 through 63 in the IA32_MTRRCAP MSR are reserved.
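The flag positions listed above can be illustrated with a small decoder. This is a minimal sketch, not a definitive implementation: the structure and function names are hypothetical, the raw value is assumed to have been read already with RDMSR at privilege level 0, and the VCNT field (bits 7:0, the number of variable-range MTRRs) is taken from the part of the register description that precedes these bullets.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of IA32_MTRRCAP (hypothetical helper type). */
struct mtrrcap {
    unsigned vcnt; /* bits 7:0: number of variable-range MTRRs */
    bool     fix;  /* bit 8: fixed-range MTRRs supported */
    bool     wc;   /* bit 10: write-combining memory type supported */
};

/* Split a raw IA32_MTRRCAP value into its fields.
 * Bit 9 and bits 63:11 are reserved and are ignored here. */
struct mtrrcap mtrrcap_decode(uint64_t raw)
{
    struct mtrrcap cap;
    cap.vcnt = (unsigned)(raw & 0xFFu);
    cap.fix  = ((raw >> 8) & 1u) != 0;
    cap.wc   = ((raw >> 10) & 1u) != 0;
    return cap;
}
```

For example, a raw value of 508H would decode to eight variable ranges with both fixed-range and WC support.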
nonexistent memory locations, it can either be specified as the default type in the Type field or be explicitly assigned with the fixed and variable MTRRs.
[Figure: IA32_MTRR_DEF_TYPE MSR layout. Bits 63:12 reserved; bit 11, E (MTRR enable/disable); bit 10, FE (fixed-range MTRRs enable/disable); bits 9:8 reserved; bits 7:0, Type (default memory type).]
Figure 10-5. IA32_MTRR_DEF_TYPE MSR
• FE (fixed MTRRs enabled) flag, bit 10 — Fixed-range MTRRs are enabled when set; fixed-range MTRRs are disabled when clear.
MEMORY CACHE CONTROL C0000H to FFFFFH. This range is divided into sixty-four 4-KByte sub-ranges, 8 ranges per register. Table 10-9 shows the relationship between the fixed physical-address ranges and the corresponding fields of the fixed-range MTRRs; Table 10-8 shows memory type encoding for MTRRs. For the P6 family processors, the prefix for the fixed range MTRRs is MTRRfix. 10.11.2.
MEMORY CACHE CONTROL Figure 10-6 shows flags and fields in these registers. The functions of these flags and fields are: • Type field, bits 0 through 7 — Specifies the memory type for the range (see Table 10-8 for the encoding of this field). • PhysBase field, bits 12 through (MAXPHYADDR-1) — Specifies the base address of the address range.
[Figure: IA32_MTRR_PHYSBASEn register. Bits 63:MAXPHYADDR reserved; bits (MAXPHYADDR-1):12, PhysBase (base address of range); bits 11:8 reserved; bits 7:0, Type (memory type for range). IA32_MTRR_PHYSMASKn register. Bits 63:MAXPHYADDR reserved; bits (MAXPHYADDR-1):12, PhysMask (sets range mask); bit 11, V (valid); bits 10:0 reserved.]
MAXPHYADDR: The bit position indicated by MAXPHYADDR depends on the maximum physical address range supported by the processor. It is reported by CPUID leaf function 80000008H.
MEMORY CACHE CONTROL See Section 10.11.4.1, “MTRR Precedences,” for information on overlapping variable MTRR ranges. 10.11.3 Example Base and Mask Calculations The examples in this section apply to processors that support a maximum physical address size of 36 bits. The base and mask values entered in variable-range MTRR pairs are 24-bit values that the processor extends to 36-bits.
IA32_MTRR_PHYSBASE1 = 0000 0000 0400 0006H
IA32_MTRR_PHYSMASK1 = 0000 000F FE00 0800H   Caches 64-96 MByte as WB cache type.
IA32_MTRR_PHYSBASE2 = 0000 0000 0600 0006H
IA32_MTRR_PHYSMASK2 = 0000 000F FFC0 0800H   Caches 96-100 MByte as WB cache type.
IA32_MTRR_PHYSBASE3 = 0000 0000 0400 0000H
IA32_MTRR_PHYSMASK3 = 0000 000F FFC0 0800H   Caches 64-68 MByte as UC cache type.
IA32_MTRR_PHYSBASE2 = 0000 0000 0600 0006H
IA32_MTRR_PHYSMASK2 = 0000 00FF FFC0 0800H   Caches 96-100 MByte as WB cache type.
IA32_MTRR_PHYSBASE3 = 0000 0000 0400 0000H
IA32_MTRR_PHYSMASK3 = 0000 00FF FFC0 0800H   Caches 64-68 MByte as UC cache type.
IA32_MTRR_PHYSBASE4 = 0000 0000 00F0 0000H
IA32_MTRR_PHYSMASK4 = 0000 00FF FFF0 0800H   Caches 15-16 MByte as UC cache type.
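The register values in these examples can be reproduced arithmetically. The following is a minimal sketch under stated assumptions: the function and type names are hypothetical, the range size is a power of two of at least 4 KBytes, the base is aligned to the size, and MAXPHYADDR is passed in by the caller (36 or 40 bits in the examples). It only computes the values; writing them with WRMSR is left out.

```c
#include <assert.h>
#include <stdint.h>

#define MTRR_TYPE_UC        0x00u        /* uncacheable encoding used in the examples */
#define MTRR_TYPE_WB        0x06u        /* write-back encoding used in the examples */
#define MTRR_PHYSMASK_VALID (1ull << 11) /* V flag of IA32_MTRR_PHYSMASKn */

/* Hypothetical container for one variable-range register pair. */
struct mtrr_pair {
    uint64_t physbase;
    uint64_t physmask;
};

/* Compute PHYSBASE/PHYSMASK values for [base, base + size). */
struct mtrr_pair mtrr_encode(uint64_t base, uint64_t size,
                             unsigned type, unsigned maxphyaddr)
{
    assert(size >= 4096 && (size & (size - 1)) == 0); /* power of two */
    assert((base & (size - 1)) == 0);                 /* base aligned to size */
    assert(maxphyaddr >= 32 && maxphyaddr < 64);

    /* Address bits (MAXPHYADDR-1):12 are the only ones the fields hold. */
    uint64_t addr_bits = ((1ull << maxphyaddr) - 1) & ~0xFFFull;

    struct mtrr_pair p;
    p.physbase = (base & addr_bits) | (type & 0xFFu);
    p.physmask = (~(size - 1) & addr_bits) | MTRR_PHYSMASK_VALID;
    return p;
}
```

Calling `mtrr_encode(0x4000000, 0x2000000, MTRR_TYPE_WB, 36)` corresponds to the 64-96 MByte WB range in the 36-bit example, and passing 40 for the last argument corresponds to the wider masks shown above.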
c. If two or more variable memory ranges match and one of the memory types is UC, the UC memory type is used.
d. If two or more variable memory ranges match and the memory types are WT and WB, the WT memory type is used.
e. For overlaps not defined by the above rules, processor behavior is undefined.
3. If no fixed or variable memory range matches, the processor uses the default memory type.
10.11.
MEMORY CACHE CONTROL 3. A memory type that views write data as not necessarily stored and read back by a subsequent read, such as the write-protected type, can only be mapped to another type with the same behaviour (and there are no others for the Pentium 4, Intel Xeon, and P6 family processors) or to the uncacheable type. In many specific cases, a system designer can have additional information about how a memory type is used, allowing additional mappings.
MEMORY CACHE CONTROL FI; ROF; return FirstType; ELSE return UNSUPPORTED; FI; If the processor does not support MTRRs, the function returns UNSUPPORTED. If the MTRRs are not enabled, then the UC memory type is returned. If more than one memory type corresponds to the specified range, a status of MIXED_TYPES is returned. Otherwise, the memory type defined for the range (UC, WC, WT, WB, or WP) is returned.
MEMORY CACHE CONTROL Example 10-6. MemTypeSet Pseudocode IF CPU_FEATURES.MTRR (* processor supports MTRRs *) THEN IF BASE and SIZE are not 4-KByte aligned or size is 0 THEN return INVALID; FI; IF (BASE + SIZE) wrap 4-GByte address space THEN return INVALID; FI; IF TYPE is invalid for Pentium 4, Intel Xeon, and P6 family processors THEN return UNSUPPORTED; FI; IF TYPE is WC and not supported THEN return UNSUPPORTED; FI; IF IA32_MTRRCAP.
pre_mtrr_change()
BEGIN
    disable interrupts;
    Save current value of CR4;
    disable and flush caches;
    flush TLBs;
    disable MTRRs;
    IF multiprocessing
        THEN maintain consistency through IPIs;
    FI;
END
post_mtrr_change()
BEGIN
    flush caches and TLBs;
    enable MTRRs;
    enable caches;
    restore value of CR4;
    enable interrupts;
END
The physical address to variable range mapping algorithm in the MemTypeSet function detects conflicts with current variable range registers by cycling through them and determini
MEMORY CACHE CONTROL 10.11.8 MTRR Considerations in MP Systems In MP (multiple-processor) systems, the operating systems must maintain MTRR consistency between all the processors in the system. The Pentium 4, Intel Xeon, and P6 family processors provide no hardware support to maintain this consistency. In general, all processors must have the same MTRR values.
MEMORY CACHE CONTROL 13. Set PGE flag in control register CR4, if cleared in Step 6 (above). 14. Wait for all processors to reach this point. 15. Enable interrupts. 10.11.9 Large Page Size Considerations The MTRRs provide memory typing for a limited number of regions that have a 4 KByte granularity (the same granularity as 4-KByte pages). The memory type for a given page is cached in the processor’s TLBs.
MEMORY CACHE CONTROL PWT bits in page tables to allow all five of the memory types that can be assigned with the MTRRs (plus one additional memory type) to also be assigned dynamically to pages of the linear address space. The PAT was introduced to IA-32 architecture on the Pentium III processor. It is also available in the Pentium 4 and Intel Xeon processors. 10.12.
Table 10-10. Memory Types That Can Be Encoded With PAT

Encoding     Mnemonic
00H          Uncacheable (UC)
01H          Write Combining (WC)
02H          Reserved*
03H          Reserved*
04H          Write Through (WT)
05H          Write Protected (WP)
06H          Write Back (WB)
07H          Uncached (UC-)
08H - FFH    Reserved*

NOTE: * Using these encodings will result in a general-protection exception (#GP).
10.12.
10.12.4 Programming the PAT
Table 10-12 shows the default setting for each PAT entry following a power up or reset of the processor. The settings remain unchanged following a soft reset (INIT reset). Table 10-12.
2. Flush the TLBs of processors that may have used the mapping, even speculatively.
3. Create a new mapping to the same physical address with a new memory type, for instance, WC.
4. Flush the caches on all processors that may have used the mapping previously. Note that on processors that support self-snooping (CPUID feature flag bit 27), this step is unnecessary.
CHAPTER 11 INTEL® MMX™ TECHNOLOGY SYSTEM PROGRAMMING
This chapter describes those features of the Intel® MMX™ technology that must be considered when designing or enhancing an operating system to support MMX technology. It covers MMX instruction set emulation, the MMX state, aliasing of MMX registers, saving MMX state, task and context switching considerations, exception handling, and debugging. 11.
result, the MMX register mapping is fixed and is not affected by the value in the Top Of Stack (TOS) field in the floating-point status word (bits 11 through 13).
[Figure: the x87 FPU tag register (all tags 00), the floating-point registers R0 through R7 (bits 79:0), and the MMX registers MM0 through MM7 mapped onto bits 63:0 of R0 through R7; the TOS field (bits 13:11) of the x87 FPU status register is 000 (TOS = 0).]
Figure 11-1.
INTEL® MMX™ TECHNOLOGY SYSTEM PROGRAMMING • When the EMMS instruction is executed, each tag field in the x87 FPU tag word is set to 11B (empty). • Each time an MMX instruction is executed, the TOS value is set to 000B.
Table 11-3. Effect of the MMX, x87 FPU, and FXSAVE/FXRSTOR Instructions on the x87 FPU Tag Word

Instruction Type | Instruction | x87 FPU Tag Word | Image of x87 FPU Tag Word Stored in Memory
MMX | All (except EMMS) | All tags are set to 00B (valid). | Not affected.
MMX | EMMS | All tags are set to 11B (empty). | Not affected.
x87 FPU | All (except FSAVE, FSTENV, FRSTOR, FLDENV) | Tag for modified floating-point register is set to 00B or 11B. | Not affected.
INTEL® MMX™ TECHNOLOGY SYSTEM PROGRAMMING • Execute eight MOVQ instructions to save the contents of the MMX0 through MMX7 registers to memory. An EMMS instruction may then (optionally) be executed to clear the MMX state in the x87 FPU. • Execute eight MOVQ instructions to read the saved contents of MMX registers from memory into the MMX0 through MMX7 registers. NOTE The IA-32 architecture does not support scanning the x87 FPU tag word and then only saving valid entries. 11.
INTEL® MMX™ TECHNOLOGY SYSTEM PROGRAMMING • System exceptions: — Invalid Opcode (#UD), if the EM flag in control register CR0 is set when an MMX instruction is executed (see Section 11.1, “Emulation of the MMX Instruction Set”). — Device not available (#NM), if an MMX instruction is executed when the TS flag in control register CR0 is set. (See Section 12.5.1., “Using the TS Flag to Control the Saving of the x87 FPU, MMX, SSE, SSE2, SSE3 and SSSE3 State.”) • Floating-point error (#MF). (See Section 11.
INTEL® MMX™ TECHNOLOGY SYSTEM PROGRAMMING When the TOS equals 2 (case B in Figure 11-2), ST0 points to the physical location R2. MM0 maps to ST6, MM1 maps to ST7, MM2 maps to ST0, and so on.
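The relationship between MMX register numbers and stack-relative ST(i) names can be written as modular arithmetic. This is a minimal sketch with a hypothetical function name; it assumes only what the text states, namely that MMn always aliases physical register Rn while ST0 names the physical register selected by TOS.

```c
/* Return i such that MMn and ST(i) name the same physical register.
 * ST(i) refers to physical register R((TOS + i) mod 8) and MMn is fixed
 * to Rn, so i = (n - TOS) mod 8. Unsigned wraparound followed by the
 * 3-bit mask gives the modulo-8 result. */
unsigned mmx_to_st(unsigned mm, unsigned tos)
{
    return (mm - tos) & 7u;
}
```

With TOS equal to 2, `mmx_to_st(0, 2)` gives 6 and `mmx_to_st(2, 2)` gives 0, matching case B above.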
CHAPTER 12 SYSTEM PROGRAMMING FOR STREAMING SIMD INSTRUCTION SETS This chapter describes features of the streaming SIMD extensions (SSE), streaming SIMD extensions 2 (SSE2), streaming SIMD extensions 3 (SSE3), and Supplemental SSE3 (SSSE3). These must be considered when designing or enhancing an operating system to support Intel 64 and IA-32 processors.
SYSTEM PROGRAMMING FOR STREAMING SIMD INSTRUCTION SETS The following sections describe how to implement each of these guidelines. 12.1.2 Checking for SSE/SSE2/SSE3/SSSE3 Extension Support If the processor attempts to execute an unsupported SSE/SSE2/SSE3/SSSE3 instruction, the processor generates an invalid-opcode exception (#UD). Before an operating system or executive attempts to use SSE/SSE2/SSE3/SSSE3 extensions, it should check that support is present. Make sure: • • • • CPUID.1:EDX.
SYSTEM PROGRAMMING FOR STREAMING SIMD INSTRUCTION SETS NOTE The OSFXSR and OSXMMEXCPT bits in control register CR4 must be set by the operating system. The processor has no other way of detecting operating-system support for the FXSAVE and FXRSTOR instructions or for handling SIMD floating-point exceptions. 3. Clear CR0.EM[bit 2] = 0. This action disables emulation of the x87 FPU, which is required when executing SSE/SSE2/SSE3/SSSE3 instructions (see Section 2.5, “Control Registers”). 4. Set CR0.
Table 12-2. Action Taken for Combinations of OSFXSR, SSSE3, EM, and TS

CR4.OSFXSR   CPUID.SSSE3   CR0.EM   CR0.TS   Action
0            X (1)         X        X        #UD exception.
1            0             X        X        #UD exception.
1            1             1        X        #UD exception.
1            1             0        1        #NM exception.

NOTES: 1. X — Don’t care.
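The rows of Table 12-2 can be read as an ordered sequence of checks. The sketch below is illustrative only: the enum and function names are hypothetical, and the final "executes" outcome is an assumption for the one combination that none of the rows shown above covers.

```c
#include <stdbool.h>

/* Possible outcomes when an SSSE3 instruction is attempted (hypothetical names). */
enum sse_outcome { SSE_EXECUTES, SSE_UD, SSE_NM };

/* Evaluate the flag combinations in the same order as the table rows. */
enum sse_outcome ssse3_outcome(bool cr4_osfxsr, bool cpuid_ssse3,
                               bool cr0_em, bool cr0_ts)
{
    if (!cr4_osfxsr)  return SSE_UD; /* row 1: OS has not enabled FXSAVE/FXRSTOR support */
    if (!cpuid_ssse3) return SSE_UD; /* row 2: extension not reported by CPUID */
    if (cr0_em)       return SSE_UD; /* row 3: emulation flag set */
    if (cr0_ts)       return SSE_NM; /* row 4: task-switched flag set */
    return SSE_EXECUTES;             /* assumption: remaining case, not shown in the rows above */
}
```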
— Alignment check (#AC). When enabled, this type of alignment check operates on operands that are less than 128 bits in size: 16-bit, 32-bit, and 64-bit. To enable the generation of alignment check exceptions, do the following:
• Set the AM flag (bit 18 of control register CR0)
• Set the AC flag (bit 18 of the EFLAGS register)
• CPL must be 3.
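The three conditions above combine into a single predicate. This is a minimal sketch with hypothetical names; it takes register values as plain integers so it can be evaluated anywhere, and it does not read CR0 or EFLAGS itself.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_AM    (1u << 18) /* alignment mask flag in CR0 */
#define EFLAGS_AC (1u << 18) /* alignment check flag in EFLAGS */

/* True when all three conditions for #AC generation hold. */
bool alignment_check_active(uint32_t cr0, uint32_t eflags, unsigned cpl)
{
    return (cr0 & CR0_AM) != 0 && (eflags & EFLAGS_AC) != 0 && cpl == 3;
}
```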
12.1.6 Providing a Handler for the SIMD Floating-Point Exception (#XF)
SSE/SSE2/SSE3/SSSE3 instructions do not generate numeric exceptions on packed integer operations. They can generate the following numeric (SIMD floating-point) exceptions on packed and scalar single-precision and double-precision floating-point operations.
12.2 EMULATION OF SSE/SSE2/SSE3/SSSE3 EXTENSIONS
The Intel 64 and IA-32 architectures do not support emulation of the SSE/SSE2/SSE3/SSSE3 instructions, as they do for x87 FPU instructions. The EM flag in control register CR0 (provided to invoke emulation of x87 FPU instructions) cannot be used to invoke emulation of SSE/SSE2/SSE3/SSSE3 instructions. If an SSE/SSE2/SSE3/SSSE3 instruction is executed when CR0.
SYSTEM PROGRAMMING FOR STREAMING SIMD INSTRUCTION SETS method for saving and restoring this state. See Section 12.3, “Saving and Restoring the SSE/SSE2/SSE3/SSSE3 State.” These instructions offer the added benefit of saving x87 FPU and MMX state as well. Guidelines for writing such procedures are in Section 12.5, “Designing OS Facilities for AUTOMATICALLY Saving x87 FPU, MMX, and SSE/SSE2/SSE3/SSSE3 state on Task or Context Switches.” 12.
SYSTEM PROGRAMMING FOR STREAMING SIMD INSTRUCTION SETS x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3 instruction needs to be executed in the new task. (See Section 12.5.1., “Using the TS Flag to Control the Saving of the x87 FPU, MMX, SSE, SSE2, SSE3 and SSSE3 State,” for more information.) 12.5.1. Using the TS Flag to Control the Saving of the x87 FPU, MMX, SSE, SSE2, SSE3 and SSSE3 State Saving the x87 FPU/MMX/SSE/SSE2/SSE3/SSSE3 state using FXSAVE requires processor overhead.
[Figure: Task A (application; owner of the x87 FPU, MMX, XMM, and MXCSR state) and Task B, each with its own x87 FPU/MMX/XMM/MXCSR state save area. The operating system task switching code saves the Task A state. When CR0.TS=1 and an x87 FPU, MMX, or SSEx instruction is encountered, the device-not-available exception handler loads the Task B state.]
Figure 12-1.
CHAPTER 13 POWER AND THERMAL MANAGEMENT This chapter describes facilities of IA-32 architecture used for power management and thermal monitoring. 13.1 ENHANCED INTEL SPEEDSTEP® TECHNOLOGY Enhanced Intel SpeedStep® Technology was introduced in the Pentium M processor; it is available in Pentium 4, Intel Xeon, Intel® Core™ Solo and Intel® Core™ Duo processors. The technology manages processor power consumption using performance state transitions.
tools can access model-specific events and report the occurrences of state transitions.
13.2 P-STATE HARDWARE COORDINATION
The Advanced Configuration and Power Interface (ACPI) defines performance states (P-states) that are used to facilitate system software’s ability to manage processor power consumption. Different P-states correspond to different performance levels that are applied while the processor is actively executing instructions.
• IA32_APERF MSR (0xE8) increments in proportion to actual performance, while accounting for hardware coordination of P-states and TM1/TM2, or software-initiated throttling.
• The MSRs are per logical processor; they measure performance only when the targeted processor is in the C0 state.
• Only the IA32_APERF/IA32_MPERF ratio is architecturally defined; software should not attach meaning to the individual contents of the IA32_APERF or IA32_MPERF MSRs.
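Because only the ratio of the two counters is defined, software typically works with differences between two samples. The sketch below is one possible way to do that, not a definitive implementation: the function names are hypothetical, the counter samples are assumed to have been read with RDMSR on the same logical processor, and returning 0.0 when no reference counts elapsed is an arbitrary choice.

```c
#include <stdint.h>

/* Difference between two samples of a free-running 64-bit counter.
 * Unsigned subtraction stays correct across a single wraparound. */
uint64_t counter_delta(uint64_t start, uint64_t end)
{
    return end - start;
}

/* IA32_APERF/IA32_MPERF ratio over a sampling interval. */
double aperf_mperf_ratio(uint64_t aperf_delta, uint64_t mperf_delta)
{
    if (mperf_delta == 0) {
        return 0.0; /* assumption: treat an empty interval as zero */
    }
    return (double)aperf_delta / (double)mperf_delta;
}
```

A ratio below 1.0 over an interval suggests that the processor delivered less than its reference performance while in C0 during that interval.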
if (TargetPstate != currentPstate) {
    SetPState(TargetPstate);
}
WRMSR(IA32_MPERF, 0);
WRMSR(IA32_APERF, 0);
13.3 MWAIT EXTENSIONS FOR ADVANCED POWER MANAGEMENT
IA-32 processors may support a number of C-states1 that reduce power consumption for inactive states. Intel Core Solo and Intel Core Duo processors support both deeper C-state and MWAIT extensions that can be used by OS to implement power management policy.
POWER AND THERMAL MANAGEMENT Executing MWAIT generates an exception on processors operating at a privilege level where MONITOR/MWAIT are not supported. NOTE If MWAIT is used to enter a C-state (including sub C-state) that is numerically higher than C1, a store to the address range armed by MONITOR instruction will cause the processor to exit MWAIT if the store was originated by other processor agents. A store from nonprocessor agent may not cause the processor to exit MWAIT. 13.
POWER AND THERMAL MANAGEMENT Clock Applied to Processor Stop-Clock Duty Cycle 25% Duty Cycle (example only) Figure 13-2. Processor Modulation Through Stop-Clock Mechanism For previous automatic thermal monitoring mechanisms, software controlled mechanisms that changed processor operating parameters to impact changes in thermal conditions.
13.4.2.1 Thermal Monitor 1
The Pentium 4 processor uses the second temperature sensor in conjunction with a mechanism called TM1 (Thermal Monitor 1) to control the core temperature of the processor. TM1 controls the processor’s temperature by modulating the duty cycle of the processor clock. Modulation of duty cycles is processor model specific. Note that the processor’s STPCLK# pin is not used here; the stop-clock circuitry is controlled internally.
[Figure: MSR_THERM2_CTL register layout. Bit 16, TM_SELECT; all other bits of 31:0 reserved.]
Figure 13-3. MSR_THERM2_CTL Register On Processors with CPUID Family/Model/Stepping Signature Encoded as 0x69n or 0x6Dn
On processors introduced after the Pentium 4 processor (this includes most Pentium M processors), the method used to enable TM2 is different. TM2 is enabled by setting bit 13 of the IA32_MISC_ENABLE register to 1.
POWER AND THERMAL MANAGEMENT 13.4.2.5 Thermal Status Information The status of the temperature sensor that triggers the thermal monitor (TM1/TM2) is indicated through the thermal status flag and thermal status log flag in the IA32_THERM_STATUS MSR (see Figure 13-5).
[Figure: IA32_THERM_INTERRUPT MSR layout. Bits 63:2 reserved; bit 1, Low-Temperature Interrupt Enable; bit 0, High-Temperature Interrupt Enable.]
Figure 13-6. IA32_THERM_INTERRUPT MSR
• High-Temperature Interrupt Enable flag, bit 0 — Enables an interrupt to be generated on the transition from a low-temperature to a high-temperature when set; disables the interrupt when clear (R/W).
POWER AND THERMAL MANAGEMENT The IA32_CLOCK_MODULATION MSR contains the following flag and field used to enable software-controlled clock modulation and to select the clock modulation duty cycle: • On-Demand Clock Modulation Enable, bit 4 — Enables on-demand software controlled clock modulation when set; disables software-controlled clock modulation when clear. • On-Demand Clock Modulation Duty Cycle, bits 1 through 3 — Selects the on-demand clock modulation duty cycle (see Table 13-1).
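The two fields described above can be packed into a register value as follows. This is a minimal sketch with hypothetical names: it only builds the value that would be written with WRMSR, and it does not check whether a given duty-cycle field value is valid (see Table 13-1 for the encodings).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CLOCK_MOD_ENABLE     (1u << 4) /* On-Demand Clock Modulation Enable, bit 4 */
#define CLOCK_MOD_DUTY_SHIFT 1         /* duty cycle field occupies bits 3:1 */

/* Build an IA32_CLOCK_MODULATION value from the enable flag and the
 * 3-bit duty-cycle field encoding. */
uint64_t clock_modulation_value(bool enable, unsigned duty_field)
{
    assert(duty_field <= 7); /* the field is three bits wide */
    uint64_t value = (uint64_t)duty_field << CLOCK_MOD_DUTY_SHIFT;
    if (enable) {
        value |= CLOCK_MOD_ENABLE;
    }
    return value;
}
```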
POWER AND THERMAL MANAGEMENT 13.4.4 Detection of Thermal Monitor and Software Controlled Clock Modulation Facilities The ACPI flag (bit 22) of the CPUID feature flags indicates the presence of the IA32_THERM_STATUS, IA32_THERM_INTERRUPT, IA32_CLOCK_MODULATION MSRs, and the xAPIC thermal LVT entry. The TM1 flag (bit 29) of the CPUID feature flags indicates the presence of the automatic thermal monitoring facilities that modulate clock duty cycles. 13.4.
POWER AND THERMAL MANAGEMENT • Thermal Status Log (bit 1, R/WC0) — This is a sticky bit that indicates the history of the thermal sensor high temperature output signal (PROCHOT#). Bit 1 = 1 if PROCHOT# has been asserted since a previous RESET or the last time software cleared the bit. Software may clear this bit by writing a zero. • PROCHOT# or FORCEPR# Event (bit 2, RO) — Indicates whether PROCHOT# or FORCEPR# is being asserted. If bit 2 = 1, PROCHOT# or FORCEPR# has been asserted.
this bit or a reset. If bit 7 = 1, the Threshold #1 has been reached. Software may clear this bit by writing a zero.
• Thermal Threshold #2 Status (bit 8, RO) — Indicates whether actual temperature is currently higher than the value set in Thermal Threshold #2. If bit 8 = 0, the actual temperature is lower. If bit 8 = 1, the actual temperature is greater than or equal to TT#2.
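The status and log bits named in this section can be pulled out of a raw register value with simple masks. The sketch below is illustrative: the structure and function names are hypothetical, only the bits whose positions are stated in the surrounding text (1, 2, 7, and 8) are decoded, and the raw value is assumed to come from an RDMSR of IA32_THERM_STATUS.

```c
#include <stdbool.h>
#include <stdint.h>

/* Subset of IA32_THERM_STATUS fields (hypothetical helper type). */
struct therm_status {
    bool status_log;         /* bit 1: sticky thermal status log */
    bool prochot_or_forcepr; /* bit 2: PROCHOT# or FORCEPR# event */
    bool threshold1_log;     /* bit 7: sticky Thermal Threshold #1 log */
    bool threshold2_status;  /* bit 8: Thermal Threshold #2 status */
};

struct therm_status therm_status_decode(uint64_t raw)
{
    struct therm_status s;
    s.status_log         = ((raw >> 1) & 1u) != 0;
    s.prochot_or_forcepr = ((raw >> 2) & 1u) != 0;
    s.threshold1_log     = ((raw >> 7) & 1u) != 0;
    s.threshold2_status  = ((raw >> 8) & 1u) != 0;
    return s;
}

/* Value to write back in order to clear the sticky thermal status log:
 * the log bits are cleared by writing a zero to them. */
uint64_t therm_status_clear_log(uint64_t raw)
{
    return raw & ~(1ull << 1);
}
```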
POWER AND THERMAL MANAGEMENT See Figure 13-9 for the layout of IA32_THERM_INTERRUPT MSR. Bit fields include: • High-Temperature Interrupt Enable (bit 0, R/W) — This bit allows the BIOS to enable the generation of an interrupt on the transition from low-temperature to a high-temperature threshold. Bit 0 = 0 (default) disables interrupts; bit 0 = 1 enables interrupts.
CHAPTER 14 MACHINE-CHECK ARCHITECTURE This chapter describes the machine-check architecture and machine-check exception mechanism found in the Pentium 4, Intel Xeon, and P6 family processors. See Chapter 5, “Interrupt 18—Machine-Check Exception (#MC),” for more information on machine-check exceptions. A brief description of the Pentium processor’s machine check capability is also given. 14.
See Section 14.3.3, “Mapping of the Pentium Processor Machine-Check Errors to the Machine-Check Architecture,” and Section 14.8.3, “Pentium Processor Machine-Check Exception Handling,” for information on compatibility between machine-check code written to run on the Pentium processors and code written to run on P6 family processors. 14.
14.3.1.1 IA32_MCG_CAP MSR
The IA32_MCG_CAP MSR is a read-only register that provides information about the machine-check architecture of the processor. Figure 14-2 shows the structure of the register in Pentium 4, Intel Xeon, and P6 family processors.
[Figure: IA32_MCG_CAP layout. Bits 7:0, Count; bit 8, MCG_CTL_P; bit 9, MCG_EXT_P; bit 11, MCG_TES_P; bits 23:16, MCG_EXT_CNT; remaining bits reserved.]
Figure 14-2.
14.3.1.2 IA32_MCG_STATUS MSR
The IA32_MCG_STATUS MSR describes the current state of the processor after a machine-check exception has occurred (see Figure 14-3).
[Figure: IA32_MCG_STATUS layout. Bits 63:3 reserved; bit 2, MCIP (machine check in progress flag); bit 1, EIPV (error IP valid flag); bit 0, RIPV (restart IP valid flag).]
Figure 14-3.
MACHINE-CHECK ARCHITECTURE 14.3.2 Error-Reporting Register Banks Each error-reporting register bank can contain the IA32_MCi_CTL, IA32_MCi_STATUS, IA32_MCi_ADDR, and IA32_MCi_MISC MSRs. The Pentium 4 and Intel Xeon processors provide four or five banks; the P6 family processors provide five banks. The first error-reporting register (IA32_MC0_CTL) always starts at address 400H.
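Given that the first bank starts at 400H, the address of a register in a later bank can be computed rather than tabulated. The following is a minimal sketch under an assumption that should be checked against the MSR listings for the target processor: each bank is taken to occupy four consecutive addresses in the order CTL, STATUS, ADDR, MISC, which may not hold on every model. The enum and function names are hypothetical.

```c
#include <stdint.h>

#define IA32_MC0_CTL 0x400u /* address of the first error-reporting register */

/* Position of each register within a bank (assumed layout). */
enum mc_reg { MC_CTL = 0, MC_STATUS = 1, MC_ADDR = 2, MC_MISC = 3 };

/* MSR address of register `reg` in error-reporting bank `bank`. */
uint32_t mc_bank_msr(unsigned bank, enum mc_reg reg)
{
    return IA32_MC0_CTL + 4u * bank + (uint32_t)reg;
}
```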
MACHINE-CHECK ARCHITECTURE NOTE Figure 14-5 depicts the IA32_MCi_STATUS MSR when IA32_MCG_CAP[11] = 1. When IA32_MCG_CAP[11] = 0, bits 56:53 are part of the “Other Information” field. The use of bits 54:53 for threshold-based error reporting began with Core Duo processors, and is currently used for cache memory. See Section 14.4, “Enhanced Cache Error reporting,” for more information.
• Model-specific error code field, bits 31:16 — Specifies the model-specific error code that uniquely identifies the machine-check error condition detected. The model-specific error codes may differ among IA-32 processors for the same machine-check error condition. See Appendix E, “Interpreting Machine-Check Error Codes,” for information on model-specific error codes.
MACHINE-CHECK ARCHITECTURE the address where the error occurred. Do not read these registers if they are not implemented in the processor. • MISCV (IA32_MCi_MISC register valid) flag, bit 59 — Indicates (when set) that the IA32_MCi_MISC register contains additional information regarding the error. When clear, this flag indicates that the IA32_MCi_MISC register is either not implemented or does not contain additional information regarding the error.
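The flags and fields of IA32_MCi_STATUS can be separated with shifts and masks. This is a minimal sketch with hypothetical names: the MISCV position (bit 59) and the model-specific error code field (bits 31:16) come from the text above, and the remaining positions (VAL 63, OVER 62, UC 61, EN 60, ADDRV 58, PCC 57, MCA error code 15:0) are taken from the register figure for this MSR. The raw value is assumed to have been read with RDMSR.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded IA32_MCi_STATUS (hypothetical helper type). */
struct mci_status {
    bool     val;        /* bit 63: register contents valid */
    bool     over;       /* bit 62: error overflow */
    bool     uc;         /* bit 61: uncorrected error */
    bool     en;         /* bit 60: error reporting enabled */
    bool     miscv;      /* bit 59: IA32_MCi_MISC register valid */
    bool     addrv;      /* bit 58: IA32_MCi_ADDR register valid */
    bool     pcc;        /* bit 57: processor context corrupt */
    uint16_t model_code; /* bits 31:16: model-specific error code */
    uint16_t mca_code;   /* bits 15:0: MCA error code */
};

static bool status_bit(uint64_t raw, unsigned n)
{
    return ((raw >> n) & 1u) != 0;
}

struct mci_status mci_status_decode(uint64_t raw)
{
    struct mci_status s;
    s.val        = status_bit(raw, 63);
    s.over       = status_bit(raw, 62);
    s.uc         = status_bit(raw, 61);
    s.en         = status_bit(raw, 60);
    s.miscv      = status_bit(raw, 59);
    s.addrv      = status_bit(raw, 58);
    s.pcc        = status_bit(raw, 57);
    s.model_code = (uint16_t)((raw >> 16) & 0xFFFFu);
    s.mca_code   = (uint16_t)(raw & 0xFFFFu);
    return s;
}
```

A handler would normally test `val` first and consult `addrv` and `miscv` before reading the corresponding IA32_MCi_ADDR and IA32_MCi_MISC registers.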
MACHINE-CHECK ARCHITECTURE Table 14-2.
[Figure: IA32_MCi_ADDR MSR layout. Processor without support for Intel 64 architecture: bits 63:36 reserved; bits 35:0, Address. Processor with support for Intel 64 architecture: bits 63:0, Address*.]
* Useful bits in this field depend on the address methodology in use when the register state is saved.
Figure 14-6. IA32_MCi_ADDR MSR
14.3.2.4 IA32_MCi_MISC MSRs
The IA32_MCi_MISC MSR contains additional information describing the machine-check error if the MISCV flag in the IA32_MCi_STATUS register is set.
Table 14-3. Extended Machine Check State MSRs in Processors Without Support for Intel 64 Architecture

MSR            Address   Description
IA32_MCG_ECX   182H      Contains state of the ECX register at the time of the machine-check error.
IA32_MCG_EDX   183H      Contains state of the EDX register at the time of the machine-check error.
IA32_MCG_ESI   184H      Contains state of the ESI register at the time of the machine-check error.
Table 14-4. Extended Machine Check State MSRs In Processors With Support For Intel 64 Architecture (Contd.)

MSR            Address   Description
IA32_MCG_RSI   184H      Contains state of the RSI register at the time of the machine-check error.
IA32_MCG_RDI   185H      Contains state of the RDI register at the time of the machine-check error.
IA32_MCG_RBP   186H      Contains state of the RBP register at the time of the machine-check error.
MACHINE-CHECK ARCHITECTURE and the R/EIP in these extended machine-check state MSRs. This information can be used by a debugger to analyze the error. These registers are read/write to zero registers. This means software can read them; but if software writes to them, only all zeros is allowed. If software attempts to write a non-zero value into one of these registers, a general-protection (#GP) exception is generated.
repeated corrections is at or below a pre-defined threshold, and a “yellow” status when the number of affected lines exceeds the threshold. Yellow status means that the cache reporting the event is operating correctly, but you should schedule the system for servicing within a few weeks. Intel recommends that you rely on this mechanism for structures supported by threshold-based error reporting.
Example 14-1. Machine-Check Initialization Pseudocode
Check CPUID Feature Flags for MCE and MCA support
IF CPU supports MCE
THEN
    IF CPU supports MCA
    THEN
        IF (IA32_MCG_CAP.MCG_CTL_P = 1) (* IA32_MCG_CTL register is present *)
        THEN
            IA32_MCG_CTL ← FFFFFFFFFFFFFFFFH; (* enables all MCA features *)
        FI
        (* Determine number of error-reporting banks supported *)
        COUNT← IA32_MCG_CAP.
                DO
                    IA32_MCi_STATUS ← 0;
                OD
            ELSE
                FOR error-reporting banks (0 through MAX_BANK_NUMBER)
                DO
                    (Optional for BIOS and OS) Log valid errors
                    (OS only) IA32_MCi_STATUS ← 0;
                OD
            FI
        FI
    FI
    Setup the Machine Check Exception (#MC) handler for vector 18 in IDT
    Set the MCE bit (bit 6) in CR4 register to enable Machine-Check Exceptions
FI
14.7.
Table 14-5. IA32_MCi_Status [15:0] Simple Error Code Encoding

Error Code     Binary Encoding       Meaning
No Error       0000 0000 0000 0000   No error has been reported to this bank of error-reporting registers.
Unclassified   0000 0000 0000 0001   This error has not been classified into the MCA error classes.
The “Interpretation” column in the table indicates the name of a compound error. The name is constructed by substituting mnemonics for the sub-field names given within curly braces. For example, the error code ICACHEL1_RD_ERR is constructed from the form: {TT}CACHE{LL}_{RRRR}_ERR, where {TT} is replaced by I, {LL} is replaced by L1, and {RRRR} is replaced by RD. For more information on the “Form” and “Interpretation” columns, see Sections Section 14.7.2.
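The substitution described above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive decoder: the bit layout (0000 0001 RRRR TTLL) and the mnemonic tables are assumptions taken from the manual's sub-field tables as recalled here and should be checked against them, and the name cache_error_name is hypothetical.

```python
# Assumed sub-field mnemonics (verify against the TT, LL, and RRRR tables).
TT = {0b00: "I", 0b01: "D", 0b10: "G"}
LL = {0b00: "L0", 0b01: "L1", 0b10: "L2", 0b11: "LG"}
RRRR = {
    0b0000: "ERR", 0b0001: "RD", 0b0010: "WR", 0b0011: "DRD",
    0b0100: "DWR", 0b0101: "IRD", 0b0110: "PREFETCH",
    0b0111: "EVICT", 0b1000: "SNOOP",
}


def cache_error_name(code: int) -> str:
    """Build {TT}CACHE{LL}_{RRRR}_ERR from a 16-bit MCA error code.

    Assumes the cache-hierarchy form 0000 0001 RRRR TTLL.
    """
    code &= 0xFFFF
    if (code >> 8) != 0x01:
        raise ValueError("not a cache-hierarchy compound error code")
    rrrr = (code >> 4) & 0xF   # request type, bits 7:4
    tt = (code >> 2) & 0x3     # transaction type, bits 3:2
    ll = code & 0x3            # memory hierarchy level, bits 1:0
    try:
        return f"{TT[tt]}CACHE{LL[ll]}_{RRRR[rrrr]}_ERR"
    except KeyError as exc:
        raise ValueError("reserved sub-field encoding") from exc
```

With these assumed tables, the code 0111H decodes to the ICACHEL1_RD_ERR name used in the example above.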
14.7.2.3 Level (LL) Sub-Field
The 2-bit LL sub-field (see Table 14-8) indicates the level in the memory hierarchy where the error occurred (level 0, level 1, level 2, or generic). The LL sub-field also applies to the TLB, cache, and interconnect error conditions. The Pentium 4, Intel Xeon, and P6 family processors support two levels in the cache hierarchy and one level in the TLBs. Again, the generic type is reported when the processor cannot determine the hierarchy level.
14.7.2.5 Bus and Interconnect Errors
The bus and interconnect errors are defined with the 2-bit PP (participation), 1-bit T (time-out), and 2-bit II (memory or I/O) sub-fields, in addition to the LL and RRRR sub-fields (see Table 14-10). The bus error conditions are implementation dependent and related to the type of bus implemented by the processor.
14.8 GUIDELINES FOR WRITING MACHINE-CHECK SOFTWARE
The machine-check architecture and error logging can be used in two different ways:
• To detect machine errors during normal instruction execution, using the machine-check exception (#MC).
• To periodically check and log machine errors.
To use the machine-check exception, the operating system or executive software must provide a machine-check exception handler.
the MCA Error Codes,” for information that can be used to write an algorithm to interpret this field.
• The RIPV, PCC, and OVER flags in each IA32_MCi_STATUS register indicate whether recovery from the error is possible. If one of these fields is set, recovery is not possible. The OVER field indicates that two or more machine-check errors occurred. When recovery is not possible, the handler typically records the error information and signals an abort to the operating system.
Example 14-2. Machine-Check Exception Handler Pseudocode
IF CPU supports MCE
THEN
    IF CPU supports MCA
    THEN
        call errorlogging routine; (* returns restartability *)
    FI;
ELSE (* Pentium(R) processor compatible *)
    READ P5_MC_ADDR
    READ P5_MC_TYPE;
    report RESTARTABILITY to console;
FI;
IF error is not restartable
THEN
    report RESTARTABILITY to console;
    abort system;
FI;
CLEAR MCIP flag in IA32_MCG_STATUS;
14.8.
• A user-initiated application that polls the register banks and records the exceptions. Here, the actual polling service is provided by an operating-system driver or through the system call interface.
Example 14-3 gives pseudocode for an error logging utility. Example 14-3.
When the MCIP flag is set in the IA32_MCG_STATUS register, a machine-check exception is in progress and the machine-check exception handler has called the exception logging routine.
CHAPTER 15 8086 EMULATION
IA-32 processors (beginning with the Intel386 processor) provide two ways to execute new or legacy programs that are assembled and/or compiled to run on an Intel 8086 processor:
• Real-address mode.
• Virtual-8086 mode.
Figure 2-3 shows the relationship of these operating modes to protected mode and system management mode (SMM). When the processor is powered up or reset, it is placed in the real-address mode.
The following is a summary of the core features of the real-address mode execution environment as would be seen by a program written for the 8086:
• The processor supports a nominal 1-MByte physical address space (see Section 15.1.1, “Address Translation in Real-Address Mode”, for specific details). This address space is divided into segments, each of which can be up to 64 KBytes in length.
• A single interrupt table, called the “interrupt vector table” or “interrupt table,” is provided for handling interrupts and exceptions (see Figure 15-2). The interrupt table (which has 4-byte entries) takes the place of the interrupt descriptor table (IDT, with 8-byte entries) used when handling protected-mode interrupts and exceptions. Interrupt and exception vector numbers provide an index to entries in the interrupt table.
in real-address mode, however, the processor does not truncate such an address and uses it as a physical address. (Note, however, that for IA-32 processors beginning with the Intel486 processor, the A20M# signal can be used in real-address mode to mask address line A20, thereby mimicking the 20-bit wrap-around behavior of the 8086 processor.) Care should be taken to ensure that A20M#-based address wrapping is handled correctly in multiprocessor-based systems.
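The effect of masking address line A20 can be illustrated with a short Python sketch. It assumes the usual real-address-mode translation (the 16-bit segment selector shifted left by 4 bits, plus the 16-bit offset); the function name real_mode_physical and its a20_masked parameter are hypothetical and only model the behavior described above.

```python
def real_mode_physical(segment: int, offset: int, a20_masked: bool = False) -> int:
    """Translate segment:offset to a physical address in real-address mode.

    Without masking, the sum can exceed FFFFFH (up to 10FFEFH) and is used
    as is. With A20 masked, bit 20 is forced to 0, which reproduces the
    20-bit wrap-around of the 8086.
    """
    linear = ((segment & 0xFFFF) << 4) + (offset & 0xFFFF)
    if a20_masked:
        linear &= ~(1 << 20)  # address line A20 held at 0
    return linear
```

For example, FFFFH:0010H yields 100000H when A20 is not masked and wraps to 0 when it is.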
these instructions should be used in a new program written to run in real-address mode.
• Move (MOV) instructions that move operands between general-purpose registers, segment registers, and between memory and general-purpose registers.
• The exchange (XCHG) instruction.
• Logical instructions AND, OR, XOR, and NOT.
• Type conversion instructions CWD, CDQ, CBW, and CWDE.
• I/O instructions IN, INS, OUT, and OUTS.
• Exchange instructions CMPXCHG, CMPXCHG8B, and XADD.
• Double shift instructions SHLD and SHRD.
• String instructions MOVS, CMPS, SCAS, LODS, and STOS.
• Bit test and bit scan instructions BT, BTS, BTR, BTC, BSF, and BSR; the byte-set-on condition instruction SETcc; and the byte swap (BSWAP) instruction.
• EFLAGS control instructions PUSHF and POPF.
• ENTER and LEAVE control instructions.
• BOUND instruction.
• CPU identification (CPUID) instruction.
An IRET instruction at the end of the handler procedure reverses these steps to return program control to the interrupted program. Exceptions do not return error codes in real-address mode. The interrupt vector table is an array of 4-byte entries (see Figure 15-2). Each entry consists of a far pointer to a handler procedure, made up of a segment selector and an offset. The processor scales the interrupt or exception vector by 4 to obtain an offset into the interrupt table.
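The table lookup just described can be sketched as follows. This is an illustration only: the helper name ivt_entry and the table_base parameter are hypothetical, and the entry layout (16-bit offset in the low word, 16-bit segment selector in the high word) is stated here as an assumption to be checked against Figure 15-2.

```python
import struct


def ivt_entry(memory: bytes, vector: int, table_base: int = 0):
    """Return (segment, offset, physical_address) for a real-mode vector.

    The vector is scaled by 4 to index the table of 4-byte far pointers.
    Assumed entry layout: offset word first, then segment selector word.
    """
    if not 0 <= vector <= 255:
        raise ValueError("vector must be in the range 0 through 255")
    entry = table_base + vector * 4
    offset, segment = struct.unpack_from("<HH", memory, entry)
    return segment, offset, (segment << 4) + offset
```

A caller would pass an image of the first KByte of memory; for vector 21H the far pointer is read from bytes 84H through 87H.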
15.2 VIRTUAL-8086 MODE
Virtual-8086 mode is actually a special type of task that runs in protected mode. When the operating system or executive switches to a virtual-8086-mode task, the processor emulates an Intel 8086 processor. The execution environment of the processor while in the 8086-emulation state is the same as is described in Section 15.1, “Real-Address Mode” for real-address mode, including the extensions.
Table 15-1. Real-Address Mode Exceptions and Interrupts (Contd.)
Vector No.   Description                     Real-Address Mode   Virtual-8086 Mode   Intel 8086 Processor
19-31        (Intel reserved. Do not use.)   Reserved            Reserved            Reserved
32-255       User Defined Interrupts         Yes                 Yes                 Yes
NOTE: * In the real-address mode, vector 13 is the segment overrun exception.
The processor enters virtual-8086 mode to run the 8086 program and returns to protected mode to run the virtual-8086 monitor. The virtual-8086 monitor is a 32-bit protected-mode code module that runs at a CPL of 0. The monitor consists of initialization, interrupt- and exception-handling, and I/O emulation procedures that emulate a personal computer or other 8086-based platform.
Paging is not necessary for a single virtual-8086-mode task, but paging is useful or necessary in the following situations:
• When running multiple virtual-8086-mode tasks. Here, paging allows the lower 1 MByte of the linear address space for each virtual-8086-mode task to be mapped to a different physical address location.
• When emulating the 8086 address-wraparound that occurs at 1 MByte.
When a task switch is used to enter virtual-8086 mode, the TSS for the virtual-8086-mode task must be a 32-bit TSS. (If the new TSS is a 16-bit TSS, the upper word of the EFLAGS register is not in the TSS, causing the processor to clear the VM flag when it loads the EFLAGS register.) The processor updates the VM flag prior to loading the segment registers from their images in the new TSS.
[Figure (mode-transition diagram; graphic not reproduced). States: real-address mode (real mode code); protected mode (protected-mode tasks, protected-mode interrupt and exception handlers, virtual-8086 monitor); virtual-8086 mode (virtual-8086 mode tasks running 8086 programs). Transition labels: PE=1; PE=0 or RESET; RESET; task switch (note 1); task switch, VM=0; VM = 0; VM = 1; CALL; RET; interrupt or exception (note 2); #GP exception (note 3); IRET (notes 4 and 5); redirect interrupt to 8086 program interrupt or exception handler (note 6).]
NOTES: 1.
15.2.6 Leaving Virtual-8086 Mode
The processor can leave the virtual-8086 mode only through an interrupt or exception. The following are situations where an interrupt or exception will lead to the processor leaving virtual-8086 mode (see Figure 15-3):
• The processor services a hardware interrupt generated to signal the suspension of execution of the virtual-8086 application. This hardware interrupt may be generated by a timer or other external mechanism.
See Section 15.3, “Interrupt and Exception Handling in Virtual-8086 Mode”, for information on leaving virtual-8086 mode to handle an interrupt or exception generated in virtual-8086 mode.
15.2.7 Sensitive Instructions
When an IA-32 processor is running in virtual-8086 mode, the CLI, STI, PUSHF, POPF, INT n, and IRET instructions are sensitive to IOPL. The IN, INS, OUT, and OUTS instructions, which are sensitive to IOPL in protected mode, are not sensitive in virtual-8086 mode.
See Chapter 13, “Input/Output”, in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for more information about the I/O permission bit map.
15.2.8.2 Memory-Mapped I/O
In systems which use memory-mapped I/O, the paging facilities of the processor can be used to generate exceptions for attempts to access I/O ports.
• Class 1 — All processor-generated exceptions and all hardware interrupts, including the NMI interrupt and the hardware interrupts sent to the processor’s external interrupt delivery pins. All class 1 exceptions and interrupts are handled by the protected-mode exception and interrupt handlers.
• Class 2 — Special case for maskable hardware interrupts (Section 5.3.2, “Maskable Hardware Interrupts”) when the virtual mode extensions are enabled.
• Protected-mode interrupt and exception handlers — These are the standard handlers that the processor calls through the protected-mode IDT.
• Virtual-8086 monitor interrupt and exception handlers — These handlers are resident in the virtual-8086 monitor, and they are commonly accessed through a general-protection exception (#GP, interrupt 13) that is directed to the protected-mode general-protection exception handler.
8086-mode task, can use the same code sequences for saving and restoring the registers for any task. Clearing these registers before execution of the IRET instruction does not cause a trap in the interrupt handler. Interrupt procedures that expect values in the segment registers or that return values in the segment registers must use the register images saved on the stack for privilege level 0.
4. Clears VM, NT, RF and TF flags (in the EFLAGS register).
• The protected-mode interrupt or exception handler can call the virtual-8086 monitor to handle the interrupt or exception.
• The virtual-8086 monitor (if called) can in turn pass control back to the 8086 program’s interrupt and exception handler.
If the interrupt or exception is handled with a protected-mode handler, the handler can return to the interrupted program in virtual-8086 mode by executing an IRET instruction.
5. When the IRET instruction from the privilege-level 3 handler triggers a general-protection exception (#GP) and thus effectively again calls the virtual-8086 monitor, restore the return link on the privilege-level 0 stack to point to the original, interrupted, privilege-level 3 procedure.
6.
15.3.2 Class 2—Maskable Hardware Interrupt Handling in Virtual-8086 Mode Using the Virtual Interrupt Mechanism
Maskable hardware interrupts are those interrupts that are delivered through the INTR# pin or through an interrupt request to the local APIC (see Section 5.3.2, “Maskable Hardware Interrupts”). These interrupts can be inhibited (masked) from interrupting an executing program or task by clearing the IF flag in the EFLAGS register.
program. But actually the IF flag, managed by the operating system, always controls whether maskable hardware interrupts are enabled. Also, if under these circumstances an 8086 program tries to read or change the IF flag using the PUSHF or POPF instructions, the processor will change the VIF flag instead, leaving IF unchanged. The VIP flag provides software a means of recording the existence of a deferred (or pending) maskable hardware interrupt.
rupts). Prior to setting the VIF flag, the processor automatically checks the VIP flag and does one of the following, depending on the state of the flag:
• If the VIP flag is clear (indicating no pending interrupts), the processor sets the VIF flag.
• If the VIP flag is set (indicating a pending interrupt), the processor generates a general-protection exception (#GP).
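The two cases above amount to a simple check before the VIF flag is set. The Python sketch below models only that decision; the function and exception names are hypothetical, and the bit positions (VIF in EFLAGS bit 19, VIP in bit 20) follow the flag description given elsewhere in this manual.

```python
VIF = 1 << 19  # virtual interrupt flag, EFLAGS bit 19
VIP = 1 << 20  # virtual interrupt pending flag, EFLAGS bit 20


class GeneralProtection(Exception):
    """Stands in for the #GP exception in this model."""


def set_vif(eflags: int) -> int:
    """Model the VIP check performed before the processor sets VIF."""
    if eflags & VIP:
        # A deferred interrupt is pending: the monitor must handle it first.
        raise GeneralProtection("#GP: VIP set while enabling virtual interrupts")
    return eflags | VIF
```

In this model, a virtual-8086 monitor would catch the exception, deliver the pending interrupt to the 8086 program, and clear VIP.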
• Speeds up the handling of software-generated interrupts in virtual-8086 mode by allowing the processor to bypass the virtual-8086 monitor and redirect software interrupts back to the interrupt handlers that are part of the currently running 8086 program.
• Supports virtual interrupts for software written to run on the 8086 processor.
Table 15-2. Software Interrupt Handling Methods While in Virtual-8086 Mode Method VME IOPL Bit in Redir.
[Figure 15-5. Software Interrupt Redirection Bit Map in TSS. Graphic not reproduced. Recoverable labels: Task-State Segment (TSS), bits 31 through 0; I/O Map Base field at offset 64H; Software Interrupt Redirection Bit Map (32 bytes); I/O Permission Bit Map, whose last byte must be followed by a byte with all bits set (1 1 1 1 1 1 1 1). The I/O map base must not exceed DFFFH.]
Redirecting software interrupts back to the 8086 program potentially speeds up interrupt handling because a switch back and forth between virtual-8086 mode and protected mode is not required.
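As a rough sketch of how the redirection bit for a given vector could be located in a TSS image, consider the Python below. Several details are assumptions rather than statements from this page: that the 32-byte redirection bit map sits immediately below the I/O permission bit map (that is, it ends at the I/O map base), that the 16-bit I/O map base is held in bytes 66H and 67H of the TSS, and that bit 0 of the first byte corresponds to vector 0. The name redirection_bit is hypothetical.

```python
IO_MAP_BASE_OFFSET = 0x66    # assumed location of the 16-bit I/O map base
REDIRECTION_MAP_BYTES = 32   # 256 vectors, one bit each


def redirection_bit(tss: bytes, vector: int) -> int:
    """Return the software interrupt redirection bit (0 or 1) for a vector."""
    if not 0 <= vector <= 255:
        raise ValueError("vector must be in the range 0 through 255")
    io_map_base = int.from_bytes(
        tss[IO_MAP_BASE_OFFSET:IO_MAP_BASE_OFFSET + 2], "little")
    map_start = io_map_base - REDIRECTION_MAP_BYTES  # assumed placement
    if map_start < 0:
        raise ValueError("I/O map base leaves no room for the bit map")
    return (tss[map_start + vector // 8] >> (vector % 8)) & 1
```

A monitor that wants the 8086 program to handle a particular INT n itself would clear the corresponding bit; a set bit leaves the interrupt with the protected-mode handlers.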
Section 15.3.1, “Class 1—Hardware Interrupt and Exception Handling in Virtual-8086 Mode”, for a complete description of this mechanism and its possible uses.
15.3.3.2 Methods 2 and 3: Software Interrupt Handling
When a software interrupt occurs in virtual-8086 mode and the method 2 or 3 conditions are present, the processor generates a general-protection exception (#GP). Method 2 is enabled when the VME flag is set to 0 and the IOPL value is less than 3.
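The conditions that select a handling method can be summarized in a small decision function. The sketch below is an illustration, not a restatement of the table: only the method 2 condition is spelled out on this page, and the remaining rows are filled in from Table 15-2 as recalled here, so they should be verified against the table. The function name is hypothetical.

```python
def software_interrupt_method(vme: int, iopl: int, redirection_bit: int) -> int:
    """Return the handling method number (1 through 6) for INT n in virtual-8086 mode.

    vme is CR4.VME, iopl is the EFLAGS IOPL field, and redirection_bit is
    the vector's bit in the software interrupt redirection bit map.
    """
    if not vme:
        # Redirection bit is ignored when the extensions are disabled.
        return 1 if iopl == 3 else 2
    if redirection_bit:
        return 4 if iopl == 3 else 3
    return 5 if iopl == 3 else 6
```

Methods 2 and 3 are the two cases in which the processor raises #GP so that the virtual-8086 monitor can emulate the interrupt.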
4. Clears the TF flag in the EFLAGS register.
5. Locates the 8086 program interrupt vector table at linear address 0 for the 8086-mode task.
6. Loads the CS and EIP registers with values from the interrupt vector table entry pointed to by the interrupt vector number. Only the 16 low-order bits of the EIP are loaded and the 16 high-order bits are set to 0. The interrupt vector table is assumed to be at linear address 0 of the current virtual-8086 task.
7.
8086 mode task. Also, because the IOPL value is less than 3 and the VIF flag is enabled, the information pushed on the stack by the processor when invoking the interrupt handler is slightly different between methods 5 and 6 (see Table 15-2). 15.
to a protected-mode interrupt handler (typically the general-protection exception handler, which in turn calls the virtual 8086-mode monitor). In both cases, the EFLAGS register is saved and restored. This is not true, however, in protected mode when the PVI flag is set and the processor is not in virtual-8086 mode. Here, it is possible to call a procedure at a different privilege level, in which case the EFLAGS register is not saved or modified.
CHAPTER 16 MIXING 16-BIT AND 32-BIT CODE
Program modules written to run on IA-32 processors can be either 16-bit modules or 32-bit modules. Table 16-1 shows the characteristics of 16-bit and 32-bit modules. Table 16-1.
16.1 DEFINING 16-BIT AND 32-BIT PROGRAM MODULES
The following IA-32 architecture mechanisms are used to distinguish between and support 16-bit and 32-bit segments and operations:
• The D (default operand and address size) flag in code-segment descriptors.
• The B (default stack size) flag in stack-segment descriptors.
• 16-bit and 32-bit call gates, interrupt gates, and trap gates.
• Operand-size and address-size instruction prefixes.
These prefixes reverse the default size selected by the D flag in the code-segment descriptor. For example, the processor can interpret the (MOV mem, reg) instruction in any of four ways:
• In a 32-bit code segment:
— Moves 32 bits from a 32-bit register to memory using a 32-bit effective address.
— If preceded by an operand-size prefix, moves 16 bits from a 16-bit register to memory using a 32-bit effective address.
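The way the prefixes toggle the defaults can be sketched as follows. This is a minimal model under stated assumptions: the prefix byte values 66H (operand size) and 67H (address size) are not given on this page and are supplied here from the instruction-format description, and the function name effective_sizes is hypothetical.

```python
OPERAND_SIZE_PREFIX = 0x66  # assumed encoding of the operand-size prefix
ADDRESS_SIZE_PREFIX = 0x67  # assumed encoding of the address-size prefix


def effective_sizes(d_flag: int, prefixes: bytes = b""):
    """Return (operand_size, address_size) in bits for one instruction.

    d_flag is the D flag of the current code-segment descriptor:
    1 selects 32-bit defaults, 0 selects 16-bit defaults. Each prefix
    reverses the corresponding default for that instruction only.
    """
    default, other = (32, 16) if d_flag else (16, 32)
    operand = other if OPERAND_SIZE_PREFIX in prefixes else default
    address = other if ADDRESS_SIZE_PREFIX in prefixes else default
    return operand, address
```

The four combinations of the two prefixes give the four interpretations of MOV mem, reg within a single code segment.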
16.3 SHARING DATA AMONG MIXED-SIZE CODE SEGMENTS
Data segments can be accessed from both 16-bit and 32-bit code segments. When a data segment that is larger than 64 KBytes is to be shared among 16- and 32-bit code segments, the data that is to be accessed from the 16-bit code segments must be located within the first 64 KBytes of the data segment. The reason for this is that 16-bit pointers by definition can only point to the first 64 KBytes of a segment.
Likewise, there are three ways for a procedure in a 32-bit code segment to safely make a call to a 16-bit code segment:
• Make the call through a 16-bit call gate. Here, the EIP value at the CALL instruction cannot exceed FFFFH.
• Make a 32-bit call to a 16-bit interface procedure. The interface procedure then makes a 16-bit call to the intended destination.
• Modify the 32-bit procedure, inserting an operand-size prefix before the call, changing it to a 16-bit call.
instruction (see Figure 16-1). On a 16-bit call, the processor pushes the contents of the 16-bit IP register and (for calls between privilege levels) the 16-bit SP register. The matching RET instruction must also use a 16-bit operand size to pop these 16-bit values from the stack into the 16-bit registers. A 32-bit CALL instruction pushes the contents of the 32-bit EIP register and (for inter-privilege-level calls) the 32-bit ESP register.
While executing 32-bit code, if a call is made to a 16-bit code segment which is at the same or a more privileged level (that is, the DPL of the called code segment is less than or equal to the CPL of the calling code segment) through a 16-bit call gate, then the upper 16 bits of the ESP register may be unreliable upon returning to the 32-bit code segment (that is, after executing a RET in the 16-bit code segment).
• Relink the CALL instruction to point to 32-bit call gates (see Section 16.4.2.2, “Passing Parameters With a Gate”).
• Add a 32-bit operand-size prefix to each CALL instruction.
16.4.2.2 Passing Parameters With a Gate
When referencing 32-bit gates with 16-bit procedures, it is important to consider the number of parameters passed in each procedure call.
16.4.5 Writing Interface Procedures
Placing interface code between 32-bit and 16-bit procedures can be the solution to the following interface problems:
• Allowing procedures in 16-bit code segments to call procedures with offsets greater than FFFFH in 32-bit code segments.
• Matching operand-size attributes between companion CALL and RET instructions.
• The possible invalidation of the upper bits of the ESP register.
CHAPTER 17 ARCHITECTURE COMPATIBILITY Intel 64 and IA-32 processors are binary compatible. Compatibility means that, within limited constraints, programs that execute on previous generations of processors will produce identical results when executed on later processors. The compatibility constraints and any implementation differences between the Intel 64 and IA-32 processors are described in this chapter.
17.2 RESERVED BITS
Throughout this manual, certain bits are marked as reserved in many register and memory layout descriptions. When bits are marked as undefined or reserved, it is essential for compatibility with future processors that software treat these bits as having a future, though unknown effect.
17.4 DETECTING THE PRESENCE OF NEW FEATURES THROUGH SOFTWARE
Software can check for the presence of new architectural features and extensions in either of two ways:
1. Test for the presence of the feature or extension. Software can test for the presence of new flags in the EFLAGS register and control registers. If these flags are reserved (meaning not present in the processor executing the test), an exception is generated.
17.7 STREAMING SIMD EXTENSIONS 2 (SSE2)
The Streaming SIMD Extensions 2 (SSE2) were introduced in the Pentium 4 and Intel Xeon processors. They consist of a new set of instructions that operate on the XMM and MXCSR registers and perform SIMD operations on double-precision floating-point values and on integer values. Several of these new instructions also operate in the MMX registers.
17.11 SPECIFIC FEATURES OF DUAL-CORE PROCESSOR
Dual-core processors may have some processor-specific features. Use CPUID feature flags to detect the availability of features. Note the following:
• CPUID Brand String — On Pentium processor Extreme Edition, the processor will report the correct brand string only after the correct microcode updates are loaded.
Table 17-1. New Instruction in the Pentium Processor and Later IA-32 Processors (Contd.)
Instruction                                 CPUID Identification Bits
CMPXCHG8B (compare and exchange 8 bytes)    EDX, Bit 8
CPUID (CPU identification)                  None; see Note 2
RDTSC (read time-stamp counter)             EDX, Bit 4
RDMSR (read model-specific register)        EDX, Bit 5
WRMSR (write model-specific register)       EDX, Bit 5
MMX Instructions                            EDX, Bit 23
Introduced In: Pentium processor
NOTES: 1.
17.13 OBSOLETE INSTRUCTIONS
The MOV to and from test registers instructions were removed from the Pentium processor and future IA-32 processors. Execution of these instructions generates an invalid-opcode exception (#UD).
17.14 UNDEFINED OPCODES
All new instructions defined for IA-32 processors use binary encodings that were reserved on earlier-generation processors. Attempting to execute a reserved opcode always results in an invalid-opcode (#UD) exception being generated.
family or Pentium processor. The CPUID instruction can then be used to determine which processor.
• Bits 19 (the VIF flag) and 20 (the VIP flag) will always be zero on processors that do not support virtual mode extensions, which includes all 32-bit processors prior to the Pentium processor.
17.17 X87 FPU
This section addresses the issues that must be faced when porting floating-point software designed to run on earlier IA-32 processors and math coprocessors to a Pentium 4, Intel Xeon, P6 family, or Pentium processor with integrated x87 FPU. To software, a Pentium 4, Intel Xeon, or P6 family processor looks very much like a Pentium processor.
17.17.2 x87 FPU Status Word
This section identifies differences to the x87 FPU status word for the different IA-32 processors and math coprocessors, the reason for the differences, and their impact on software.
17.17.2.1 Condition Code Flags (C0 through C3)
The following information pertains to differences in the use of the condition code flags (C0 through C3) located in bits 8, 9, 10, and 14 of the x87 FPU status word.
processors, but has no effect. This change was made to conform to the IEEE Standard 754 for Binary Floating-Point Arithmetic. On a 16-bit IA-32 math coprocessor, both affine and projective closures are supported, as determined by the setting of bit 12. After a hardware reset, the default value of bit 12 is projective. Software that requires projective infinity arithmetic may give different results. 17.17.
exception upon encountering a QNaN. An invalid-operation exception (#I) is generated only upon encountering a SNaN, except for the FCOM, FIST, and FBSTP instructions, which also generate invalid-operation exceptions for QNaNs. This behavior matches IEEE Standard 754. The 16-bit IA-32 math coprocessors only generate one kind of NaN (the equivalent of a QNaN), but they raise an invalid-operation exception upon encountering any kind of NaN.
for these instructions on the 32-bit x87 FPUs. The exception handlers ported to these latter processors need to be changed only if the handlers give special treatment to different opcodes.
17.17.6.2 Numeric Overflow Exception (#O)
On the 32-bit x87 FPUs, when the numeric overflow exception is masked and the rounding mode is set to chop (toward 0), the result is the largest positive or smallest negative number.
coprocessor if the result is stored on the stack. The difference is only in the least significant bit of the significand and is apparent only to the exception handler.
17.17.6.4 Exception Precedence
There is no difference in the precedence of the denormal-operand exception on the 32-bit x87 FPUs, whether it be masked or not. When the denormal-operand exception is not masked on the 16-bit IA-32 math coprocessors, it takes precedence over all other exceptions.
caused the exception. For the Pentium and Intel486 processors, an unmasked floating-point exception may cause the FERR# pin to be asserted either at the end of the instruction causing the exception or immediately before execution of the next floating-point instruction. (Note that the next floating-point instruction would not be executed until the pending unmasked exception has been handled.
17.17.6.12 Coprocessor Segment Overrun Exception
The coprocessor segment overrun exception (interrupt 9) does not occur in the P6 family, Pentium, and Intel486 processors. In situations where the Intel 387 math coprocessor would cause an interrupt 9, the P6 family, Pentium, and Intel486 processors simply abort the instruction. To avoid undetected segment overruns, it is recommended that the floating-point save area be placed in the same page as the TSS.
precision exception is signaled. With the 16-bit IA-32 math coprocessors, the range of the scaling operand is restricted. If (0 < | ST(1) | < 1), the result is undefined and no exception is signaled. The impact of this difference on existing software is that different results are delivered on the 32-bit and 16-bit FPUs and math coprocessors when (0 < | ST(1) | < 1).
17.17.7.3 FPREM1 Instruction
The 32-bit x87 FPUs compute a partial remainder according to IEEE Standard 754.
17.17.7.8 FSIN, FCOS, and FSINCOS Instructions
On the 32-bit x87 FPUs, these instructions perform three common trigonometric functions. These instructions do not exist on the 16-bit IA-32 math coprocessors. The availability of these instructions has no impact on existing software, but using them provides a performance upgrade.
17.17.7.9 FPATAN Instruction
On the 32-bit x87 FPUs, the range of operands for the FPATAN instruction is unrestricted.
+∞, the invalid-operation exception is reported. These differences have no impact on existing software. Software usually bypasses 0 and ∞. This change is due to the IEEE Standard 754 recommendation to fully support the “logb” function.
17.17.7.13 Load Constant Instructions
On 32-bit x87 FPUs, rounding control is in effect for the load constant instructions. Rounding control is not in effect for the 16-bit IA-32 math coprocessors.
the x87 FPU,” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1). Condition code flag C1 of the status word may differ as a result. The exact threshold for underflow and overflow will vary by a few ulps. The P6 family and Pentium processors’ results will have a worst case error of less than 1 ulp when rounding to the nearest-even and less than 1.5 ulps when rounding in other modes.
instruction to ensure synchronization. Although 8087 programs having explicit WAIT instructions execute perfectly on the 32-bit IA-32 processors without reassembly, these WAIT instructions are unnecessary. 17.
17.19.2 Intel486 SX Processor and Intel 487 SX Math Coprocessor Initialization
When initializing an Intel486 SX processor and an Intel 487 SX math coprocessor, the initialization routine should check the presence of the math coprocessor and should set the FPU related flags (EM, MP, and NE) in control register CR0 accordingly (see Section 2.5, “Control Registers,” for a complete description of these flags).
If the Intel 487 SX math coprocessor is not present, the following code can be run to set the CR0 register for the Intel486 SX processor.
mov eax, cr0
and eax, 0fffffffdh ;make MP=0
or eax, 0024h ;make EM=1, NE=1
mov cr0, eax
This initialization will cause any floating-point instruction to generate a device not available exception (#NM), interrupt 7. The software emulation will then take control to execute these instructions.
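The two constants in the sequence above can be checked against the CR0 bit positions with a short Python sketch. The bit positions used here (MP in bit 1, EM in bit 2, NE in bit 5) come from the CR0 description in Section 2.5 and are stated as assumptions in this context; the function name is hypothetical.

```python
CR0_MP = 1 << 1  # monitor coprocessor
CR0_EM = 1 << 2  # emulation
CR0_NE = 1 << 5  # numeric error


def cr0_without_coprocessor(cr0: int) -> int:
    """Apply the same AND/OR masks as the assembly sequence to a CR0 value."""
    cr0 &= 0xFFFFFFFD  # clears MP, leaves every other bit unchanged
    cr0 |= 0x0024      # sets EM and NE
    return cr0 & 0xFFFFFFFF


# The masks correspond exactly to the named flags.
assert 0xFFFFFFFD == ~CR0_MP & 0xFFFFFFFF
assert 0x0024 == CR0_EM | CR0_NE
```

With EM set and MP clear, every floating-point instruction traps to the emulator through interrupt 7, which matches the behavior described above.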
• VME — Virtual-8086 mode extensions. Enables support for a virtual interrupt flag in virtual-8086 mode (see Section 15.3, “Interrupt and Exception Handling in Virtual-8086 Mode”).
• PVI — Protected-mode virtual interrupts. Enables support for a virtual interrupt flag in protected mode (see Section 15.4, “Protected-Mode Virtual Interrupts”).
• TSD — Time-stamp disable. Restricts the execution of the RDTSC instruction to procedures running at privilege level 0.
17.21 MEMORY MANAGEMENT FACILITIES
The following sections describe the new memory management facilities available in the various IA-32 processors and some compatibility differences.
17.21.1 New Memory Management Control Flags
The Pentium Pro processor introduced three new memory management features: physical memory addressing extension, the global bit in page-table entries, and general support for larger page sizes.
17.21.2 CD and NW Cache Control Flags
The CD and NW flags in control register CR0 were introduced in the Intel486 processor. In the P6 family and Pentium processors, these flags are used to implement a write-back strategy for the data cache; in the Intel486 processor, they implement a write-through strategy. See Table 10-5 for a comparison of these bits on the P6 family, Pentium, and Intel486 processors. For complete information on caching, see Chapter 10, “Memory Cache Control.
01 Break on data writes only.
10 Undefined if the DE flag in control register CR4 is cleared; break on I/O reads or writes but not instruction fetches if the DE flag in control register CR4 is set.
11 Break on data reads or writes but not instruction fetches.
On the P6 family and Pentium processors, reserved bits 11, 12, 14 and 15 are hardwired to 0. On the Intel486 processor, however, bit 12 can be set.
occurred. When an exception associated with the XMM registers occurs, an interrupt is generated.
• SIMD floating-point exception (#XF, interrupt 19) — New exceptions associated with the SIMD floating-point registers and resulting computations.
No new exceptions were added with the Pentium Pro and Pentium II processors. The set of available exceptions is the same as for the Pentium processor.
• Invalid-opcode exception (#UD, interrupt 6) — New exception condition added. Improper use of the LOCK instruction prefix can generate an invalid-opcode exception.
• Page-fault exception (#PF, interrupt 14) — New exception condition added. If paging is enabled in a 16-bit program, a page-fault exception can be generated as follows. Paging can be used in a system with 16-bit tasks if all tasks use the same page directory.
17.25.1 Interrupt Propagation Delay
External hardware interrupts may be recognized on different instruction boundaries on the P6 family, Pentium, Intel486, and Intel386 processors, due to the superscalar designs of the P6 family and Pentium processors. Therefore, the EIP pushed onto the stack when servicing an interrupt may be different for the P6 family, Pentium, Intel486, and Intel386 processors.
17.25.
INIT, SMI, NMI, and start-up IPIs. In the 82489DX, when the local unit is disabled, all the internal registers including the IRR, ISR and TMR are cleared and the mask bits in the LVT are set. In this state, the 82489DX local unit will accept only the reset deassert message.
• In the local APIC, NMI and INIT (except for INIT deassert) are always treated as edge triggered interrupts, even if programmed otherwise. In the 82489DX, these interrupts are always level triggered.
17.26.3 New Features Incorporated in the Local APIC of the Pentium 4 and Intel Xeon Processors
The local APIC in the Pentium 4 and Intel Xeon processors has the following new features not found in the P6 family and Pentium processors and in the 82489DX.
• The local APIC ID is extended to 8 bits.
• The ability to deliver lowest-priority interrupts to a focus processor is no longer supported.
• The flat cluster logical destination mode is not supported.
17.27.4 Using A 16-Bit TSS with 32-Bit Constructs
Task switches using 16-bit TSSs should be used only for pure 16-bit code. Any new code written using 32-bit constructs (operands, addressing, or the upper word of the EFLAGS register) should use only 32-bit TSSs, because the 32-bit processors do not save the upper 16 bits of EFLAGS to a 16-bit TSS.
Figure: two panels, each showing a TSS segment from offset 0H to FFFFH with the I/O map base address at FFFFH.
Intel486 Processor: FFFFH + 10H = FH for I/O validation. I/O access at port 10H checks bitmap at I/O map base address FFFFH + 10H = offset 10H. Offset FH from beginning of TSS segment results because wraparound occurs.
P6 family and Pentium Processors: FFFFH + 10H = outside segment for I/O validation. I/O access at port 10H checks bitmap at I/O address FFFFH + 10H, which exceeds segment limit.
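The difference between the two cases comes down to whether the sum FFFFH + 10H is truncated to 16 bits. A minimal C sketch of that arithmetic follows. It mirrors only the sum shown in the figure, not the processor's full I/O permission check, and the function names sum_16bit, sum_32bit, and exceeds_limit are hypothetical.

```c
#include <stdint.h>

/* Hypothetical: the sum as a 16-bit quantity, so it wraps at 64 KBytes
 * (the Intel486 processor case in the figure). */
uint32_t sum_16bit(uint16_t io_map_base, uint16_t offset)
{
    return (uint16_t)(io_map_base + offset);
}

/* Hypothetical: the sum as a 32-bit quantity, so no wraparound occurs
 * (the P6 family and Pentium processor case in the figure). */
uint32_t sum_32bit(uint16_t io_map_base, uint16_t offset)
{
    return (uint32_t)io_map_base + (uint32_t)offset;
}

/* Nonzero if the computed offset lies beyond the segment limit. */
int exceeds_limit(uint32_t computed_offset, uint32_t segment_limit)
{
    return computed_offset > segment_limit;
}
```

With a base of FFFFH and an offset of 10H, the 16-bit sum is FH (inside the segment), while the 32-bit sum is 1000FH, which exceeds a limit of FFFFH.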
The P6 family and Pentium processors support page-level cache management in the same manner as the Intel486 processor by using the PCD and PWT flags in control register CR3, the page-directory entries, and the page-table entries. The Intel486 processor, however, is not affected by the state of the PWT flag since the internal cache of the Intel486 processor is a write-through cache. 17.28.
17.29 PAGING
This section identifies enhancements made to the paging mechanism and implementation differences in the paging mechanism for various IA-32 processors.
17.29.1 Large Pages
The Pentium processor extended the memory management/paging facilities of the IA-32 to allow large (4-MByte) page sizes (see Section 3.6.1, “Paging Options”).
For the P6 family processors, the MOV CR0, REG instruction is serializing, so the jump operation is not required. However, for backwards compatibility, the JMP instruction should still be included.
17.30 STACK OPERATIONS
This section identifies the differences in the stack mechanism for the various IA-32 processors.
17.30.
17.30.2 Error Code Pushes
The Intel486 processor implements the error code pushed on the stack as a 16-bit value. When pushed onto a 32-bit stack, the Intel486 processor pushes only 2 bytes and updates ESP by 4. The P6 family and Pentium processors’ error code is a full 32 bits with the upper 16 bits set to zero. The P6 family and Pentium processors, therefore, push 4 bytes and update ESP by 4.
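One consequence of the behavior described above is that a handler reading the error code as a 32-bit value cannot rely on the upper 16 bits when running on an Intel486 processor, since only 2 bytes of the 4-byte stack slot are written. A hedged C sketch of a defensive read follows. The function name portable_error_code is hypothetical, and the masking approach is an inference from the text above rather than a documented requirement.

```c
#include <stdint.h>

/* Hypothetical helper: keeps only the low 16 bits of the 32-bit stack
 * slot holding the error code. On P6 family and Pentium processors the
 * discarded bits are zero anyway; on the Intel486 processor they are not
 * written by the push, so they are ignored here. */
uint16_t portable_error_code(uint32_t stacked_value)
{
    return (uint16_t)(stacked_value & 0xFFFFu);
}
```

Masking yields the same result on all three processor groups, so a single handler can be shared between them.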
The segment descriptors for data segments, code segments, local descriptor tables (there are no descriptors for global descriptor tables), and task gates are the same for the 16- and 32-bit processors. Other 16-bit descriptors (TSS segment, call gate, interrupt gate, and trap gate) are supported by the 32-bit processors. The 32-bit processors also have descriptors for TSS segments, call gates, interrupt gates, and trap gates that support the 32-bit architecture.
• A general-protection exception (#GP) if the segment is a data segment (that is, if the CS, DS, ES, FS, or GS register is being used to address the segment).
• A stack-fault exception (#SS) if the segment is a stack segment (that is, if the SS register is being used).
An exception to this behavior occurs when a stack access is data aligned, and the stack pointer is pointing to the last aligned piece of data that size at the top of the stack (ESP is FFFFFFFCH).
The Pentium 4, Intel Xeon, and P6 family processors use processor ordering to maintain consistency in the order that data is read (loaded) and written (stored) in a program and the order the processor actually carries out the reads and writes. With this type of ordering, reads can be carried out speculatively and in any order, reads can pass buffered writes, and writes to memory are always carried out in program order. (See Section 7.
locking specific to the Intel 286 processor may not run properly when run on later processors. A locked instruction is guaranteed to lock only the area of memory defined by the destination operand, but may lock a larger memory area. For example, typical 8086 and Intel 286 configurations lock the entire physical memory space. Programmers should not depend on this. On the Intel 286 processor, the LOCK prefix is sensitive to IOPL.
available MSRs. The new registers control the debug extensions, the performance counters, the machine-check exception capability, the machine-check architecture, and the MTRRs. These registers are accessible using the RDMSR and WRMSR instructions. Specific information on some of these new MSRs is provided in the following sections.
The P6 family processors extend the types of errors that can be detected and that generate a machine-check exception. They also provide a new machine-check architecture for recording information about a machine-check error and provide extended recovery capability. The machine-check architecture provides several banks of reporting registers for recording machine-check errors. Each bank of registers is associated with a specific hardware unit in the processor.
in a 32-bit software system should have 32-bit TSSs. It is not necessary to change the 16-bit object modules themselves; TSSs are usually constructed by the operating system, by the loader, or by the system builder. See Chapter 16, “Mixing 16-Bit and 32-Bit Code,” for more detailed information about mixing 16-bit and 32-bit code.