Intel® 64 and IA-32 Architectures Software Developer’s Manual Volume 3B: System Programming Guide, Part 2 NOTE: The Intel® 64 and IA-32 Architectures Software Developer's Manual consists of five volumes: Basic Architecture, Order Number 253665; Instruction Set Reference A-M, Order Number 253666; Instruction Set Reference N-Z, Order Number 253667; System Programming Guide, Part 1, Order Number 253668; System Programming Guide, Part 2, Order Number 253669.
INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT.
CHAPTER 18
DEBUGGING AND PERFORMANCE MONITORING
Intel 64 and IA-32 architectures provide debug facilities for use in debugging code and monitoring performance. These facilities are valuable for debugging application software, system software, and multitasking operating systems. Debug support is accessed using debug registers (DR0 through DR7) and model-specific registers (MSRs):
• Debug registers hold the addresses of memory and I/O locations called breakpoints.
when more than four breakpoints are desired, or when breakpoints are being placed in the source code.
• Last branch recording facilities — Store branch records in the last branch record (LBR) stack MSRs for the most recent taken branches, interrupts, and/or exceptions. A branch record consists of a branch-from and a branch-to instruction address. Send branch records out on the system bus as branch trace messages (BTMs).
• Whether the breakpoint condition was present when the debug exception was generated.
18.2.1 Debug Address Registers (DR0-DR3)
Each of the debug-address registers (DR0 through DR3) holds the 32-bit linear address of a breakpoint (see Figure 18-1). Breakpoint comparisons are made before physical address translation occurs. The contents of debug register DR7 further specify breakpoint conditions.
18.2.
Certain debug exceptions may clear bits 0-3. The remaining contents of the DR6 register are never cleared by the processor. To avoid confusion in identifying debug exceptions, debug handlers should clear the register before returning to the interrupted task.
18.2.4 Debug Control Register (DR7)
The debug control register (DR7) enables or disables breakpoints and sets breakpoint conditions (see Figure 18-1).
00 — Break on instruction execution only.
01 — Break on data writes only.
10 — Break on I/O reads or writes.
11 — Break on data reads or writes but not instruction fetches.
When the DE flag is clear, the processor interprets the R/Wn bits the same as for the Intel386™ and Intel486™ processors, which is as follows:
00 — Break on instruction execution only.
01 — Break on data writes only.
10 — Undefined.
bits, for comparison with the breakpoint address in the selected debug register). These requirements are enforced by the processor; it uses LENn field bits to mask the lower address bits in the debug registers. Unaligned data or I/O breakpoint addresses do not yield valid results. A data breakpoint for reading or writing data is triggered if any of the bytes participating in an access is within the range defined by a breakpoint address register and its LENn field.
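The R/Wn and LENn encodings and the range rule described above can be sketched in a few lines of Python. This is an illustration only, not code from the manual: the helper names `dr7_fields` and `data_breakpoint_hits` are hypothetical, the field positions assume the standard DR7 layout (Ln at bit 2n, Gn at bit 2n+1, R/Wn at bit 16+4n, LENn at bit 18+4n), and the range check models the processor masking the low address bits according to LENn.

```python
# Hypothetical helpers; names are illustrative and not defined by the manual.
RW_EXECUTE, RW_WRITE, RW_IO, RW_READ_WRITE = 0b00, 0b01, 0b10, 0b11

# Breakpoint length in bytes -> LENn encoding.
# The 8-byte encoding (10B) is assumed to be defined only on some processors.
LEN_ENCODING = {1: 0b00, 2: 0b01, 8: 0b10, 4: 0b11}


def dr7_fields(n, rw, length, local=True, global_=False):
    """Return the DR7 bits that configure breakpoint n (0 through 3)."""
    if not 0 <= n <= 3:
        raise ValueError("breakpoint number must be 0 through 3")
    if length not in LEN_ENCODING:
        raise ValueError("length must be 1, 2, 4, or 8 bytes")
    if rw == RW_EXECUTE and length != 1:
        raise ValueError("instruction breakpoints use LENn = 00B")
    value = (rw << (16 + 4 * n)) | (LEN_ENCODING[length] << (18 + 4 * n))
    if local:
        value |= 1 << (2 * n)        # Ln enable flag
    if global_:
        value |= 1 << (2 * n + 1)    # Gn enable flag
    return value


def data_breakpoint_hits(bp_addr, bp_len, access_addr, access_len):
    """True if any byte of the access lies inside the breakpoint range."""
    base = bp_addr & ~(bp_len - 1)   # low address bits are masked per LENn
    return access_addr < base + bp_len and base < access_addr + access_len
```

For example, `dr7_fields(0, RW_WRITE, 4)` yields the bits for a locally enabled 4-byte write breakpoint in DR0, and `data_breakpoint_hits(0xA0001, 1, 0xA0000, 4)` reports a hit because the 4-byte access covers the breakpoint byte.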
Table 18-1. Breakpoint Examples (Contd.)

Data operations that do not trap

Operation          Address    Access Length (bytes)
- Read or write    A0000H     1
- Read             A0002H     1
- Read or write    A0003H     4
- Read or write    B0000H     2
- Read             C0000H     2
- Read or write    C0004H     4

18.2.6 Debug Registers and Intel® 64 Processors
For Intel 64 architecture processors, debug registers DR0–DR7 are 64 bits.
[Figure: DR7 and DR6 register layouts. In both registers, bits 63:32 are reserved. DR7: bits 31:16 hold the LEN3, R/W3, LEN2, R/W2, LEN1, R/W1, LEN0, and R/W0 fields; bit 13 is GD; bits 9:8 are GE and LE; bits 7:0 are G3, L3, G2, L2, G1, L1, G0, and L0. DR6: bits 31:16 are reserved (set to 1); bits 15:13 are BT, BS, and BD; bits 3:0 are B3, B2, B1, and B0.]
Figure 18-2. DR6/DR7 Layout on Processors Supporting Intel 64 Technology
18.
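Because the processor never clears most of DR6, a debug handler typically decodes the status flags and then clears the register itself. The following Python sketch shows only the decoding step; `decode_dr6` is a hypothetical helper name, and the bit positions (B0 through B3 in bits 0 through 3, BD, BS, and BT in bits 13 through 15) are taken from the DR6 layout.

```python
# Hypothetical helper; bit positions follow the DR6 layout.
DR6_FLAGS = {"B0": 0, "B1": 1, "B2": 2, "B3": 3, "BD": 13, "BS": 14, "BT": 15}


def decode_dr6(value):
    """Return the names of the DR6 status flags that are set in value."""
    return [name for name, bit in DR6_FLAGS.items() if (value >> bit) & 1]
```

A handler that sees `["B0", "BS"]`, for instance, knows that breakpoint 0 matched and that a single-step trap is also pending.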
See also: Chapter 5, “Interrupt 1—Debug Exception (#DB),” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A. Table 18-2.
(resume flag) in the EFLAGS register (see Section 2.3, “System Flags and Fields in the EFLAGS Register,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A). When the RF flag is set, the processor ignores instruction breakpoints. All Intel 64 and IA-32 processors manage the RF flag as follows. The RF flag is cleared at the start of the instruction after the check for code breakpoint, CS limit violation, and FP exceptions.
18.3.1.2 Data Memory and I/O Breakpoint Exception Conditions
Data memory and I/O breakpoints are reported when the processor attempts to access a memory or I/O address specified in a breakpoint-address register (DR0 through DR3) that has been set up to detect data or I/O accesses (R/W flag is set to 1, 2, or 3).
single-step trap does not occur until after the instruction that follows the POPF instruction. The processor clears the TF flag before calling the exception handler. If the TF flag was set in a TSS at the time of a task switch, the exception occurs after the first instruction is executed in the new task. The TF flag normally is not cleared by privilege changes inside a task. The INT n and INTO instructions, however, do clear this flag.
18.4 LAST BRANCH RECORDING OVERVIEW
P6 family processors introduced the ability to set breakpoints on taken branches, interrupts, and exceptions, and to single-step from one branch to the next. This capability has been modified and extended in the Pentium 4, Intel Xeon, Pentium M, Intel® Core™ Solo, Intel® Core™ Duo, Intel® Core™2 Duo and Intel® Atom™ processors to allow logging of branch trace messages in a branch trace store (BTS) buffer in memory.
• Last branch record (LBR) stack — A collection of MSR pairs stores the source and destination addresses of recently executed branches. See Section 18.5.2.
• Monitoring and single-stepping of branches, exceptions, and interrupts — See Section 18.7.4 and Section 18.7.5. In addition, the ability to freeze the LBR stack on a PMI request is available.
• Branch trace messages — See Section 18.7.6.
• Last exception records — See Section 18.7.7.
[Figure: debug control MSR bit layout, bits 31:0.]
Bit 14: FREEZE_WHILE_SMM_EN
Bit 12: FREEZE_PERFMON_ON_PMI
Bit 11: FREEZE_LBRS_ON_PMI
Bit 10: BTS_OFF_USR (BTS off in user code)
Bit 9: BTS_OFF_OS (BTS off in OS)
Bit 8: BTINT (Branch trace interrupt)
Bit 7: BTS (Branch trace store)
Bit 6: TR (Trace messages enable)
Bits 5:2: Reserved
Bit 1: BTF (Single-step on branches)
Bit 0: LBR (Last branch/interrupt/exception)
All other bits: Reserved
Figure 18-3.
18.5.2 LBR Stack
The last branch record stack and top-of-stack (TOS) pointer MSRs are supported across Intel Core 2, Intel Xeon and Intel Atom processor families. However, the number of MSRs in the LBR stack and the valid range of TOS pointer value can vary between different processor families.
Software should query the architectural MSR IA32_PERF_CAPABILITIES[5:0] to determine the format of the addresses stored in the LBR stack.
Software must re-enable counting by writing 1s to the corresponding enable bits in MSR_PERF_GLOBAL_CTRL before leaving a PMI service routine to continue counter operation. Freezing LBRs and PMCs on PMIs occurs when:
• A performance counter had an overflow and was programmed to signal a PMI in case of an overflow.
— For the general-purpose counters, this is done by setting bit 20 of the IA32_PERFEVTSELx register.
18.6 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (INTEL® CORE™ i7 PROCESSOR FAMILY)
The Intel Core i7 processor family and Intel Xeon processors based on Intel microarchitecture (Nehalem) support last branch, interrupt, and exception recording.
[Figure: IA32_DEBUGCTL bit layout, bits 31:0.]
Bit 14: FREEZE_WHILE_SMM_EN
Bit 13: UNCORE_PMI_EN
Bit 12: FREEZE_PERFMON_ON_PMI
Bit 11: FREEZE_LBRS_ON_PMI
Bit 10: BTS_OFF_USR (BTS off in user code)
Bit 9: BTS_OFF_OS (BTS off in OS)
Bit 8: BTINT (Branch trace interrupt)
Bit 7: BTS (Branch trace store)
Bit 6: TR (Trace messages enable)
Bits 5:2: Reserved
Bit 1: BTF (Single-step on branches)
Bit 0: LBR (Last branch/interrupt/exception)
Bits 31:15: Reserved
Figure 18-5. IA32_DEBUGCTL MSR for Processors based on Intel microarchitecture (Nehalem)
18.6.
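As an illustration of how software might compose a value for IA32_DEBUGCTL from the flags shown in Figure 18-5, the Python sketch below maps flag names to bit positions and ORs them together. The helper name `debugctl_value` is hypothetical, and the sketch only computes the value; writing it to the MSR requires WRMSR at privilege level 0 and is not shown.

```python
# Bit positions as shown for IA32_DEBUGCTL on Intel microarchitecture (Nehalem).
DEBUGCTL_BITS = {
    "LBR": 0,
    "BTF": 1,
    "TR": 6,
    "BTS": 7,
    "BTINT": 8,
    "BTS_OFF_OS": 9,
    "BTS_OFF_USR": 10,
    "FREEZE_LBRS_ON_PMI": 11,
    "FREEZE_PERFMON_ON_PMI": 12,
    "UNCORE_PMI_EN": 13,
    "FREEZE_WHILE_SMM_EN": 14,
}


def debugctl_value(*flags):
    """Hypothetical helper: OR together the named IA32_DEBUGCTL flags."""
    value = 0
    for flag in flags:
        value |= 1 << DEBUGCTL_BITS[flag]   # KeyError on an unknown flag name
    return value
```

For example, `debugctl_value("LBR", "FREEZE_LBRS_ON_PMI")` gives a value that enables last branch recording and freezes the LBR stack when a PMI is requested.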
18.6.2 Filtering of Last Branch Records
MSR_LBR_SELECT is cleared to zero at RESET, which disables LBR filtering; that is, all branches are captured. MSR_LBR_SELECT provides bit fields that specify the subsets of branches that will not be captured in the LBR. The layout of MSR_LBR_SELECT is shown in Table 18-6.
Table 18-6.
• MSR_DEBUGCTLA MSR — Enables last branch, interrupt, and exception recording; single-stepping on taken branches; branch trace messages (BTMs); and branch trace store (BTS). This register is named DebugCtlMSR in the P6 family processors.
• Debug store (DS) feature flag (CPUID.1:EDX.DS[bit 21]) — Indicates that the processor provides the debug store (DS) mechanism, which allows BTMs to be stored in a memory-resident BTS buffer.
Table 18-7. LBR MSR Stack Structure for the Pentium® 4 and the Intel® Xeon® Processor Family

LBR MSRs for Family 0FH, Models 0H-02H;    Decimal Value of TOS Pointer in
MSRs at locations 1DBH-1DEH.               MSR_LASTBRANCH_TOS (bits 0-1)
MSR_LASTBRANCH_0                           0
MSR_LASTBRANCH_1                           1
MSR_LASTBRANCH_2                           2
MSR_LASTBRANCH_3                           3

LBR MSRs for Family 0FH, Models; MSRs at locations 680H-68FH.
Table 18-7. LBR MSR Stack Structure for the Pentium® 4 and the Intel® Xeon® Processor Family (Contd.)

LBR MSRs for Family 0FH, Models; MSRs at locations 680H-68FH.
Decimal Value of TOS Pointer in MSR_LASTBRANCH_TOS (bits 0-3)
LBR MSRs for Family 0FH, Model 03H; MSRs at locations 6C0H-6CFH.
18.7.2 MSR_DEBUGCTLA MSR
The MSR_DEBUGCTLA MSR enables and disables the various last branch recording mechanisms described in the previous section. This register can be written to using the WRMSR instruction, when operating at privilege level 0 or when in real-address mode. A protected-mode operating system procedure is required to provide user access to this register. Figure 18-6 shows the flags in the MSR_DEBUGCTLA MSR.
• BTINT (branch trace interrupt) flag (bit 4) — When set, the BTS facilities generate an interrupt when the BTS buffer is full. When clear, BTMs are logged to the BTS buffer in a circular fashion. See Section 18.7.8, “Branch Trace Store (BTS).”
• BTS_OFF_OS (disable ring 0 branch trace store) flag (bit 5) — When set, enables the BTS facilities to skip sending/logging CPL_0 BTMs to the memory-resident BTS buffer. See Section 18.7.
[Figure: LBR MSR branch record layouts.]
CPUID Family 0FH, Models 0H-02H: MSR_LASTBRANCH_0 through MSR_LASTBRANCH_3. Each 64-bit MSR holds a From Linear Address and a To Linear Address in its two 32-bit halves (bits 63:32 and bits 31:0).
CPUID Family 0FH, Model 03H-04H: MSR_LASTBRANCH_0_FROM_LIP through MSR_LASTBRANCH_15_FROM_LIP. Bits 63:32 are reserved; bits 31:0 hold the From Linear Address.
MSR_LASTBRANCH_0_TO_LIP through MSR_LASTBRANCH_15_TO_LIP. Bits 63:32 are reserved; bits 31:0 hold the To Linear Address.
Figure 18-7.
When the processor generates a debug exception (#DB), it automatically clears the LBR flag before executing the exception handler. This action does not clear previously stored LBR stack MSRs. The branch records for the last four taken branches, interrupts and/or exceptions are retained for analysis. A debugger can use the linear addresses in the LBR stack to re-set breakpoints in the breakpoint address registers (DR0 through DR3).
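When a debugger analyzes the retained branch records, it usually wants them ordered from most recent to oldest. The Python sketch below shows one way to derive that order. It assumes that the TOS pointer selects the MSR holding the most recent record and that the stack is filled circularly; `lbr_indices_newest_first` is a hypothetical helper name, and the stack depth must be supplied for the processor family in use.

```python
def lbr_indices_newest_first(tos, depth):
    """Return LBR stack indices from the most recent record to the oldest.

    Assumes the TOS pointer selects the most recent record and that the
    stack wraps around after the last MSR (circular use of `depth` entries).
    """
    if not 0 <= tos < depth:
        raise ValueError("TOS pointer out of range for this stack depth")
    return [(tos - i) % depth for i in range(depth)]
```

With a four-entry stack and a TOS value of 2, the records would be read in the order 2, 1, 0, 3.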
BTMs when both the TR and LBR flags are set in the MSR_DEBUGCTLA/IA32_DEBUGCTL MSR.
18.7.7 Last Exception Records
The Pentium 4 and Intel Xeon processors provide two 32-bit MSRs (the MSR_LER_TO_LIP and the MSR_LER_FROM_LIP MSRs) that duplicate the functions of the LastExceptionToIP and LastExceptionFromIP MSRs found in the P6 family processors.
18.7.8.2 Setting Up the DS Save Area
To save branch records with the BTS buffer, the DS save area must first be set up in memory as described in the following procedure. See Section 18.7.8.3, “Setting Up the BTS Buffer,” and Section 18.18.8.3, “Setting Up the PEBS Buffer,” for instructions on setting up a BTS buffer and/or a PEBS buffer, respectively, in the DS save area:
1. Create the DS buffer management information area in memory (see Section 18.18.
• Pages that contain buffers must be mapped to the same physical addresses for all processes, such that any change to control register CR3 will not change the DS addresses.
• The DS save area is expected to be used only on systems with an enabled APIC. The LVT Performance Counter entry in the APIC must be initialized to use an interrupt gate instead of the trap gate.
18.7.8.
NOTES
If the buffer size is set to less than the minimum allowable value (i.e. BTS absolute maximum < 1 + size of BTS record), the results of BTS are undefined.
To prevent an interrupt from being generated when working with a circular BTS buffer, software must set the BTS interrupt threshold to a value greater than the BTS absolute maximum (both are fields of the DS buffer management area). Clearing the BTINT flag alone is not sufficient.
18.7.8.
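The two notes above can be expressed as a small configuration check. The Python sketch below is a hedged illustration: `bts_setup_problems` is a hypothetical helper, the record size is passed in because it depends on the record format, and the minimum-size test assumes that the "BTS absolute maximum" in the note is measured relative to the BTS buffer base.

```python
def bts_setup_problems(base, absolute_maximum, interrupt_threshold,
                       record_size, circular):
    """Return a list of problems found in a proposed BTS buffer setup.

    Assumption: the minimum-size rule compares the distance from the buffer
    base to the absolute maximum against 1 + the size of one BTS record.
    """
    problems = []
    if absolute_maximum - base < 1 + record_size:
        problems.append("buffer is smaller than the minimum allowable size")
    if circular and interrupt_threshold <= absolute_maximum:
        problems.append("circular buffer needs an interrupt threshold "
                        "greater than the absolute maximum")
    return problems
```

An empty list means that neither rule is violated; it does not prove that the rest of the DS save area is set up correctly.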
18.7.8.5 Writing the DS Interrupt Service Routine
The BTS, non-precise event-based sampling, and PEBS facilities share the same interrupt vector and interrupt service routine (called the debug store interrupt service routine or DS ISR). To handle BTS, non-precise event-based sampling, and PEBS interrupts, separate handler routines must be included in the DS ISR.
18.8 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (INTEL® CORE™ SOLO AND INTEL® CORE™ DUO PROCESSORS)
Intel Core Solo and Intel Core Duo processors provide last branch, interrupt, and exception recording. This capability is almost identical to that found in Pentium 4 and Intel Xeon processors. There are differences in the stack and in some MSR names and locations.
[Figure: debug control MSR bit layout, bits 31:0.]
Bits 31:9: Reserved
Bit 8: BTINT (Branch trace interrupt)
Bit 7: BTS (Branch trace store)
Bit 6: TR (Trace messages enable)
Bits 5:2: Reserved
Bit 1: BTF (Single-step on branches)
Bit 0: LBR (Last branch/interrupt/exception)
Figure 18-8.
[Figure: MSR_LASTBRANCH_0 through MSR_LASTBRANCH_7. Each 64-bit MSR holds a To Linear Address and a From Linear Address in its two 32-bit halves (bits 63:32 and bits 31:0).]
Figure 18-9. LBR Branch Record Layout for the Intel Core Solo and Intel Core Duo Processor
18.9 LAST BRANCH, INTERRUPT, AND EXCEPTION RECORDING (PENTIUM M PROCESSORS)
Like the Pentium 4 and Intel Xeon processor family, Pentium M processors provide last branch, interrupt, and exception recording.
— TR (trace message enable) flag (bit 6) — When set, branch trace messages are enabled. When the processor detects a taken branch, interrupt, or exception, it sends the branch record out on the system bus as a branch trace message (BTM). See Section 18.7.6, “Branch Trace Messages,” for more information about the TR flag.
[Figure: MSR_LASTBRANCH_0 through MSR_LASTBRANCH_7. Each 64-bit MSR holds a To Linear Address and a From Linear Address in its two 32-bit halves (bits 63:32 and bits 31:0).]
Figure 18-11. LBR Branch Record Layout for the Pentium M Processor
For more detail on these capabilities, see Section 18.7, “Last Branch, Interrupt, and Exception Recording (Processors based on Intel NetBurst® Microarchitecture),” and Appendix B.7, “MSRs In the Pentium M Processor.”
18.
[Figure: DEBUGCTLMSR bit layout, bits 31:0.]
Bits 31:7: Reserved
Bit 6: TR (Trace messages enable)
Bits 5:2: PB3, PB2, PB1, PB0 (PBi: Performance monitoring/breakpoint pins)
Bit 1: BTF (Single-step on branches)
Bit 0: LBR (Last branch/interrupt/exception)
Figure 18-12. DEBUGCTLMSR Register (P6 Family Processors)
• BTF (single-step on branches) flag (bit 1) — When set, the processor treats the TF flag in the EFLAGS register as a “single-step on branches” flag. See Section 18.7.
tion or interrupt being generated. When an exception or interrupt occurs, the contents of the LastBranchToIP and LastBranchFromIP MSRs are copied into these registers before the to and from addresses of the exception or interrupt are recorded in the LastBranchToIP and LastBranchFromIP MSRs. These registers can be read using the RDMSR instruction.
18.11 TIME-STAMP COUNTER
The Intel 64 and IA-32 architectures (beginning with the Pentium processor) define a time-stamp counter mechanism that can be used to monitor and identify the relative time occurrence of processor events. The counter’s architecture includes the following components:
• TSC flag — A feature bit that indicates the availability of the time-stamp counter. The counter is available if the function CPUID.1:EDX.TSC[bit 4] = 1.
NOTE
To determine average processor clock frequency, Intel recommends the use of EMON logic to count processor core clocks over the period of time for which the average is required. See Section 18.20, “Counting Clocks,” and Appendix A, “Performance-Monitoring Events,” for more information.
The RDTSC instruction reads the time-stamp counter and is guaranteed to return a monotonically increasing unique value whenever executed, except for a 64-bit counter wraparound.
18.11.2 IA32_TSC_AUX Register and RDTSCP Support
Processors based on Intel microarchitecture (Nehalem) provide an auxiliary TSC register, IA32_TSC_AUX, that is designed to be used in conjunction with IA32_TSC. IA32_TSC_AUX provides a 32-bit field that is initialized by privileged software with a signature value (for example, a logical processor ID).
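One use of the signature value is to detect whether two time-stamp reads were taken on the same logical processor before subtracting them. The Python sketch below models this with plain integers: it assumes the caller has already captured the EAX, EDX, and ECX values returned by two RDTSCP executions (EDX:EAX holding the counter, ECX holding IA32_TSC_AUX), and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = (1 << 64) - 1


def tsc_from_regs(eax, edx):
    """Combine the EDX:EAX register pair into a 64-bit time-stamp value."""
    return ((edx & MASK32) << 32) | (eax & MASK32)


def elapsed_ticks(start, end):
    """Difference of two TSC readings, tolerating one 64-bit wraparound."""
    return (end - start) & MASK64


def elapsed_on_same_cpu(first, second):
    """Each argument is an (eax, edx, ecx) tuple from one RDTSCP read.

    Returns the elapsed ticks, or None if the IA32_TSC_AUX signatures in ECX
    differ, which suggests the reads came from different logical processors.
    """
    if first[2] != second[2]:
        return None
    return elapsed_ticks(tsc_from_regs(first[0], first[1]),
                         tsc_from_regs(second[0], second[1]))
```

Returning `None` on a signature mismatch is a design choice for this sketch; real code might retry the measurement instead.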
discussed in Section 18.14, “Performance Monitoring (Intel® Core™ Solo and Intel® Core™ Duo Processors).” Non-architectural events for a given microarchitecture cannot be enumerated using CPUID; they are listed in Appendix A, “Performance-Monitoring Events.”
The second class of performance monitoring capabilities is referred to as architectural performance monitoring. This class supports the same counting and sampling usages, with a smaller set of available events.
Intel Core Solo and Intel Core Duo processors support base level functionality identified by version ID of 1. Processors based on Intel Core microarchitecture support, at a minimum, the base level functionality of architectural performance monitoring. Intel Core 2 Duo processor T7700 and newer processors based on Intel Core microarchitecture support both the base level functionality and enhanced architectural performance monitoring identified by version ID of 2.
Software Developer’s Manual, Volume 2A). If the version identifier is greater than zero, architectural performance monitoring capability is supported. Software first queries CPUID.0AH for the version identifier; it then analyzes the values returned in CPUID.0AH.EAX and CPUID.0AH.EBX to determine the facilities available.
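The enumeration step can be sketched as a pure decoding function. The Python example below assumes the caller has already obtained the EAX, EBX, and EDX values returned by CPUID leaf 0AH; `decode_cpuid_0a` is a hypothetical helper, and the field positions used (version ID in EAX[7:0], counter count in EAX[15:8], counter width in EAX[23:16], EBX vector length in EAX[31:24], fixed-function counter count and width in EDX[4:0] and EDX[12:5]) follow the CPUID.0AH definition in the instruction set reference.

```python
def decode_cpuid_0a(eax, ebx, edx=0):
    """Decode the register values returned by CPUID leaf 0AH (sketch)."""
    version = eax & 0xFF
    vector_length = (eax >> 24) & 0xFF
    info = {
        "version": version,
        "gp_counters": (eax >> 8) & 0xFF,
        "gp_counter_width": (eax >> 16) & 0xFF,
        # A set bit in EBX means the corresponding architectural event
        # is NOT available.
        "unavailable_events": [bit for bit in range(vector_length)
                               if (ebx >> bit) & 1],
    }
    if version > 1:
        # Fixed-function counter fields are only meaningful for version 2+.
        info["fixed_counters"] = edx & 0x1F
        info["fixed_counter_width"] = (edx >> 5) & 0xFF
    return info
```

A version of 0 in the result means architectural performance monitoring is not supported and the remaining fields should be ignored.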
[Figure: IA32_PERFEVTSELx bit layout.]
Bits 63:32: Reserved
Bits 31:24: Counter Mask (CMASK)
Bit 23: INV (Invert counter mask)
Bit 22: EN (Enable counters)
Bit 21: Reserved
Bit 20: INT (APIC interrupt enable)
Bit 19: PC (Pin control)
Bit 18: E (Edge detect)
Bit 17: OS (Operating system mode)
Bit 16: USR (User mode)
Bits 15:8: Unit Mask (UMASK)
Bits 7:0: Event Select
Figure 18-13. Layout of IA32_PERFEVTSELx MSRs
• Unit mask (UMASK) field (bits 8 through 15) — These bits qualify the condition that the selected event logic unit detects.
• PC (pin control) flag (bit 19) — When set, the logical processor toggles the PMi pins and increments the counter when performance-monitoring events occur; when clear, the processor toggles the PMi pins when the counter overflows. The toggling of a pin is defined as assertion of the pin for a single bus clock followed by deassertion.
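The fields of IA32_PERFEVTSELx described above can be combined into a register value as follows. This Python sketch is illustrative only: `perfevtsel` and its parameter names are hypothetical, and the event encodings used in the usage note (event select 3CH with unit mask 00H for unhalted core cycles, 2EH with 41H for last-level cache misses) are the pre-defined architectural encodings.

```python
def perfevtsel(event_select, umask, usr_mode=True, os_mode=True, edge=False,
               pin_control=False, interrupt=False, enable=True,
               invert=False, cmask=0):
    """Compose an IA32_PERFEVTSELx value from its fields (sketch)."""
    if not 0 <= event_select <= 0xFF or not 0 <= umask <= 0xFF:
        raise ValueError("event select and unit mask are 8-bit fields")
    if not 0 <= cmask <= 0xFF:
        raise ValueError("counter mask is an 8-bit field")
    value = event_select | (umask << 8) | (cmask << 24)
    value |= int(usr_mode) << 16      # USR: count at privilege levels 1-3
    value |= int(os_mode) << 17       # OS: count at privilege level 0
    value |= int(edge) << 18          # E: edge detect
    value |= int(pin_control) << 19   # PC: pin control
    value |= int(interrupt) << 20     # INT: APIC interrupt on overflow
    value |= int(enable) << 22        # EN: enable the counter
    value |= int(invert) << 23        # INV: invert the counter mask
    return value
```

For example, `perfevtsel(0x3C, 0x00)` counts unhalted core cycles at all privilege levels, and `perfevtsel(0x2E, 0x41, os_mode=False, interrupt=True)` counts user-mode last-level cache misses and requests an interrupt on overflow.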
— IA32_PERF_GLOBAL_CTRL allows software to enable/disable event counting of all or any combination of fixed-function PMCs (IA32_FIXED_CTRx) or any general-purpose PMCs via a single WRMSR.
— IA32_PERF_GLOBAL_STATUS allows software to query counter overflow conditions on any combination of fixed-function PMCs or general-purpose PMCs via a single RDMSR.
• Enable field (lowest 2 bits within each 4-bit control) — When bit 0 is set, the corresponding fixed-function performance counter increments while the target condition associated with the architectural performance event occurs at ring 0.
ware. Figure 18-16 shows the layout of IA32_PERF_GLOBAL_STATUS. A value of 1 in bits 0, 1, and 32 through 34 indicates that a counter overflow condition has occurred in the associated counter. When a performance counter is configured for PEBS, an overflow condition in the counter generates a performance-monitoring interrupt signaling a PEBS event. On a PEBS event, the processor stores data records into the buffer area (see Section 18.15.5), clears the counter overflow status.
[Figure: IA32_PERF_GLOBAL_OVF_CTRL bit layout.]
Bit 63: ClrCondChgd
Bit 62: ClrOvfBuffer
Bit 34: IA32_FIXED_CTR2 ClrOverflow
Bit 33: IA32_FIXED_CTR1 ClrOverflow
Bit 32: IA32_FIXED_CTR0 ClrOverflow
Bit 1: IA32_PMC1 ClrOverflow
Bit 0: IA32_PMC0 ClrOverflow
All other bits: Reserved
Figure 18-17. Layout of IA32_PERF_GLOBAL_OVF_CTRL MSR
18.13.2.2 Architectural Performance Monitoring Version 3 Facilities
The facilities provided by architectural performance monitoring versions 1 and 2 are also supported by architectural performance monitoring version 3.
[Figure: IA32_PERFEVTSELx bit layout with the AnyThread flag.]
Bits 63:32: Reserved
Bits 31:24: Counter Mask (CMASK)
Bit 23: INV (Invert counter mask)
Bit 22: EN (Enable counters)
Bit 21: ANY (Any Thread)
Bit 20: INT (APIC interrupt enable)
Bit 19: PC (Pin control)
Bit 18: E (Edge detect)
Bit 17: OS (Operating system mode)
Bit 16: USR (User mode)
Bits 15:8: Unit Mask (UMASK)
Bits 7:0: Event Select
Figure 18-18.
[Figure: fixed-function counter control register bit layout.]
Bits 63:12: Reserved
Bits 11:8: Cntr2, controls for IA32_FIXED_CTR2 (bit 11: PMI; bit 10: ANY; bits 9:8: EN)
Bits 7:4: Cntr1, controls for IA32_FIXED_CTR1 (bit 7: PMI; bit 6: ANY; bits 5:4: EN)
Bits 3:0: controls for IA32_FIXED_CTR0
  Bit 3: PMI (Enable PMI on overflow on IA32_FIXED_CTR0)
  Bit 2: AnyThread (AnyThread for IA32_FIXED_CTR0)
  Bits 1:0: ENABLE (IA32_FIXED_CTR0). 0: disable; 1: OS; 2: User; 3: All ring levels
Figure 18-19.
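The 4-bit control fields shown in Figure 18-19 can be packed with a short helper. The Python sketch below is an illustration under stated assumptions: `fixed_ctr_ctrl` and the `RING_*` constants are hypothetical names, and the layout used is the one in the figure (enable in the low two bits of each field, AnyThread in bit 2, PMI in bit 3).

```python
# ENABLE field encodings from the figure: 0 disable, 1 OS, 2 User, 3 all rings.
RING_NONE, RING_OS, RING_USR, RING_ALL = 0, 1, 2, 3


def fixed_ctr_ctrl(controls):
    """Pack per-counter controls into a fixed-function control value.

    `controls` maps a fixed counter index (0, 1, or 2) to a tuple of
    (ring_enable, any_thread, pmi_on_overflow).
    """
    value = 0
    for counter, (ring, any_thread, pmi) in controls.items():
        if counter not in (0, 1, 2):
            raise ValueError("only fixed counters 0 through 2 are modelled")
        field = (ring & 0b11) | (int(any_thread) << 2) | (int(pmi) << 3)
        value |= field << (4 * counter)
    return value
```

For example, `fixed_ctr_ctrl({0: (RING_ALL, False, True)})` enables fixed counter 0 at all ring levels with a PMI on overflow and leaves the other two counters disabled.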
[Figure: global performance counter control registers.]
Global Enable Controls, IA32_PERF_GLOBAL_CTRL:
Bits 63:35: Reserved
Bit 34: IA32_FIXED_CTR2 enable
Bit 33: IA32_FIXED_CTR1 enable
Bit 32: IA32_FIXED_CTR0 enable
Bit N-1: IA32_PMC(N-1) enable
...
Bit 1: IA32_PMC1 enable
Bit 0: IA32_PMC0 enable
Global Overflow Status, IA32_PERF_GLOBAL_STATUS:
Bit 63: CondChgd
Bit 62: OvfBuffer
Bit 34: IA32_FIXED_CTR2 Overflow
Bit 33: IA32_FIXED_CTR1 Overflow
Bit 32: IA32_FIXED_CTR0 Overflow
Bit 1: IA32_PMC1 Overflow
Bit 0: IA32_PMC0 Overflow
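A PMI service routine typically reads the global status register to find the overflowed counters and later rewrites the global control register to resume counting. The Python sketch below models both values with integers; `global_ctrl` and `decode_global_status` are hypothetical helper names, and the numbers of general-purpose and fixed-function counters are parameters because they vary by processor.

```python
def global_ctrl(pmcs=(), fixed=()):
    """Build an IA32_PERF_GLOBAL_CTRL value enabling the listed counters."""
    value = 0
    for n in pmcs:
        value |= 1 << n           # IA32_PMCn enable
    for n in fixed:
        value |= 1 << (32 + n)    # IA32_FIXED_CTRn enable
    return value


def decode_global_status(value, num_pmcs=2, num_fixed=3):
    """List the overflow indications set in an IA32_PERF_GLOBAL_STATUS value."""
    names = [f"IA32_PMC{n}" for n in range(num_pmcs) if (value >> n) & 1]
    names += [f"IA32_FIXED_CTR{n}" for n in range(num_fixed)
              if (value >> (32 + n)) & 1]
    if (value >> 62) & 1:
        names.append("OvfBuffer")
    if (value >> 63) & 1:
        names.append("CondChgd")
    return names
```

The same bit positions apply to the overflow-control register, so a value built from the decoded status can be reused to clear the corresponding indicators.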
Table 18-10. UMask and Event Select Encodings for Pre-Defined Architectural Performance Events Bit Position CPUID.AH.
This event counts reference clock cycles while the clock signal on the core is running. The reference clock operates at a fixed frequency, irrespective of core frequency changes due to performance state transitions. Processors may implement this behavior differently. See Table A-6 and Table A-8 in Appendix A, “Performance-Monitoring Events.
Non-architectural performance events use event select values that are model-specific. Event mask (Umask) values are also specific to event logic units. Some microarchitectural conditions detectable by a Umask value may have specificity related to processor topology (see Section 7.7, “Detecting Hardware Multi-Threading Support and Topology,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A).
Table 18-13.
mance counters and associated counter control and status MSR becomes part of architectural performance monitoring version 2 facilities (see also Section 18.13.2). Non-architectural performance events in processors based on Intel Core microarchitecture use event select values that are model-specific. Valid event mask (Umask) bits are listed in Appendix A.
18.15.1 Fixed-function Performance Counters
Processors based on Intel Core microarchitecture provide three fixed-function performance counters. Bits beyond the width of the fixed counter are reserved and must be written as zeros. Model-specific fixed-function performance counters on processors that support Architectural Perfmon version 1 are 40 bits wide. Each fixed-function counter is dedicated to counting a pre-defined performance monitoring event.
[Figure: fixed-function counter control register bit layout.]
Bits 63:12: Reserved
Bits 11:8: Cntr2, controls for MSR_PERF_FIXED_CTR2 (bit 11: PMI; bits 9:8: EN)
Bits 7:4: Cntr1, controls for MSR_PERF_FIXED_CTR1 (bit 7: PMI; bits 5:4: EN)
Bits 3:0: Cntr0, controls for MSR_PERF_FIXED_CTR0 (bit 3: PMI; bits 1:0: EN)
PMI: Enable PMI on overflow
ENABLE: 0: disable; 1: OS; 2: User; 3: All ring levels
Figure 18-21.
[Figure: MSR_PERF_GLOBAL_CTRL bit layout.]
Bits 63:35: Reserved
Bit 34: FIXED_CTR2 enable
Bit 33: FIXED_CTR1 enable
Bit 32: FIXED_CTR0 enable
Bits 31:2: Reserved
Bit 1: PMC1 enable
Bit 0: PMC0 enable
Figure 18-22. Layout of MSR_PERF_GLOBAL_CTRL MSR
MSR_PERF_GLOBAL_STATUS MSR provides single-bit status used by software to query the overflow condition of each performance counter. The MSR also provides additional status bits to indicate overflow conditions when counters are programmed for precise-event-based sampling (PEBS).
MSR_PERF_GLOBAL_OVF_CTL MSR allows software to clear the overflow indicators for general-purpose or fixed-function counters via a single WRMSR (see Figure 18-24).
Table 18-18. At-Retirement Performance Events for Intel Core Microarchitecture

Event Name                        UMask   Event Select
ITLB_MISS_RETIRED                 00H     C9H
MEM_LOAD_RETIRED.L1D_MISS         01H     CBH
MEM_LOAD_RETIRED.L1D_LINE_MISS    02H     CBH
MEM_LOAD_RETIRED.L2_MISS          04H     CBH
MEM_LOAD_RETIRED.L2_LINE_MISS     08H     CBH
MEM_LOAD_RETIRED.DTLB_MISS        10H     CBH

18.15.
18.15.4.1 Setting up the PEBS Buffer
For processors based on Intel Core microarchitecture, PEBS is available using IA32_PMC0 only. Use the following procedure to set up the processor and IA32_PMC0 counter for PEBS:
1. Set up the precise event buffering facilities. Place values in the precise event buffer base, precise event index, precise event absolute maximum, precise event interrupt threshold, and precise event counter reset fields of the DS buffer management area.
The service routine can query MSR_PERF_GLOBAL_STATUS to determine which counter(s) caused the overflow condition. The service routine should clear the overflow indicator by writing to MSR_PERF_GLOBAL_OVF_CTL. A comparison of the sequence of requirements to program PEBS for processors based on Intel Core and Intel NetBurst microarchitectures is listed in Table 18-20.
Table 18-20.
Table 18-20. Requirements to Program PEBS (Contd.)

For Processors based on Intel Core microarchitecture | For Processors based on Intel NetBurst microarchitecture
Allocate buffer for PEBS states. | Allocate a buffer in memory for the precise information.
Program the IA32_DS_AREA MSR. | Program the IA32_DS_AREA MSR.
Configure the PEBS buffer management records. | Configure the PEBS buffer management records in the DS buffer management area.
Configure/Enable PEBS.
Valid event mask (Umask) bits are listed in Appendix A. The UMASK field may contain sub-fields that provide the same qualifying actions as those listed in Table 18-11, Table 18-12, Table 18-13, and Table 18-14. One or more of these sub-fields may apply to specific events on an event-by-event basis. Details are listed in Table A-7 in Appendix A, “Performance-Monitoring Events.” Precise Event Based Monitoring is supported using IA32_PMC0.
18.
18.17.1 Enhancements of Performance Monitoring in the Processor Core
The notable enhancements in the monitoring of performance events in the processor core include:
• Four general purpose performance counters, IA32_PMCx, associated counter configuration MSRs, IA32_PERFEVTSELx, and global counter control MSR supporting simplified control of four counters.
[Figure: IA32_PEBS_ENABLE bit layout.]
Bits 63:36: Reserved
Bit 35: LL_EN_PMC3 (R/W)
Bit 34: LL_EN_PMC2 (R/W)
Bit 33: LL_EN_PMC1 (R/W)
Bit 32: LL_EN_PMC0 (R/W)
Bits 31:4: Reserved
Bit 3: PEBS_EN_PMC3 (R/W)
Bit 2: PEBS_EN_PMC2 (R/W)
Bit 1: PEBS_EN_PMC1 (R/W)
Bit 0: PEBS_EN_PMC0 (R/W)
RESET value: 0x00000000_00000000
Figure 18-25. Layout of IA32_PEBS_ENABLE MSR
Upon occurrence of the next PEBS event, the PEBS hardware triggers an assist and causes a PEBS record to be written.
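The enable bits in Figure 18-25 can be combined as in the following Python sketch. The helper name `pebs_enable` is hypothetical; the bit positions (PEBS_EN_PMCx in bits 3:0 and LL_EN_PMCx in bits 35:32) are those shown in the figure.

```python
def pebs_enable(pebs_counters=(), load_latency_counters=()):
    """Build an IA32_PEBS_ENABLE value for the given IA32_PMCx indices."""
    value = 0
    for n in pebs_counters:
        if not 0 <= n <= 3:
            raise ValueError("PEBS enable bits exist for IA32_PMC0-3 only")
        value |= 1 << n             # PEBS_EN_PMCn
    for n in load_latency_counters:
        if not 0 <= n <= 3:
            raise ValueError("load latency bits exist for IA32_PMC0-3 only")
        value |= 1 << (32 + n)      # LL_EN_PMCn
    return value
```

For instance, `pebs_enable((0, 3), (3,))` enables PEBS on IA32_PMC0 and IA32_PMC3 and additionally sets the load latency enable bit for IA32_PMC3.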
Table 18-21.
[Figure: the IA32_DS_AREA MSR points to the DS Buffer Management Area, whose fields point into the BTS Buffer (Branch Record 0 through Branch Record n) and the PEBS Buffer (PEBS Record 0 through PEBS Record n).]
DS Buffer Management Area fields and offsets:
0H: BTS Buffer Base
8H: BTS Index
10H: BTS Absolute Maximum
18H: BTS Interrupt Threshold
20H: PEBS Buffer Base
28H: PEBS Index
30H: PEBS Absolute Maximum
38H: PEBS Interrupt Threshold
40H: PEBS Counter0 Reset
48H: PEBS Counter1 Reset
50H: PEBS Counter2 Reset
58H: PEBS Counter3 Reset
60H: Reserved
Figure 18-26.
is generated when PEBS buffer is full. Software must reset the PEBS Index field to the beginning of the PEBS buffer address to continue capturing PEBS records.
• PEBS Interrupt Threshold: This field specifies the threshold value to trigger a performance interrupt and notify software that the PEBS buffer is nearly full. This field is programmed with the linear address of the first byte of the PEBS record within the PEBS buffer that represents the threshold record.
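To make the layout of the DS buffer management area concrete, the following Python sketch packs its fields into a byte string. It is an illustration only: the field names and helper functions are hypothetical, and it assumes thirteen consecutive little-endian 64-bit fields at offsets 0H through 60H, as in the figure above.

```python
import struct

# Field order follows the offsets 0H, 8H, ..., 60H of the management area.
DS_FIELDS = [
    "bts_buffer_base", "bts_index", "bts_absolute_maximum",
    "bts_interrupt_threshold", "pebs_buffer_base", "pebs_index",
    "pebs_absolute_maximum", "pebs_interrupt_threshold",
    "pebs_counter0_reset", "pebs_counter1_reset",
    "pebs_counter2_reset", "pebs_counter3_reset", "reserved",
]
MASK64 = (1 << 64) - 1


def field_offset(name):
    """Byte offset of a field inside the DS buffer management area."""
    return 8 * DS_FIELDS.index(name)


def pack_ds_area(**fields):
    """Pack the management area; fields that are not given default to 0."""
    unknown = set(fields) - set(DS_FIELDS)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    values = [fields.get(name, 0) & MASK64 for name in DS_FIELDS]
    return struct.pack("<13Q", *values)


def reset_pebs_index(area):
    """Return a copy of the area with PEBS Index set back to PEBS Buffer Base."""
    base = struct.unpack_from("<Q", area, field_offset("pebs_buffer_base"))[0]
    updated = bytearray(area)
    struct.pack_into("<Q", updated, field_offset("pebs_index"), base)
    return bytes(updated)
```

The `reset_pebs_index` helper mirrors the requirement that software reset the PEBS Index field to the start of the PEBS buffer after the buffer fills. Masking values to 64 bits lets a negative counter reset value be passed directly.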
threshold interrupt is generated after the PEBS assist completes, followed by the counter overflow interrupt (two separate interrupts are generated). Uncore counters may be programmed to interrupt one or more processor cores (see Section 18.17.2). It is possible for interrupts posted from the uncore facility to occur coincident with counter overflow interrupts from the processor core.
record will be for the last tagged load operation which retired just before the PEBS assist was invoked. The load-latency information written into a PEBS record (see Table 18-21, bytes AFH:98H) consists of:
• Data Linear Address: This is the linear address of the target of the load operation.
• Latency Value: This is the number of elapsed cycles of the tagged load operation from dispatch to GO, measured in the processor core clock domain.
Table 18-22. Data Source Encoding for Load Latency Record (Contd.)

Encoding   Description
0xC        L3 MISS. Local home requests that missed the L3 cache and were serviced by local DRAM (go to exclusive state).
0xD        L3 MISS. Remote home requests that missed the L3 cache and were serviced by remote DRAM (go to exclusive state).
0xE        Reserved
0xF        The request was to un-cacheable memory.

The layout of MSR_PEBS_LD_LAT_THRESHOLD is shown in Figure 18-27.
Table 18-23. Off-Core Response Event Encoding
Event code in IA32_PERFEVTSELx:  0xB7
Mask value in IA32_PERFEVTSELx:  0x01
Required off-core response MSR:  MSR_OFFCORE_RSP_0 (address 0x1A6)
The layout of MSR_OFFCORE_RSP_0 is shown in Figure 18-28. Bits 7:0 specify the request type of a transaction request to the uncore. Bits 15:8 specify the response of the uncore subsystem.
Table 18-24. MSR_OFFCORE_RSP_Z Bit Field Definition (Contd.)
Bit Name     Offset  Description
DMND_RFO     1       (R/W). Counts the number of demand and DCU prefetch reads for ownership (RFO) requests generated by a write to data cacheline. Does not count L2 RFO.
DMND_IFETCH  2       (R/W). Counts the number of demand and DCU prefetch instruction cacheline reads. Does not count L2 code read prefetches.
WB           3       (R/W).
18.17.2 Performance Monitoring Facility in the Uncore The “uncore” in Intel microarchitecture (Nehalem) refers to subsystems in the physical processor package that are shared by multiple processor cores. Some of the subsystems in the uncore include the L3 cache, Intel QuickPath Interconnect link logic, and integrated memory controller.
Bit(s)  Field
63      PMI_FRZ (R/W)
51      EN_PMI_CORE3 (R/W)
50      EN_PMI_CORE2 (R/W)
49      EN_PMI_CORE1 (R/W)
48      EN_PMI_CORE0 (R/W)
32      EN_FC0 (R/W)
7:0     EN_PC7 through EN_PC0 (R/W), one bit per counter
All other bits are reserved. RESET value: 0x00000000_00000000
Figure 18-29. Layout of MSR_UNCORE_PERF_GLOBAL_CTRL MSR
MSR_UNCORE_PERF_GLOBAL_STATUS provides overflow status of the U-clock performance counters in the uncore.
Bit(s)  Field
63      CHG (R/W)
61      OVF_PMI (R/W)
32      OVF_FC0 (R/O)
7:0     OVF_PC7 through OVF_PC0 (R/O), one bit per counter
All other bits are reserved. RESET value: 0x00000000_00000000
Figure 18-30. Layout of MSR_UNCORE_PERF_GLOBAL_STATUS MSR
Figure 18-31 shows the layout of MSR_UNCORE_PERF_GLOBAL_OVF_CTRL.
• CLR_OVF_PCn (bit n, n = 0 through 7): Set this bit to clear the overflow status for general-purpose uncore counter MSR_UNCORE_PerfCntr n. Writing a value other than 1 is ignored. • CLR_OVF_FC0 (bit 32): Set this bit to clear the overflow status for the fixed-function uncore counter MSR_UNCORE_FixedCntr0. Writing a value other than 1 is ignored. • CLR_OVF_PMI (bit 61): Set this bit to clear the OVF_PMI flag in MSR_UNCORE_PERF_GLOBAL_STATUS.
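The clear bits listed above can be combined into a single 64-bit write value. The following is a minimal sketch in Python, not a definitive implementation: the function name and its parameters are hypothetical, and only the bit positions (bit n for CLR_OVF_PCn, bit 32 for CLR_OVF_FC0, bit 61 for CLR_OVF_PMI) are taken from the text. Actually writing the MSR requires privileged code and is outside the scope of the sketch.

```python
CLR_OVF_FC0_BIT = 32  # fixed-function counter clear bit, from the text above
CLR_OVF_PMI_BIT = 61  # OVF_PMI clear bit, from the text above


def uncore_ovf_clear_mask(gp_counters=(), fixed=False, pmi=False):
    """Return a 64-bit value with the requested CLR_OVF_* bits set.

    gp_counters: iterable of general-purpose counter indexes (0 through 7).
    fixed:       request a clear of the fixed-function counter overflow.
    pmi:         request a clear of the OVF_PMI flag.
    """
    mask = 0
    for n in gp_counters:
        if not 0 <= n <= 7:
            raise ValueError("general-purpose counter index must be 0..7")
        mask |= 1 << n  # CLR_OVF_PCn occupies bit n
    if fixed:
        mask |= 1 << CLR_OVF_FC0_BIT
    if pmi:
        mask |= 1 << CLR_OVF_PMI_BIT
    return mask
```

For example, uncore_ovf_clear_mask([0, 7], fixed=True) sets bits 0, 7, and 32. Because writes of 0 to these bits are ignored, a value built this way leaves the status of unlisted counters untouched.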
• Edge Detect (bit 18): When set, causes the counter to increment when a deasserted-to-asserted transition occurs for the conditions that can be expressed by any of the fields in this register. • PMI (bit 20): When set, the uncore will generate an interrupt request when this counter overflows. This request will be routed to the logical processors as enabled in the PMI enable bits (EN_PMI_COREx) in the register MSR_UNCORE_PERF_GLOBAL_CTRL.
and sampling usages. The event logic unit can filter event counts to specific regions of code or transaction types incoming to the home node logic. 18.17.2.3 Uncore Address/Opcode Match MSR The Event Select field [7:0] of MSR_UNCORE_PERFEVTSELx is used to select different uncore event logic units.
Table 18-25.
• The IA32_MISC_ENABLE MSR, which indicates the availability in an Intel 64 or IA-32 processor of the performance monitoring and precise event-based sampling (PEBS) facilities. • Event selection control (ESCR) MSRs for selecting events to be monitored with specific performance counters. The number available differs by family and model (43 to 45). • 18 performance counter MSRs for counting events.
Table 18-26. Performance Counter MSRs and Associated CCCR and ESCR MSRs (Pentium 4 and Intel Xeon Processors) (Contd.) Counter CCCR ESCR Name No. Addr Name Addr Name No.
• At-retirement events (see Table A-10) are events that are counted at the retirement stage of instruction execution, which allows finer granularity in counting events and capturing machine state. The at-retirement counting mechanism includes facilities for tagging μops that have encountered a particular performance event during instruction execution.
Figure 18-35 shows the layout of an ESCR MSR. The functions of the flags and fields are: • USR flag, bit 2 — When set, events are counted when the processor is operating at a current privilege level (CPL) of 1, 2, or 3. These privilege levels are generally used by application code and unprotected operating system code. • OS flag, bit 3 — When set, events are counted when the processor is operating at CPL of 0.
when operating system code and/or application code are being executed. If neither the OS nor USR flag is set, no events will be counted. The ESCRs are initialized to all 0s on reset. The flags and fields of an ESCR are configured by writing to the ESCR using the WRMSR instruction. Table 18-26 gives the addresses of the ESCR MSRs. Writing to an ESCR MSR does not enable counting with its associated performance counter; it only selects the event or events to be counted.
Each performance counter is 40 bits wide (see Figure 18-36). The RDPMC instruction has been enhanced in the Pentium 4 and Intel Xeon processors to allow reading of either the full counter width (40 bits) or the low 32 bits of the counter. Reading the low 32 bits is faster than reading the full counter width and is appropriate in situations where the count is small enough to be contained in 32 bits.
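One situation where the low 32 bits suffice is measuring the number of events between two nearby readings. The sketch below models this in Python using modular arithmetic. The helper names are hypothetical, the readings are plain integers rather than RDPMC results, and the difference is only meaningful under the stated assumption that fewer than 2^32 events occurred between the two readings.

```python
MASK32 = (1 << 32) - 1  # low 32 bits of the 40-bit counter


def low32(counter_value):
    """Model a fast read: keep only the low 32 bits of a counter value."""
    return counter_value & MASK32


def delta32(start_low, end_low):
    """Events counted between two low-32-bit readings.

    Assumes fewer than 2**32 events elapsed, so a single wrap of the
    low 32 bits is absorbed by the modular subtraction.
    """
    return (end_low - start_low) & MASK32
```

For instance, readings of 0xFFFFFFF0 followed by 0x00000010 give a difference of 0x20 even though the low 32 bits wrapped in between.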
• Enable flag, bit 12 — When set, enables counting; when clear, the counter is disabled. This flag is cleared on reset. • ESCR select field, bits 13 through 15 — Identifies the ESCR to be used to select events to be counted with the counter associated with the CCCR. • Compare flag, bit 18 — When set, enables filtering of the event count; when clear, disables filtering. The filtering method is selected with the threshold, complement, and edge flags.
Bit(s)  Field
63:32   Reserved
31      OVF
30      Cascade
29:27   Reserved
26      OVF_PMI
25      FORCE_OVF
24      Edge
23:20   Threshold
19      Complement
18      Compare
17:16   Reserved: Must be set to 11B
15:13   ESCR Select
12      Enable
11:0    Reserved
Figure 18-37. Counter Configuration Control Register (CCCR)
• FORCE_OVF flag, bit 25 — When set, forces a counter overflow on every counter increment; when clear, overflow only occurs when the counter actually overflows.
2. The OS and USR flags in the ESCR select the privilege levels at which events will be counted. 3. The ESCR select field of the CCCR selects the ESCR. Since each counter has several ESCRs associated with it, one ESCR must be chosen to select the classes of events that may be counted. 4. The compare and complement flags and the threshold field of the CCCR select an optional threshold to be used in qualifying an event count. 5.
occurrence of the PEBS event that caused the counter to overflow. When the state information has been logged, the counter is automatically reset to a preselected value, and event counting begins again. This feature is available only for a subset of the Pentium 4 and Intel Xeon processors’ performance events. NOTES: The DS save area and recording mechanism are not available in SMM. The feature is disabled on transition to SMM.
DS Buffer Management Area (located by the IA32_DS_AREA MSR):
Offset  Field
0H      BTS Buffer Base
4H      BTS Index
8H      BTS Absolute Maximum
CH      BTS Interrupt Threshold
10H     PEBS Buffer Base
14H     PEBS Index
18H     PEBS Absolute Maximum
1CH     PEBS Interrupt Threshold
20H     PEBS Counter Reset
The figure labels the remaining offsets 24H and 30H and ends the area with a Reserved field. The BTS buffer holds Branch Record 0 through Branch Record n; the PEBS buffer holds PEBS Record 0 through PEBS Record n.
Figure 18-38.
• PEBS counter reset value — A 40-bit value that the counter is to be reset to after state information has been collected following counter overflow. This value allows state information to be collected after a preset number of events have been counted. Figure 18-39 shows the structure of a 12-byte branch record in the BTS buffer.
Offset  Field (bits 31:0)
0H      EFLAGS
4H      Linear IP
8H      EAX
CH      EBX
10H     ECX
14H     EDX
18H     ESI
1CH     EDI
20H     EBP
24H     ESP
Figure 18-40. PEBS Record Format
18.18.5.1 DS Save Area and IA-32e Mode Operation
When IA-32e mode is active (IA32_EFER.LMA = 1), the structure of the DS save area is shown in Figure 18-41. The organization of each field in IA-32e mode operation is similar to that of non-IA-32e mode operation. However, each field now stores a 64-bit address.
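As an illustration of the non-IA-32e record layout in Figure 18-40 (ten 32-bit fields, EFLAGS at offset 0H through ESP at offset 24H), the following Python sketch decodes one record from a raw byte buffer. It is a sketch under stated assumptions: the function and constant names are hypothetical, and the buffer is assumed to have been copied out of the PEBS buffer by some other means. The little-endian byte order follows the IA-32 memory model.

```python
import struct

# Field order follows Figure 18-40; each field is 4 bytes wide.
PEBS32_FIELDS = ("EFLAGS", "LINEAR_IP", "EAX", "EBX", "ECX",
                 "EDX", "ESI", "EDI", "EBP", "ESP")
PEBS32_STRUCT = struct.Struct("<10I")  # ten little-endian 32-bit fields


def parse_pebs32_record(buf, offset=0):
    """Decode one 40-byte PEBS record starting at `offset` in `buf`."""
    values = PEBS32_STRUCT.unpack_from(buf, offset)
    return dict(zip(PEBS32_FIELDS, values))
```

A record decoded this way can be indexed by field name, for example parse_pebs32_record(raw)["LINEAR_IP"]. The IA-32e format uses 64-bit fields and additional registers, so it would need a different structure definition.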
DS Buffer Management Area (located by the IA32_DS_AREA MSR):
Offset  Field
0H      BTS Buffer Base
8H      BTS Index
10H     BTS Absolute Maximum
18H     BTS Interrupt Threshold
20H     PEBS Buffer Base
28H     PEBS Index
30H     PEBS Absolute Maximum
38H     PEBS Interrupt Threshold
40H     PEBS Counter Reset
The figure labels the remaining offsets 48H and 50H and ends the area with a Reserved field. The BTS buffer holds Branch Record 0 through Branch Record n; the PEBS buffer holds PEBS Record 0 through PEBS Record n.
Figure 18-41.
Offset  Field (bits 63:0)
0H      Last Branch From
8H      Last Branch To
10H     Branch Predicted (bit 4)
Figure 18-42. 64-bit Branch Trace Record Format
Offset  Field (bits 63:0)
0H      RFLAGS
8H      RIP
10H     RAX
18H     RBX
20H     RCX
28H     RDX
30H     RSI
38H     RDI
40H     RBP
48H     RSP
50H     R8
...     ...
88H     R15
Figure 18-43. 64-bit PEBS Record Format
18.18.6 Programming the Performance Counters for Non-Retirement Events
The basic steps to program a performance counter and to count events include the following: 1.
3. Match the CCCR Select value and ESCR name in Table A-9 to a value listed in Table 18-26; select a CCCR and performance counter. 4. Set up an ESCR for the specific event or events to be counted and the privilege levels at which they are to be counted. 5. Set up the CCCR for the performance counter by selecting the ESCR and the desired event filters. 6.
Table 18-27. Event Example (Contd.)
Columns: Event Name, Event Parameters, Parameter Value, Description
Event Parameters                      Parameter Value
Event Specific Notes                  P6: EMON_BR_INST_RETIRED
Can Support PEBS                      No
Requires Additional MSRs for Tagging  No
For Table A-9 and Table A-10, Appendix A, the name of the event is listed in the Event Name column and parameters that define the event and other information are listed in the Event Parameters column.
to be absolutely accurate and should be used as a relative guide for tuning. Known discrepancies are documented where applicable. The following procedure shows how to set up a performance counter for basic counting; that is, the counter is set up to count a specified event indefinitely, wrapping around whenever it reaches its maximum count. This procedure is continued through the following four sections.
events. The compare, complement, threshold, and edge fields control the filtering of counter increments by input value. If the compare flag is set, then a “greater than” or a “less than or equal to” comparison of the input value vs. a threshold value can be made. The complement flag selects “less than or equal to” (flag set) or “greater than” (flag clear). The threshold field selects a threshold value from 0 to 15.
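The comparison rules above can be modelled in a few lines. The following Python sketch is illustrative only: the function name and arguments are hypothetical, and the choice to return True when the compare flag is clear (meaning that nothing is filtered out) is a modelling assumption rather than a statement from the text.

```python
def threshold_filter_output(input_value, threshold, compare, complement):
    """Model the CCCR threshold comparison for one input value.

    compare clear    -> filtering disabled (modelled here as True)
    complement clear -> "greater than" comparison
    complement set   -> "less than or equal to" comparison
    """
    if not 0 <= threshold <= 15:
        raise ValueError("threshold field holds a value from 0 to 15")
    if not compare:
        return True  # assumption: no filtering means every input passes
    if complement:
        return input_value <= threshold
    return input_value > threshold
```

With a threshold of 2 and the complement flag clear, an input value of 3 passes and an input value of 2 does not; setting the complement flag reverses both results.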
[Timing diagram with three traces: Processor Clock; Output from Threshold Filter; Counter Increments On Rising Edge (False-to-True).]
Figure 18-44. Effects of Edge Filtering
18.18.6.3 Starting Event Counting
Event counting by a performance counter can be initiated in either of two ways. The typical way is to set the enable flag in the counter’s CCCR. Following the instruction to set the enable flag, event counting begins and continues until it is stopped (see Section 18.18.6.5, “Halting Event Counting”).
This setup procedure is continued in the next section, Section 18.18.6.5, “Halting Event Counting.” 18.18.6.5 Halting Event Counting After a performance counter has been started (enabled), it continues counting indefinitely. If the counter overflows (goes one count past its maximum count), it wraps around and continues counting. When the counter wraps around, it sets its OVF flag to indicate that the counter has overflowed.
Example 18-1. Counting Events Assume a scenario where counter X is set up to count 200 occurrences of event A; then counter Y is set up to count 400 occurrences of event B. Each counter is set up to count a specific event and overflow to the next counter. In the above example, counter X is preset for a count of -200 and counter Y for a count of -400; this setup causes the counters to overflow on the 200th and 400th counts respectively.
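The preset values in Example 18-1 are negative counts expressed in the counter's width. Assuming the 40-bit counter width of the Pentium 4 and Intel Xeon performance counters, the arithmetic can be sketched in Python as follows. The function names are hypothetical and the code only computes values; it does not program a counter.

```python
COUNTER_BITS = 40  # width of a Pentium 4 / Intel Xeon performance counter


def counter_preset(events_until_overflow, width=COUNTER_BITS):
    """Two's-complement encoding of -events_until_overflow in `width` bits."""
    if not 0 < events_until_overflow <= (1 << width):
        raise ValueError("event count does not fit the counter width")
    return (-events_until_overflow) & ((1 << width) - 1)


def counts_until_wrap(preset, width=COUNTER_BITS):
    """Number of increments after which a counter loaded with `preset` wraps."""
    return (1 << width) - preset
```

For the example above, counter_preset(200) is 0xFFFFFFFF38 and counter_preset(400) is 0xFFFFFFFE70. The 200th and 400th increments, respectively, carry each counter past its maximum count.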
The extended cascading feature can be adapted to the sampling usage model for performance monitoring. However, it is known that performance counters do not generate PMI in cascade mode or extended cascade mode due to an erratum. This erratum applies to Pentium 4 and Intel Xeon processors with model encoding of 2. For Pentium 4 and Intel Xeon processors with model encoding of 0 and 1, the erratum applies to processors with stepping encoding greater than 09H.
18.18.6.8 Generating an Interrupt on Overflow Any performance counter can be configured to generate a performance monitor interrupt (PMI) if the counter overflows. The PMI interrupt service routine can then collect information about the state of the processor or program when overflow occurred. This information can then be used with a tool like the Intel® VTune™ Performance Analyzer to analyze and tune program performance.
18.18.7 At-Retirement Counting At-retirement counting provides a means of counting only events that represent work committed to architectural state and ignoring work that was performed speculatively and later discarded. The Intel NetBurst microarchitecture used in the Pentium 4 and Intel Xeon processors performs many speculative activities in an attempt to increase effective processing speeds. One example of this speculative activity is branch prediction.
performance events are provided in the Intel Pentium 4 Processor Optimization Reference Manual (see Section 1.4, “Related Literature”). • Replay — To maximize performance for the common case, the Intel NetBurst microarchitecture aggressively schedules μops for execution before all the conditions for correct execution are guaranteed to be satisfied. In the event that all of these conditions are not satisfied, μops must be reissued.
Certain kinds of μops cannot be tagged, including I/O, uncacheable and locked accesses, returns, and far transfers. Table A-10 lists the performance monitoring events that support at-retirement counting: specifically the Front_end_event, Execution_event, Replay_event, Inst_retired and Uops_retired events. The following sections describe the tagging mechanisms for using these events to tag μops and count tagged μops. 18.18.7.
The four separate tag bits allow the user to simultaneously but distinctly count up to four execution events at retirement. (This applies for non-precise event-based sampling. There are additional restrictions for PEBS as noted in Section 18.18.8.3, “Setting Up the PEBS Buffer.”) It is also possible to detect or count combinations of events by setting multiple tag value bits in the upstream ESCR or multiple mask bits in the downstream ESCR.
In processors based on Intel Core microarchitecture, a similar PEBS mechanism is also supported using IA32_PMC0 and IA32_PERFEVTSEL0 MSRs (see Section 18.15.4). 18.18.8.1 Detection of the Availability of the PEBS Facilities The DS feature flag (bit 21) returned by the CPUID instruction indicates (when set) the availability of the DS mechanism in the processor, which supports the PEBS (and BTS) facilities.
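The DS feature-flag test reduces to checking one bit of a register value. The sketch below assumes that the caller has already obtained the feature-flag word (the EDX value returned by CPUID with EAX = 1) through some platform-specific means, since Python cannot execute CPUID directly. The function name is hypothetical.

```python
DS_FEATURE_BIT = 21  # DS feature flag position, from the text above


def ds_mechanism_available(cpuid_feature_edx):
    """Return True if the DS flag (bit 21) is set in the CPUID feature word."""
    return bool((cpuid_feature_edx >> DS_FEATURE_BIT) & 1)
```

A set DS flag indicates only that the DS mechanism exists. Availability of the PEBS facilities themselves is reported separately, through the IA32_MISC_ENABLE MSR mentioned earlier in this chapter.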
18.18.8.5 Other DS Mechanism Implications The DS mechanism is not available in the SMM. It is disabled on transition to the SMM mode. Similarly the DS mechanism is disabled on the generation of a machine check exception and is cleared on processor RESET and INIT. The DS mechanism is available in real address mode. 18.18.9 Operating System Implications The DS mechanism can be used by the operating system as a debugging extension to facilitate failure analysis.
The sections below describe performance counters, event qualification by logical processor ID, and special purpose bits in ESCRs/CCCRs. They also describe MSR_PEBS_ENABLE, MSR_PEBS_MATRIX_VERT, and MSR_TC_PRECISE_EVENT. 18.19.1 ESCR MSRs Figure 18-45 shows the layout of an ESCR MSR in processors supporting Intel Hyper-Threading Technology.
• Tag enable, bit 4 — When set, enables tagging of μops to assist in at-retirement event counting; when clear, disables tagging. See Section 18.18.7, “At-Retirement Counting.” • Tag value field, bits 5 through 8 — Selects a tag value to associate with a μop to assist in at-retirement event counting. • Event mask field, bits 9 through 24 — Selects events to be counted from the event class selected with the event select field.
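The three fields described above can be packed into their ESCR bit positions as shown in the following Python sketch. It covers only the tag enable flag (bit 4), the tag value field (bits 5 through 8), and the event mask field (bits 9 through 24). The privilege-level flags and the event select field are deliberately left out, and the function name is hypothetical.

```python
TAG_ENABLE_SHIFT = 4   # tag enable flag, bit 4
TAG_VALUE_SHIFT = 5    # tag value field, bits 5 through 8 (4 bits)
EVENT_MASK_SHIFT = 9   # event mask field, bits 9 through 24 (16 bits)


def escr_tag_and_mask_bits(tag_enable, tag_value, event_mask):
    """Return the ESCR bits contributed by the tag and event mask fields."""
    if not 0 <= tag_value <= 0xF:
        raise ValueError("tag value is a 4-bit field")
    if not 0 <= event_mask <= 0xFFFF:
        raise ValueError("event mask is a 16-bit field")
    return ((int(bool(tag_enable)) << TAG_ENABLE_SHIFT)
            | (tag_value << TAG_VALUE_SHIFT)
            | (event_mask << EVENT_MASK_SHIFT))
```

The result would be ORed with the remaining ESCR fields before the register is written with WRMSR.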
• Compare flag, bit 18 — When set, enables filtering of the event count; when clear, disables filtering. The filtering method is selected with the threshold, complement, and edge flags.
Bit(s)  Field
63:32   Reserved
31      OVF
30      Cascade
29:28   Reserved
27      OVF_PMI_T1
26      OVF_PMI_T0
25      FORCE_OVF
24      Edge
23:20   Threshold
19      Complement
18      Compare
17:16   Active Thread
15:13   ESCR Select
12      Enable
11:0    Reserved
Figure 18-46.
• FORCE_OVF flag, bit 25 — When set, forces a counter overflow on every counter increment; when clear, overflow only occurs when the counter actually overflows. • OVF_PMI_T0 flag, bit 26 — When set, causes a performance monitor interrupt (PMI) to be sent to logical processor 0 when the counter overflows; when clear, disables PMI generation for logical processor 0. Note that the PMI is generated on the next event count after the counter has overflowed.
of the initial APIC ID. This allows for counting an event in any or all of the logical processors. However, not all the events have this logical processor specificity, or thread specificity. Here, each event falls into one of two categories: • Thread specific (TS) — The event can be qualified as occurring on a specific logical processor. • Thread independent (TI) — The event cannot be qualified as being associated with a specific logical processor.
Table 18-30.
There are several ways to count processor clock cycles to monitor performance. These are: • Non-halted clockticks — Measures clock cycles in which the specified logical processor is not halted and is not in any power-saving state. When Intel Hyper-Threading Technology is enabled, ticks can be measured on a per-logical-processor basis. There are also performance events on dual-core processors that measure clockticks per logical processor when the processor is not halted.
2. Select an appropriate counter. 3. Enable counting in the CCCR for that counter by setting the enable bit. 18.20.2 Non-Sleep Clockticks Performance monitoring counters can be configured to count clockticks whenever the performance monitoring hardware is not powered-down. To count Non-sleep Clockticks with a performance-monitoring counter, do the following: 1. Select one of the 18 counters. 2. Select any of the ESCRs whose events the selected counter can count.
18.20.3 Incrementing the Time-Stamp Counter The time-stamp counter increments when the clock signal on the system bus is active and when the sleep pin is not asserted. The counter value can be read with the RDTSC instruction. The time-stamp counter and the non-sleep clockticks count may not agree in all cases and for all processors. See Section 18.11, “Time-Stamp Counter,” for more information on counter operation. 18.20.
Specific Registers (MSRs)”. The maximum resolved bus ratio can be read from the following bit field: • If XE operation is disabled, the maximum resolved bus ratio can be read in MSR_PLATFORM_ID[12:8]. It corresponds to the maximum qualified frequency. • If XE operation is enabled, the maximum resolved bus ratio is given in MSR_PERF_STAT[44:40]; it corresponds to the maximum XE operation frequency configured by BIOS.
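The two cases above differ only in which MSR and which bit range supply the ratio. The following Python sketch shows the field extraction, assuming that the raw 64-bit MSR values and the XE-operation state have already been obtained by privileged code. The function names are hypothetical.

```python
def bit_field(value, high, low):
    """Extract bits high:low (inclusive) from an integer register value."""
    width = high - low + 1
    return (value >> low) & ((1 << width) - 1)


def max_resolved_bus_ratio(xe_enabled, msr_platform_id, msr_perf_stat):
    """Select the maximum resolved bus ratio from the appropriate MSR field."""
    if xe_enabled:
        return bit_field(msr_perf_stat, 44, 40)   # MSR_PERF_STAT[44:40]
    return bit_field(msr_platform_id, 12, 8)      # MSR_PLATFORM_ID[12:8]
```

For example, an MSR_PLATFORM_ID value of 0x0B00 with XE operation disabled yields a ratio of 11.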
Bit(s)  Field
63:13   Reserved
12      SMM_FREEZE (R/O)
11:8    PEBS_REC_FMT (R/O)
7       PEBS_ARCH_REG (R/O)
6       PEBS_TRAP (R/O)
5:0     LBR_FMT (R/O): 0 = 32-bit, 1 = 64-bit LIP, 2 = 64-bit EIP
Figure 18-47. Layout of IA32_PERF_CAPABILITIES MSR
18.22 PERFORMANCE MONITORING AND DUAL-CORE TECHNOLOGY
The performance monitoring capability of dual-core processors duplicates the microarchitectural resources of a single-core processor implementation.
[Block diagram with these labeled blocks: System Bus; iBUSQ and iSNPQ; 3rd Level Cache; iFSB; IOQ; Processor Core (Front end, Execution, Retirement, and first- and second-level caches).]
Figure 18-48. Block Diagram of 64-bit Intel Xeon Processor MP with 8-MByte L3
Additional performance monitoring capabilities and facilities unique to 64-bit Intel Xeon processor MP with an L3 cache are described in this section.
[Register layout of MSR_IFSB_IBUSQx. Bits 63:32 hold the control fields Saturate, Fill_match, Eviction_match, L3_state_match, Snoop_match, Type_match, T1_match, and T0_match, plus reserved bits; bits 31:0 hold the 32-bit event count.]
Figure 18-49. MSR_IFSB_IBUSQx, Addresses: 107CCH and 107CDH
• ISNPQ event — This event detects the occurrence of microarchitectural conditions related to the iSNPQ unit. It provides two MSRs: MSR_IFSB_ISNPQ0 and MSR_IFSB_ISNPQ1.
[Register layout of MSR_IFSB_ISNPQx. Bits 63:32 hold the control fields Saturate, L3_state_match, Snoop_match, Type_match, Agent_match, T1_match, and T0_match, plus reserved bits; bits 31:0 hold the 32-bit event count.]
Figure 18-50. MSR_IFSB_ISNPQx, Addresses: 107CEH and 107CFH
• EFSB event — This event can detect the occurrence of micro-architectural conditions related to the iFSB unit or system bus. It provides two MSRs: MSR_EFSB_DRDY0 and MSR_EFSB_DRDY1.
[Register layout of MSR_EFSB_DRDYx. Bits 63:32 hold the control fields Saturate, Other, and Own, plus reserved bits; bits 31:0 hold the 32-bit event count.]
Figure 18-51. MSR_EFSB_DRDYx, Addresses: 107D0H and 107D1H
• IBUSQ Latency event — This event accumulates weighted cycle counts for latency measurement of transactions in the iBUSQ unit. The count is enabled by setting MSR_IFSB_CTRL6[bit 26] to 1; the count freezes after software sets MSR_IFSB_CTRL6[bit 26] to 0.
[Register layouts. MSR_IFSB_CTL6 (address 107D2H): an Enable flag; the remaining bits are reserved. MSR_IFSB_CNTR7 (address 107D3H): bits 63:0 hold the 64-bit event count.]
Figure 18-52. MSR_IFSB_CTL6, Address: 107D2H; MSR_IFSB_CNTR7, Address: 107D3H
18.24 PERFORMANCE MONITORING ON L3 AND CACHING BUS CONTROLLER SUB-SYSTEMS
The Intel Xeon processor 7400 series and Dual-Core Intel Xeon processor 7100 series employ a distinct L3/caching bus controller sub-system.
Figure 18-53 for the block configuration of six processor cores and the L3/Caching bus controller sub-system in Intel Xeon processor 7400 series. Figure 18-54 shows the block configuration of two processor cores (four logical processors) and the L3/Caching bus controller sub-system in Intel Xeon processor 7100 series.
[Block diagram: the FSB connects to the GBSQ, GSNPQ, GINTQ, ... queues and the L3; the SDI links the L3 to three SDI interfaces, each serving one L2 shared by two cores (six cores in total).]
Figure 18-53.
[Block diagram: the FSB connects to the GBSQ, GSNPQ, GINTQ, ... queues and the L3; the SDI links the L3 to two SDI interfaces, each serving one processor core with two logical processors.]
Figure 18-54. Block Diagram of Intel Xeon Processor 7100 Series
18.24.1 Overview of Performance Monitoring with L3/Caching Bus Controller
The facility for monitoring events consists of a set of dedicated model-specific registers (MSRs).
• Four MSRs (MSR_EMON_L3_CTR_CTL4, MSR_EMON_L3_CTR_CTL5, MSR_EMON_L3_CTR_CTL6, and MSR_EMON_L3_CTR_CTL7) are dedicated to counting external bus operations. The bit fields in each of the eight MSRs share the following common characteristics: • Bits 63:32 form the event control field, which includes an event mask and other bit fields that control counter operation.
Bit(s)  Field
63:60   Reserved
59      Saturate
58      Cross_snoop
57:56   Fill_eviction
55:54   Core_module_select
53:47   L3_state
46:44   Snoop_match
43:38   Type_match
37:36   Data_flow
35:32   Agent_select
31:0    32-bit event count
Figure 18-55. MSR_EMON_L3_CTR_CTL0/1, Addresses: 107CCH/107CDH
• Data_Flow (bits 37:36): Bit 36 specifies demand transactions, bit 37 specifies prefetch transactions.
— 01B: Match transactions from this dual-core module only
— 10B: Match transactions from either one of the other two dual-core modules in the physical package
— 11B: Match transactions from more than one dual-core module in the physical package
• Fill_Eviction (bits 57:56): The valid encodings are
— 00B: Match any transactions
— 01B: Match transactions that fill L3
— 10B: Match transactions that fill L3 without an eviction
— 11B: Match transaction fill L3 with an evic
the lower two bits (bit 55, 54) differ slightly between Intel Xeon processor 7100 and 7400.
[Register layout of MSR_EMON_L3_CTR_CTL2/3. Bits 63:32 hold the control fields Saturate, Block_snoop, Core_select, L2_state, Snoop_match, Type_match, and Agent_match, plus reserved bits; bits 31:0 hold the 32-bit event count.]
Figure 18-56. MSR_EMON_L3_CTR_CTL2/3, Addresses: 107CEH/107CFH
18.24.4 FSB Event Interface
The layout of MSR_EMON_L3_CTR_CTL4 through MSR_EMON_L3_CTR_CTL7 is given in Figure 18-57.
[Register layout of MSR_EMON_L3_CTR_CTL4/5/6/7. Bits 63:32 hold the Saturate flag and the FSB submask, plus reserved bits; bits 31:0 hold the 32-bit event count.]
Figure 18-57. MSR_EMON_L3_CTR_CTL4/5/6/7, Addresses: 107D0H-107D3H
18.24.4.
• FSB_WW_data (bit 50): Counts the data phase of back-to-back write transactions. • FSB_WR_issue (bit 52): Counts back-to-back write-read transaction request pairs issued by this processor. • FSB_RW_issue (bit 53): Counts back-to-back read-write transaction request pairs issued by this processor.
measure duration. When counting events, a counter increments each time a specified event takes place or a specified number of events takes place. When measuring duration, it counts the number of processor clocks that occur while a specified condition is true. The counters can count events or measure durations that occur at any privilege level. Table A-18, Appendix A, lists the events that can be counted with the P6 family performance monitoring counters.
Bit(s)  Field
31:24   Counter Mask (CMASK)
23      INV: Invert counter mask
22      EN: Enable counters (only available in PerfEvtSel0)
21      Reserved
20      INT: APIC interrupt enable
19      PC: Pin control
18      E: Edge detect
17      OS: Operating system mode
16      USR: User mode
15:8    Unit Mask (UMASK)
7:0     Event Select
Figure 18-58.
• Counter mask (CMASK) field (bits 24 through 31) — When nonzero, the processor compares this mask to the number of events counted during a single cycle. If the event count is greater than or equal to this mask, the counter is incremented by one. Otherwise the counter is not incremented. This mask can be used to count events only if multiple occurrences happen per clock (for example, two or more instructions retired per clock).
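The counter mask rule can be expressed as a per-cycle increment function. The Python sketch below is illustrative only: the function name is hypothetical, the INV flag is not modelled, and the behavior for a zero mask (the counter advancing by the number of events in the cycle) is an assumption, since the text above specifies only the nonzero case.

```python
def cmask_increment(events_this_cycle, cmask):
    """Amount added to the counter in one cycle for a given counter mask."""
    if not 0 <= cmask <= 0xFF:
        raise ValueError("CMASK is an 8-bit field (bits 24 through 31)")
    if cmask == 0:
        # Assumption: with no mask, every event in the cycle is counted.
        return events_this_cycle
    # Nonzero mask: count the cycle once if enough events occurred in it.
    return 1 if events_this_cycle >= cmask else 0
```

With a mask of 2, a cycle in which three instructions retire adds 1 to the counter and a cycle in which one instruction retires adds nothing. The counter therefore reports the number of cycles with two or more retirements.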
18.25.4 Event and Time-Stamp Monitoring Software To use the performance-monitoring counters and time-stamp counter, the operating system needs to provide an event-monitoring device driver.
When interrupted by a counter overflow, the interrupt handler needs to perform the following actions: • Save the instruction pointer (EIP register), code-segment selector, TSS segment selector, counter values and other relevant information at the time of the interrupt. • Reset the counter to its initial setting and return from the interrupt.
Bit(s)  Field
31:26   Reserved
25      PC1: Pin control 1
24:22   CC1: Counter control 1
21:16   ES1: Event select 1
15:10   Reserved
9       PC0: Pin control 0
8:6     CC0: Counter control 0
5:0     ES0: Event select 0
Figure 18-59. CESR MSR (Pentium Processor Only)
• CC0 and CC1 (counter control) fields (bits 6-8, bits 22-24) — Controls the operation of the counter.
be read, the appropriate bits modified, and all bits must then be written back to CESR. At reset, all bits in the CESR register are cleared. 18.26.2 Use of the Performance-Monitoring Pins When performance-monitor pins PM0/BP0 and/or PM1/BP1 are configured to indicate when the performance-monitor counter has incremented and an “occurrence event” is being counted, the associated pin is asserted (high) each time the event occurs.
CHAPTER 19 INTRODUCTION TO VIRTUAL-MACHINE EXTENSIONS 19.1 OVERVIEW This chapter describes the basics of virtual machine architecture and an overview of the virtual-machine extensions (VMX) that support virtualization of processor hardware for multiple software environments. Information about VMX instructions is provided in Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B.
Processor behavior in VMX root operation is very much as it is outside VMX operation. The principal differences are that a set of new instructions (the VMX instructions) is available and that the values that can be loaded into certain control registers are limited (see Section 19.8). Processor behavior in VMX non-root operation is restricted and modified to facilitate virtualization.
[Diagram: the VM monitor is entered with VMXON and left with VMXOFF; VM entries transfer control from the VM monitor to Guest 0 or Guest 1, and VM exits return control from a guest to the VM monitor.]
Figure 19-1. Interaction of a Virtual-Machine Monitor and Guests
19.5 VIRTUAL-MACHINE CONTROL STRUCTURE
VMX non-root operation and VMX transitions are controlled by a data structure called a virtual-machine control structure (VMCS). Access to the VMCS is managed through a component of processor state called the VMCS pointer (one per logical processor).
19.7 ENABLING AND ENTERING VMX OPERATION Before system software can enter VMX operation, it enables VMX by setting CR4.VMXE[bit 13] = 1. VMX operation is then entered by executing the VMXON instruction. VMXON causes an invalid-opcode exception (#UD) if executed with CR4.VMXE = 0. Once in VMX operation, it is not possible to clear CR4.VMXE (see Section 19.8). System software leaves VMX operation by executing the VMXOFF instruction. CR4.
INTRODUCTION TO VIRTUAL-MACHINE EXTENSIONS is provided in an operand to VMXON. Section 20.10.4, “VMXON Region,” details how software should initialize and access the VMXON region. 19.8 RESTRICTIONS ON VMX OPERATION VMX operation places restrictions on processor operation. These are detailed below: • In VMX operation, processors may fix certain bits in CR0 and CR4 to specific values and not support other values.
CHAPTER 20 VIRTUAL-MACHINE CONTROL STRUCTURES 20.1 OVERVIEW The virtual-machine control data structure (VMCS) is defined for VMX operation. A VMCS manages transitions in and out of VMX non-root operation (VM entries and VM exits) as well as processor behavior in VMX non-root operation. This structure is manipulated by the new instructions VMCLEAR, VMPTRLD, VMREAD, and VMWRITE. A VMM can use a different VMCS for each virtual machine that it supports.
VIRTUAL-MACHINE CONTROL STRUCTURES pointer into a specified memory location (it stores the value FFFFFFFF_FFFFFFFFH if there is no current VMCS). A VMCS remains current until either software executes VMPTRLD with the address of a different VMCS (which then becomes the current VMCS) or software executes VMCLEAR with the address of the current VMCS (after which there is no current VMCS). This document frequently uses the term “the VMCS” to refer to the current VMCS. 20.
Section 20.9. To ensure proper behavior in VMX operation, software should maintain the VMCS region and related structures (enumerated in Section 20.10.3) in writeback cacheable memory. Future implementations may allow or require a different memory type. Software should consult the VMX capability MSR IA32_VMX_BASIC (see Appendix G.1). 20.3 ORGANIZATION OF VMCS DATA The VMCS data are organized into six logical groups: • Guest-state area.
VIRTUAL-MACHINE CONTROL STRUCTURES • RSP, RIP, and RFLAGS (64 bits each; 32 bits on processors that do not support Intel 64 architecture).1 • The following fields for each of the registers CS, SS, DS, ES, FS, GS, LDTR, and TR: — Selector (16 bits). — Base address (64 bits; 32 bits on processors that do not support Intel 64 architecture).
VIRTUAL-MACHINE CONTROL STRUCTURES Table 20-2. Format of Access Rights (Contd.
VIRTUAL-MACHINE CONTROL STRUCTURES — IA32_PAT (64 bits). This field is supported only on logical processors that support either the 1-setting of the “load IA32_PAT” VM-entry control or that of the “save IA32_PAT” VM-exit control. — IA32_EFER (64 bits). This field is supported only on logical processors that support either the 1-setting of the “load IA32_EFER” VM-entry control or that of the “save IA32_EFER” VM-exit control. • The register SMBASE (32 bits).
VIRTUAL-MACHINE CONTROL STRUCTURES Table 20-3. Format of Interruptibility State Bit Position(s) Bit Name Notes 0 Blocking by STI See the “STI—Set Interrupt Flag” section in Chapter 4 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B. Execution of STI with RFLAGS.IF = 0 blocks interrupts (and, optionally, other events) for one instruction after its execution. Setting this bit indicates that this blocking is in effect.
VIRTUAL-MACHINE CONTROL STRUCTURES exceptions without immediately delivering them.1 This field contains information about such exceptions. This field is described in Table 20-4. Table 20-4. Format of Pending-Debug-Exceptions Bit Position(s) Bit Name Notes 3:0 B3 – B0 When set, each of these bits indicates that the corresponding breakpoint condition was met. Any of these bits may be set even if the corresponding enabling bit in DR7 is not set. 11:4 Reserved VM entry fails if these bits are not 0.
VIRTUAL-MACHINE CONTROL STRUCTURES Manual, Volume 3A). They are used only if the “enable EPT” VM-execution control is 1. 20.5 HOST-STATE AREA This section describes fields contained in the host-state area of the VMCS. As noted earlier, processor state is loaded from these fields on every VM exit (see Section 23.5). All fields in the host-state area correspond to processor registers: • CR0, CR3, and CR4 (64 bits each; 32 bits on processors that do not support Intel 64 architecture).
20.6.1 Pin-Based VM-Execution Controls The pin-based VM-execution controls constitute a 32-bit vector that governs the handling of asynchronous events (for example, interrupts). Table 20-5 lists the controls supported. See Chapter 21 for how these controls affect processor behavior in VMX non-root operation. Table 20-5.
20.6.2 Processor-Based VM-Execution Controls The processor-based VM-execution controls constitute two 32-bit vectors that govern the handling of synchronous events, mainly those caused by the execution of specific instructions. These are the primary processor-based VM-execution controls and the secondary processor-based VM-execution controls. Table 20-6 lists the primary processor-based VM-execution controls.
Table 20-6. Definitions of Primary Processor-Based VM-Execution Controls (Contd.)
Bit Position(s) Name Description
19 CR8-load exiting This control determines whether executions of MOV to CR8 cause VM exits. This control must be 0 on processors that do not support Intel 64 architecture.
20 CR8-store exiting This control determines whether executions of MOV from CR8 cause VM exits. This control must be 0 on processors that do not support Intel 64 architecture.
Table 20-6. Definitions of Primary Processor-Based VM-Execution Controls (Contd.)
Bit Position(s) Name Description
28 Use MSR bitmaps This control determines whether MSR bitmaps are used to control execution of the RDMSR and WRMSR instructions (see Section 20.6.9 and Section 21.1.3). For this control, “0” means “do not use MSR bitmaps” and “1” means “use MSR bitmaps.” If the MSR bitmaps are not used, all executions of the RDMSR and WRMSR instructions cause VM exits.
VIRTUAL-MACHINE CONTROL STRUCTURES Table 20-7 lists the secondary processor-based VM-execution controls. See Chapter 21 for more details of how these controls affect processor behavior in VMX non-root operation. Table 20-7. Definitions of Secondary Processor-Based VM-Execution Controls Bit Position(s) Name Description 0 Virtualize APIC accesses If this control is 1, a VM exit occurs on any attempt to access data on the page with the APIC-access address. See Section 21.2.
and two 32-bit fields in the VMCS (the page-fault error-code mask and page-fault error-code match). See Section 21.3 for details. 20.6.4 I/O-Bitmap Addresses The VM-execution control fields include the 64-bit physical addresses of I/O bitmaps A and B (each of which is 4 KBytes in size). I/O bitmap A contains one bit for each I/O port in the range 0000H through 7FFFH; I/O bitmap B contains bits for ports in the range 8000H through FFFFH.
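The split of the 64-KByte port space across the two bitmaps can be illustrated with a short sketch. It is not taken from the manual: the function names are hypothetical, and the byte and bit ordering inside each bitmap (bit n lives in byte n // 8 at bit position n % 8) is an assumption made for the example.

```python
def io_bitmap_position(port):
    """Locate the bitmap bit that corresponds to a 16-bit I/O port.

    Returns (bitmap_name, byte_offset, bit_in_byte).
    Assumption: bit n of a bitmap is bit (n % 8) of byte (n // 8).
    """
    if not 0 <= port <= 0xFFFF:
        raise ValueError("I/O ports are 16 bits wide")
    # Ports 0000H-7FFFH map to bitmap A, ports 8000H-FFFFH to bitmap B.
    name = "A" if port <= 0x7FFF else "B"
    index = port & 0x7FFF  # position within the selected 4-KByte bitmap
    return name, index // 8, index % 8


def io_port_bit_is_set(port, bitmap_a, bitmap_b):
    """Return True if the bit for `port` is 1 in the matching bitmap."""
    name, byte_offset, bit = io_bitmap_position(port)
    bitmap = bitmap_a if name == "A" else bitmap_b
    return bool((bitmap[byte_offset] >> bit) & 1)
```

Each bitmap holds 32768 bits (4 KBytes), which is exactly one bit for each port in its half of the port space.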
20.6.7 CR3-Target Controls The VM-execution control fields include a set of 4 CR3-target values and a CR3-target count. The CR3-target values each have 64 bits on processors that support Intel 64 architecture and 32 bits on processors that do not. The CR3-target count has 32 bits on all processors. An execution of MOV to CR3 in VMX non-root operation does not cause a VM exit if its source operand matches one of these values.
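A minimal sketch of the CR3-target comparison follows. It assumes that the CR3-target count selects how many of the four values are considered (the first n values for a count of n) and that MOV to CR3 would otherwise cause a VM exit; the function name and the sample values are hypothetical.

```python
def mov_to_cr3_matches_target(source, cr3_targets, cr3_target_count):
    """Return True if `source` equals one of the considered CR3-target values.

    Assumption: only the first `cr3_target_count` values take part in
    the comparison.
    """
    if not 0 <= cr3_target_count <= len(cr3_targets):
        raise ValueError("CR3-target count exceeds the number of target values")
    return source in cr3_targets[:cr3_target_count]


# Four hypothetical page-table roots that a VMM allows without a VM exit.
targets = [0x1000, 0x2000, 0x3000, 0x4000]
```

With a count of 2, only 0x1000 and 0x2000 suppress the VM exit; a count of 0 means that no value matches.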
VIRTUAL-MACHINE CONTROL STRUCTURES • Virtual-APIC address (64 bits). This field is the physical address of the 4-KByte virtual-APIC page. If the “use TPR shadow” VM-execution control is 1, the virtual-APIC address must be 4-KByte aligned. The virtual-APIC page is accessed by the following operations if the “use TPR shadow” VM-execution control is 1: — The MOV CR8 instructions (see Section 21.1.3 and Section 21.4).
A logical processor uses these bitmaps if and only if the “use MSR bitmaps” control is 1. If the bitmaps are used, an execution of RDMSR or WRMSR causes a VM exit if the value of RCX is in neither of the ranges covered by the bitmaps or if the appropriate bit in the MSR bitmaps (corresponding to the instruction and the RCX value) is 1. See Section 21.1.3 for details. If the bitmaps are used, their address must be 4-KByte aligned. 20.6.
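The RDMSR decision described above can be sketched as follows. The sketch takes the read bitmap for low MSRs and the read bitmap for high MSRs as two separate 1-KByte byte arrays so that it does not depend on how they are laid out within the bitmap page; the function name and the byte and bit ordering (bit n in byte n // 8, position n % 8) are assumptions for illustration.

```python
LOW_MSRS = range(0x00000000, 0x00002000)   # 00000000H - 00001FFFH
HIGH_MSRS = range(0xC0000000, 0xC0002000)  # C0000000H - C0001FFFH


def rdmsr_causes_vm_exit(ecx, use_msr_bitmaps, read_low, read_high):
    """Model whether RDMSR with the given ECX value causes a VM exit."""
    if not use_msr_bitmaps:
        return True  # without bitmaps, every RDMSR causes a VM exit
    if ecx in LOW_MSRS:
        n, bitmap = ecx, read_low
    elif ecx in HIGH_MSRS:
        n, bitmap = ecx & 0x00001FFF, read_high
    else:
        return True  # ECX is outside both ranges covered by the bitmaps
    return bool((bitmap[n // 8] >> (n % 8)) & 1)
```

The same shape applies to WRMSR with the two write bitmaps substituted for the read bitmaps.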
VIRTUAL-MACHINE CONTROL STRUCTURES The EPTP exists only on processors that support the 1-setting of the “enable EPT” VM-execution control. 20.6.12 Virtual-Processor Identifier (VPID) The virtual-processor identifier (VPID) is a 16-bit field. It exists only on processors that support the 1-setting of the “enable VPID” VM-execution control. See Chapter 24.1 for details regarding the use of this field. 20.7 VM-EXIT CONTROL FIELDS The VM-exit control fields govern the behavior of VM exits.
VIRTUAL-MACHINE CONTROL STRUCTURES Table 20-9. Definitions of VM-Exit Controls (Contd.) Bit Position(s) Name Description 15 Acknowledge interrupt on exit This control affects VM exits due to external interrupts: 18 Save IA32_PAT This control determines whether the IA32_PAT MSR is saved on VM exit. 19 Load IA32_PAT This control determines whether the IA32_PAT MSR is loaded on VM exit. 20 Save IA32_EFER This control determines whether the IA32_EFER MSR is saved on VM exit.
20.7.2 VM-Exit Controls for MSRs A VMM may specify lists of MSRs to be stored and loaded on VM exits. The following VM-exit control fields determine how MSRs are stored on VM exits: • VM-exit MSR-store count (32 bits). This field specifies the number of MSRs to be stored on VM exit. It is recommended that this count not exceed 512. Otherwise, unpredictable processor behavior (including a machine check) may result during VM exit.
VIRTUAL-MACHINE CONTROL STRUCTURES 20.8 VM-ENTRY CONTROL FIELDS The VM-entry control fields govern the behavior of VM entries. They are discussed in Sections 20.8.1 through 20.8.3. 20.8.1 VM-Entry Controls The VM-entry controls constitute a 32-bit vector that governs the basic operation of VM entries. Table 20-11 lists the controls supported. See Chapter 22 for how these controls affect VM entries. Table 20-11.
VIRTUAL-MACHINE CONTROL STRUCTURES All other bits in this field are reserved, some to 0 and some to 1. Software should consult the VMX capability MSRs IA32_VMX_ENTRY_CTLS and IA32_VMX_TRUE_ENTRY_CTLS (see Appendix G.5) to determine how it should set the reserved bits. Failure to set reserved bits properly causes subsequent VM entries to fail (see Section 22.2). Note that the first processors to support the virtual-machine extensions supported only the 1-settings of bits 0–8 and 12.
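The following sketch shows one way software might derive a legal control value from a capability MSR. It relies on the capability-MSR layout described in Appendix G (not reproduced in this section): bits 31:0 report controls that must be 1, and bits 63:32 report controls that may be 1. The function names and the sample MSR value are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def apply_capability_msr(desired, capability_msr):
    """Force reserved bits of a 32-bit control vector to their required values."""
    must_be_one = capability_msr & MASK32          # 0-setting not allowed
    may_be_one = (capability_msr >> 32) & MASK32   # 1-setting allowed
    return (desired | must_be_one) & may_be_one


def controls_are_valid(controls, capability_msr):
    """Check a control vector against the same capability MSR."""
    must_be_one = capability_msr & MASK32
    may_be_one = (capability_msr >> 32) & MASK32
    has_required_ones = (controls & must_be_one) == must_be_one
    has_no_forbidden_ones = (controls & ~may_be_one & MASK32) == 0
    return has_required_ones and has_no_forbidden_ones


# Hypothetical capability value, chosen only to exercise the helpers.
SAMPLE_ENTRY_CTLS_MSR = (0x0000F3FF << 32) | 0x000011FF
```

A VMM would typically start from the features it wants, pass them through a helper like `apply_capability_msr`, and then compare the result with the request to detect features that the processor does not support.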
VIRTUAL-MACHINE CONTROL STRUCTURES Table 20-12. Format of the VM-Entry Interruption-Information Field (Contd.
VIRTUAL-MACHINE CONTROL STRUCTURES 20.9 VM-EXIT INFORMATION FIELDS The VMCS contains a section of read-only fields that contain information about the most recent VM exit. Attempts to write to these fields with VMWRITE fail (see “VMWRITE—Write Field to Virtual-Machine Control Structure” in Chapter 6 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B). 20.9.
VIRTUAL-MACHINE CONTROL STRUCTURES SGDT; SIDT; SLDT; STR; VMCLEAR; VMPTRLD; VMPTRST; VMREAD; VMWRITE; VMXON; control-register accesses; MOV DR; I/O instructions; and MWAIT. The format of the field depends on the cause of the VM exit. See Section 23.2.1 for details. • Guest-linear address (64 bits; 32 bits on processors that do not support Intel 64 architecture). This field is used in the following cases: — VM exits due to attempts to execute LMSW with a memory operand.
VIRTUAL-MACHINE CONTROL STRUCTURES Table 20-14. Format of the VM-Exit Interruption-Information Field (Contd.) Bit Position(s) Content 30:13 Reserved (cleared to 0) 31 Valid • VM-exit interruption error code (32 bits). For VM exits caused by hardware exceptions that would have delivered an error code on the stack, this field receives that error code. Section 23.2.2 provides details of how these fields are saved on VM exits. 20.9.
• IDT-vectoring error code (32 bits). For VM exits that occur during delivery of hardware exceptions that would have delivered an error code on the stack, this field receives that error code. Section 23.2.3 provides details of how these fields are saved on VM exits. 20.9.
VIRTUAL-MACHINE CONTROL STRUCTURES 20.10 SOFTWARE ACCESS TO THE VMCS AND RELATED STRUCTURES This section details guidelines that software should observe when accessing a VMCS and related structures. It also provides descriptions of consequences for failing to follow guidelines. 20.10.1 Software Access to the Virtual-Machine Control Structure To ensure proper processor behavior, software should observe certain guidelines when accessing an active VMCS.
VIRTUAL-MACHINE CONTROL STRUCTURES 20.10.2 VMREAD, VMWRITE, and Encodings of VMCS Fields Every field of the VMCS is associated with a 32-bit value that is its encoding. The encoding is provided in an operand to VMREAD and VMWRITE when software wishes to read or write that field. These instructions fail if given, in 64-bit mode, an operand that sets an encoding bit beyond bit 32.
Fields whose encodings use value 1 are specially treated to allow 32-bit software access to all 64 bits of the field. Such access is allowed by defining, for each such field, an encoding that allows direct access to the high 32 bits of the field. See below. • Field type. Bits 11:10 encode the type of VMCS field: control, guest-state, host-state, or read-only data. The last category includes the VM-exit information fields and the VM-instruction error field. • Index.
VIRTUAL-MACHINE CONTROL STRUCTURES — A VMREAD returns the value of bits 63:32 of the field in bits 31:0 of the destination operand; in 64-bit mode, bits 63:32 of the destination operand are cleared to 0. — A VMWRITE writes the value of bits 31:0 of the source operand to bits 63:32 of the field; in 64-bit mode, bits 63:32 of the source operand are not used.
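The encoding layout and the high-access behavior can be modeled in a few lines. This is a sketch under stated assumptions: the bit positions for access type (bit 0), index (bits 9:1), and width (bits 14:13), and the numeric values assigned to each field type and width, follow the encoding table as the editor recalls it and are not reproduced in this excerpt; the function names are hypothetical.

```python
FIELD_TYPES = {0: "control", 1: "read-only data", 2: "guest-state", 3: "host-state"}
WIDTHS = {0: "16-bit", 1: "64-bit", 2: "32-bit", 3: "natural-width"}


def decode_vmcs_encoding(encoding):
    """Split a 32-bit VMCS field encoding into its components."""
    return {
        "high_access": bool(encoding & 1),          # bit 0: access type
        "index": (encoding >> 1) & 0x1FF,           # bits 9:1
        "type": FIELD_TYPES[(encoding >> 10) & 3],  # bits 11:10
        "width": WIDTHS[(encoding >> 13) & 3],      # bits 14:13
    }


def vmread_high(field_value):
    """Value returned in bits 31:0 when a 64-bit field is read with high access."""
    return (field_value >> 32) & 0xFFFFFFFF


def vmwrite_high(field_value, source):
    """New field value after a high-access write; only bits 31:0 of source are used."""
    return (field_value & 0xFFFFFFFF) | ((source & 0xFFFFFFFF) << 32)
```

In this model, 32-bit software would reach all 64 bits of a field by pairing a full-access operation on the low half with a high-access operation on the upper half.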
VIRTUAL-MACHINE CONTROL STRUCTURES Before executing VMXON, software should write the VMCS revision identifier (see Section 20.2) to the VMXON region. It need not initialize the VMXON region in any other way. Software should use a separate region for each logical processor and should not access or modify the VMXON region of a logical processor between execution of VMXON and VMXOFF on that logical processor. Doing otherwise may lead to unpredictable behavior (including behaviors identified in Section 20.10.
CHAPTER 21 VMX NON-ROOT OPERATION In a virtualized environment using VMX, the guest software stack typically runs on a logical processor in VMX non-root operation. This mode of operation is similar to that of ordinary processor operation outside of the virtualized environment. This chapter describes the differences between VMX non-root operation and ordinary processor operation with special attention to causes of VM exits (which bring a logical processor from VMX non-root operation to root operation).
VMX NON-ROOT OPERATION • Certain exceptions have priority over VM exits. These include invalid-opcode exceptions, faults based on privilege level, and general-protection exceptions that are based on checking I/O permission bits in the task-state segment (TSS). For example, execution of RDMSR with CPL = 3 generates a general-protection exception and not a VM exit.
VMX NON-ROOT OPERATION 21.1.3 Instructions That Cause VM Exits Conditionally Certain instructions cause VM exits in VMX non-root operation depending on the setting of the VM-execution controls. The following instructions can cause “fault-like” VM exits based on the conditions described: • CLTS. The CLTS instruction causes a VM exit if the bits in position 3 (corresponding to CR0.TS) are set in both the CR0 guest/host mask and the CR0 read shadow. • HLT.
• MONITOR. The MONITOR instruction causes a VM exit if the “MONITOR exiting” VM-execution control is 1. • MOV from CR3. The MOV from CR3 instruction causes a VM exit if the “CR3-store exiting” VM-execution control is 1. Note that the first processors to support the virtual-machine extensions supported only the 1-setting of this control. • MOV from CR8.
VMX NON-ROOT OPERATION — The value of ECX is not in the range 00000000H – 00001FFFH or C0000000H – C0001FFFH. — The value of ECX is in the range 00000000H – 00001FFFH and bit n in read bitmap for low MSRs is 1, where n is the value of ECX. — The value of ECX is in the range C0000000H – C0001FFFH and bit n in read bitmap for high MSRs is 1, where n is the value of ECX & 00001FFFH. See Section 20.6.9 for details regarding how these bitmaps are identified. • RDPMC.
VMX NON-ROOT OPERATION updated by the instruction (for example, the value of CS:RIP saved in the guest-state area of the VMCS references the next instruction). Specifically, a trap-like VM exit occurs following either instruction if the execution reduces the value of the TPR shadow below that of the TPR threshold VM-execution control field (see Section 20.6.8 and Section 21.4) and the following hold: • For MOV to CR8: — The “CR8-load exiting” VM-execution control is 0.
VMX NON-ROOT OPERATION In general, the treatment of APIC-access VM exits caused by linear accesses is similar to that of page faults and EPT violations. Based upon this treatment, Section 21.2.1.2 specifies the priority of such VM exits with respect to other events, while Section 21.2.1.3 discusses instructions that may cause page faults without accessing memory and the treatment when they access the APIC-access page. 21.2.1.
• At time t1, the processor was in VMX non-root operation with non-zero VPID X, and some linear address Y translated to an address that was not on the APIC-access page at that time. (This might be because the “virtualize APIC accesses” VM-execution control was 0.) • At later time t2, the processor was again in VMX non-root operation with VPID X, and a memory access uses linear address Y, which now translates to an address on the APIC-access page.
These principles imply, among other things, that an APIC-access VM exit may occur during the execution of a repeated string instruction (including INS and OUTS). Suppose, for example, that the first n iterations (n may be 0) of such an instruction do not access the APIC-access page and that the next iteration does access that page. As a result, the first n iterations may complete and be followed by an APIC-access VM exit.
21.2.2 Guest-Physical Accesses to the APIC-Access Page An access to the APIC-access page is called a guest-physical access if (1) guest-physical addresses are being translated using EPT (see Chapter 24); (2) the access’s physical address is the result of an EPT translation; and (3) either (a) the access was not generated by a linear address; or (b) the access’s guest-physical address is not the translation of the access’s linear address.
implies that the “virtualize APIC accesses” VM-execution control is 1 at this time.) — Software did not execute the INVEPT instruction, either with the all-context INVEPT type or with the single-context INVEPT type and X as the INVEPT descriptor, between times t1 and t2. In any of the above cases, the guest-physical access at time t2 might or might not cause an APIC-access VM exit. If it does not, the access operates on memory on the APIC-access page.
VMX NON-ROOT OPERATION — Updates to the accessed and dirty bits in the paging structures. • If the “enable EPT” VM-execution control is 1, accesses to the EPT paging structures. • Any of the following accesses made by the processor to support VMX non-root operation: — Accesses to the VMCS region. — Accesses to data structures referenced (directly or indirectly) by physical addresses in VM-execution control fields in the VMCS. These include the I/O bitmaps, the MSR bitmaps, and the virtual-APIC page.
VMX NON-ROOT OPERATION APIC-access page is not a VTPR access, even if the “use TPR shadow” VM-execution control is 1. In general, VTPR accesses do not cause APIC-access VM exits. Instead, they are treated as described in Section 21.5.3. Physical VTPR accesses (see Section 21.2.3) may or may not cause APIC-access VM exits; see Section 21.5.2. 21.3 OTHER CAUSES OF VM EXITS In addition to VM exits caused by instruction execution, the following events can cause VM exits: • Exceptions.
VMX NON-ROOT OPERATION the IDT. (If a logical processor is in the wait-for-SIPI state, NMIs are blocked. The NMI is not delivered through the IDT and no VM exit occurs.) • INIT signals. INIT signals cause VM exits. A logical processor performs none of the operations normally associated with these events. Such exits do not modify register state or clear pending events as they would outside of VMX operation. (If a logical processor is in the wait-for-SIPI state, INIT signals are blocked.
VMX NON-ROOT OPERATION Non-maskable interrupts (NMIs) and higher priority events take priority over VM exits caused by this control. VM exits caused by this control take priority over external interrupts and lower priority events. These VM exits wake a logical processor from the same inactive states as would an external interrupt. Specifically, they wake a logical processor from the states entered using the HLT and MWAIT instructions.
VMX NON-ROOT OPERATION — If the “NMI exiting” VM-execution control is 0, IRET operates normally and unblocks NMIs. — If the “NMI exiting” VM-execution control is 1, IRET does not affect blocking of NMIs. — If the “virtual NMIs” VM-execution control is 1, the logical processor tracks virtual-NMI blocking. In this case, IRET removes any virtual-NMI blocking. If the “NMI exiting” VM-execution control is 0, the “virtual NMIs” control must be 0. (See Section 22.2.1.1.) • LMSW.
VMX NON-ROOT OPERATION — If the “CR8-store exiting” VM-execution control is 0 and the “use TPR shadow” VM-execution control is 1, MOV from CR8 reads from the TPR shadow. Specifically, it loads bits 3:0 of its destination operand with the value of bits 7:4 of byte 80H of the virtual-APIC page (see Section 20.6.8). Bits 63:4 of the destination operand are cleared. — If the “CR8-store exiting” VM-execution control is 1, MOV from CR8 causes a VM exit (see Section 21.1.
VMX NON-ROOT OPERATION — If the “CR8-load exiting” VM-execution control is 1, MOV to CR8 causes a VM exit (see Section 21.1.3); the “use TPR shadow” VM-execution control is ignored in this case. • RDMSR. Section 21.1.3 identifies when executions of the RDMSR instruction cause VM exits.
— If the “enable RDTSCP” VM-execution control is 0, RDTSCP causes an invalid-opcode exception (#UD). — If the “enable RDTSCP” VM-execution control is 1, treatment is based on the settings of the “RDTSC exiting” and “use TSC offsetting” VM-execution controls as well as the TSC offset: • If both controls are 0, RDTSCP operates normally.
VMX NON-ROOT OPERATION processor is not in VMX non-root operation). Otherwise, instruction behavior is determined by the setting of the “virtualize x2APIC mode” VM-execution control and the value of the TPR-threshold VM-execution control field: • If the control is 0, the instruction operates normally. If the local APIC is in x2APIC mode, the value of EAX[7:0] is written to the APIC’s task-priority register. If the local APIC is not in x2APIC mode, a general-protection fault occurs.
VMX NON-ROOT OPERATION causes an APIC-access VM exit). Section 21.5.3 describes the treatment if there is no APIC-access VM exit and the access is a VTPR access. 21.5.3 VTPR Accesses As noted in Section 21.2.4, a memory access is a VTPR access if all of the following hold: (1) the “use TPR shadow” VM-execution control is 1; (2) the access is not for an instruction fetch; (3) the access is at most 32 bits in width; and (4) the access is to offset 80H on the APIC-access page.
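The four conditions can be written as a single predicate. This is only an illustrative sketch; the function and parameter names are hypothetical, and the caller is assumed to have already established that the access targets the APIC-access page.

```python
VTPR_OFFSET = 0x80  # offset of the TPR on the APIC-access page


def is_vtpr_access(use_tpr_shadow, is_instruction_fetch, width_bits, page_offset):
    """Return True if an access to the APIC-access page is a VTPR access."""
    return bool(
        use_tpr_shadow                     # (1) "use TPR shadow" control is 1
        and not is_instruction_fetch       # (2) not an instruction fetch
        and width_bits <= 32               # (3) at most 32 bits wide
        and page_offset == VTPR_OFFSET     # (4) offset 80H on the page
    )
```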
VMX NON-ROOT OPERATION — A VTPR access using the CLFLUSH instruction flushes data for offset 80H on the virtual-APIC page. — A VTPR access using the LMSW instruction may cause a VM exit due to the CR0 guest/host mask and the CR0 read shadow. — A VTPR access using the MONITOR instruction causes the logical processor to monitor offset 80H on the virtual-APIC page. — A VTPR access using the PREFETCH instruction may prefetch data; if so, it is from offset 80H on the virtual-APIC page. • VTPR write accesses.
VMX NON-ROOT OPERATION • Suppose that the first iteration of a repeated string instruction (including OUTS) that accesses the APIC-access page performs a VTPR read access and that the next iteration would read from the APIC-access page using an offset other than 80H. The following items describe the behavior of the logical processor: — The iteration that performs the VTPR read access completes successfully, reading data from offset 80H on the virtual-APIC page.
2. If the value of bits 3:0 of the TPR threshold VM-execution control field is greater than the value of bits 7:4 at offset 80H on the virtual-APIC page, a VM exit will occur. TPR-shadow updates take priority over system-management interrupts (SMIs), INIT signals, and lower priority events. A TPR-shadow update thus has priority over any debug exceptions that may have been triggered by the operation causing the TPR-shadow update.
privilege levels are not checked on the referenced task-state segment (TSS) descriptor. c. If CALL or JMP accesses a TSS descriptor directly in IA-32e mode, a general-protection exception occurs. d. If CALL or JMP accesses a TSS descriptor directly outside IA-32e mode, privilege levels are checked on the TSS descriptor. e. If a non-maskable interrupt (NMI), an exception, or an external interrupt accesses a task gate in the IDT in IA-32e mode, a general-protection exception occurs.
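The threshold comparison in step 2, together with the way MOV from CR8 reads the TPR shadow, can be sketched as follows. The virtual-APIC page is modeled as a 4-KByte byte array; the function names are hypothetical.

```python
VTPR_OFFSET = 0x80  # byte 80H of the virtual-APIC page holds the TPR shadow


def tpr_shadow_priority_class(virtual_apic_page):
    """Bits 7:4 of byte 80H, as returned in bits 3:0 by MOV from CR8."""
    return (virtual_apic_page[VTPR_OFFSET] >> 4) & 0xF


def tpr_below_threshold(virtual_apic_page, tpr_threshold):
    """Return True if bits 3:0 of the threshold exceed bits 7:4 of the shadow."""
    return (tpr_threshold & 0xF) > tpr_shadow_priority_class(virtual_apic_page)
```

A VMM that wants to be notified when the guest lowers its task priority below the priority class of a pending interrupt would set the threshold to that class.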
VMX NON-ROOT OPERATION 21.7 FEATURES SPECIFIC TO VMX NON-ROOT OPERATION Some VM-execution controls cause VM exits using features that are specific to VMX non-root operation. These are the VMX-preemption timer (Section 21.7.1) and the monitor trap flag (Section 21.7.2). 21.7.1 VMX-Preemption Timer If the last VM entry was performed with the 1-setting of “activate VMX-preemption timer” VM-execution control, the VMX-preemption timer counts down (from the value loaded by VM entry; see Section 22.6.
MTF VM exits. An MTF VM exit may occur on an instruction boundary in VMX non-root operation as follows: • If the “monitor trap flag” VM-execution control is 1 and VM entry is injecting a vectored event (see Section 22.5.1), an MTF VM exit is pending on the instruction boundary before the first instruction following the VM entry. • If VM entry is injecting a pending MTF VM exit (see Section 22.5.
VMX NON-ROOT OPERATION • System-management interrupts (SMIs), INIT signals, and higher priority events take priority over MTF VM exits. MTF VM exits take priority over debug-trap exceptions and lower priority events. • No MTF VM exit occurs if the processor is in either the shutdown activity state or wait-for-SIPI activity state.
CHAPTER 22 VM ENTRIES Software can enter VMX non-root operation using either of the VM-entry instructions VMLAUNCH and VMRESUME. VMLAUNCH can be used only with a VMCS whose launch state is clear and VMRESUME can be used only with a VMCS whose launch state is launched. VMLAUNCH should be used for the first VM entry after VMCLEAR; VMRESUME should be used for subsequent VM entries with the same VMCS. Each VM entry performs the following steps in the order indicated: 1.
VM ENTRIES of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B for the error numbers. • The checks in Section 22.3 and Section 22.4 cause processor state to be loaded from the host-state area of the VMCS (as would be done on a VM exit). Information about the failure is stored in the VM-exit information fields. See Section 22.7 for details. EFLAGS.TF = 1 causes a VM-entry instruction to generate a single-step debug exception only if failure of one of the checks in Section 22.
VM ENTRIES VM-instruction error field. See Chapter 5 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B for the error numbers. 22.2 CHECKS ON VMX CONTROLS AND HOST-STATE AREA If the checks in Section 22.
VM ENTRIES controls must be set properly. Software may consult the VMX capability MSRs to determine the proper settings (see Appendix G.3.3). If the “activate secondary controls” primary processor-based VM-execution control is 0 (or if the processor does not support the 1-setting of that control), no checks are performed on the secondary processor-based VM-execution controls. The logical processor operates as if all the secondary processor-based VM-execution controls were 0.
VM ENTRIES — Any clearing of the bytes occurs even if the VM entry subsequently fails. • If the “use TPR shadow” VM-execution control is 1, bits 31:4 of the TPR threshold VM-execution control field must be 0.
22.2.1.2 VM-Exit Control Fields VM entries perform the following checks on the VM-exit control fields. • Reserved bits in the VM-exit controls must be set properly. Software may consult the VMX capability MSRs to determine the proper settings (see Appendix G.4). • If the “activate VMX-preemption timer” VM-execution control is 0, the “save VMX-preemption timer value” VM-exit control must also be 0.
VM ENTRIES • Fields relevant to VM-entry event injection must be set properly. These fields are the VM-entry interruption-information field (see Table 20-12 in Section 20.8.3), the VM-entry exception error code, and the VM-entry instruction length. If the valid bit (bit 31) in the VM-entry interruption-information field is 1, the following must hold: — The field’s interruption type (bits 10:8) is not set to a reserved value.
VM ENTRIES • If the processor is not in SMM, the “entry to SMM” and “deactivate dual-monitor treatment” VM-entry controls must be 0. • The “entry to SMM” and “deactivate dual-monitor treatment” VM-entry controls cannot both be 1. 22.2.2 Checks on Host Control Registers and MSRs The following checks are performed on fields in the host-state area that correspond to control registers and MSRs: • The CR0 field must not set any bit to a value not supported in VMX operation (see Section 19.8).
• The selector fields for CS and TR cannot be 0000H. • The selector field for SS cannot be 0000H if the “host address-space size” VM-exit control is 0. • On processors that support Intel 64 architecture, the base-address fields for FS, GS, GDTR, IDTR, and TR must contain canonical addresses. 22.2.
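The canonical-address requirement on the base-address fields can be checked with a small helper. The sketch assumes a 48-bit linear-address width, in which case an address is canonical when bits 63:47 are all 0 or all 1; the function name and the default width are assumptions for illustration.

```python
def is_canonical(address, linear_address_bits=48):
    """Return True if a 64-bit address is in canonical form.

    Assumption: the processor implements `linear_address_bits` bits of
    linear address (48 by default).
    """
    address &= (1 << 64) - 1
    upper = address >> (linear_address_bits - 1)           # bits 63:47 for 48-bit
    all_ones = (1 << (64 - linear_address_bits + 1)) - 1   # 17 one bits for 48-bit
    return upper == 0 or upper == all_ones
```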
VM ENTRIES 22.3.1 Checks on the Guest State Area This section describes checks performed on fields in the guest-state area. These checks may be performed in any order. The following subsections reference fields that correspond to processor state. Unless otherwise stated, these references are to fields in the guest-state area. 22.3.1.
• If the “load IA32_EFER” VM-entry control is 1, bits reserved in the IA32_EFER MSR must be 0 in the field for that register. In addition, the values of the LMA and LME bits in the field must each be that of the “IA-32e mode guest” VM-entry control. 22.3.1.2 Checks on Guest Segment Registers This section specifies the checks on the fields for CS, SS, DS, ES, FS, GS, TR, and LDTR.
VM ENTRIES — Bits 3:0 (Type) must be 3, indicating an expand-up read/write accessed data segment. — Bit 4 (S) must be 1. — Bits 6:5 (DPL) must be 3. — Bit 7 (P) must be 1. — Bits 11:8 (reserved), bit 12 (software available), bit 13 (reserved/L), bit 14 (D/B), bit 15 (G), bit 16 (unusable), and bits 31:17 (reserved) must all be 0. • If the guest will not be virtual-8086, the different sub-fields are considered separately: — Bits 3:0 (Type). • CS.
— Bit 14 (D/B). For CS, D/B must be 0 if the guest will be in IA-32e mode and the L bit (bit 13) in the access-rights field is 1. — Bit 15 (G). The following checks apply if the register is CS or if the register is usable: • If any bit in the limit field in the range 11:0 is 0, G must be 0. • If any bit in the limit field in the range 31:20 is 1, G must be 1. — Bits 31:17 (reserved). If the register is CS or if the register is usable, these bits must all be 0. — TR.
VM ENTRIES • On processors that support Intel 64 architecture, the base-address fields must contain canonical addresses. • Bits 31:16 of each limit field must be 0. 22.3.1.4 Checks on Guest RIP and RFLAGS The following checks are performed on fields in the guest-state area corresponding to RIP and RFLAGS: • RIP.
— The activity-state field must indicate the active state if the interruptibility-state field indicates blocking by either MOV-SS or by STI (if either bit 0 or bit 1 in that field is 1).
VM ENTRIES — Bit 2 (blocking by SMI) must be 0 if the processor is not in SMM. — Bit 2 (blocking by SMI) must be 1 if the “entry to SMM” VM-entry control is 1. — A processor may require bit 0 (blocking by STI) to be 0 if the valid bit (bit 31) in the VM-entry interruption-information field is 1 and the interruption type (bits 10:8) in that field has value 2, indicating NMI. Other processors may not make this requirement.
— If the processor is in SMM and the “entry to SMM” VM-entry control is 0, the field must not contain the VMXON pointer.
22.3.1.6 Checks on Guest Page-Directory-Pointer-Table Entries
If CR0.PG = 1 and CR4.PAE = 1, the logical processor uses the physical-address extension (PAE). If IA32_EFER.LMA = 0, the logical processor also uses PAE paging (see Section 3.8 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A).
• Some state is determined by VM-entry controls.
• The page-directory pointers are loaded based on the values of certain control registers.
This loading may be performed in any order and in parallel with the checking of VMCS contents (see Section 22.3.1). The loading of guest state is detailed in Section 22.3.2.1 to Section 22.3.2.4. These sections reference VMCS fields that correspond to processor state. Unless otherwise stated, these references are to fields in the guest-state area.
VM ENTRIES tively. On processors that do not support Intel 64 architecture, these fields have only 32 bits; bits 63:32 of the MSRs are cleared to 0. — The following are performed on processors that support Intel 64 architecture: • The MSRs FS.base and GS.base are loaded from the base-address fields for FS and GS, respectively (see Section 22.3.2.2).
— For the other fields, the unusable bit of the access-rights field is consulted:
• If the unusable bit is 0, all of the access-rights fields are loaded.
• If the unusable bit is 1, the remainder of CS access rights are undefined after VM entry.
• SS, DS, ES, FS, GS, and LDTR.
— The selector fields are loaded.
• If the control is 0, the PDPTEs are loaded from the page-directory-pointer table referenced by the physical address in the value of CR3 being loaded by the VM entry (see Section 22.3.2.1). The values loaded are treated as physical addresses in VMX non-root operation.
• If the control is 1, the PDPTEs are loaded from corresponding fields in the guest-state area (see Section 20.4.2). The values loaded are treated as guest-physical addresses in VMX non-root operation.
22.3.2.
• The value of bits 31:0 indicates an MSR that cannot be loaded on VM entries for model-specific reasons. A processor may prevent loading of certain MSRs even if they can normally be written by WRMSR. Such model-specific behavior is documented in Appendix B.
• Bits 63:32 are not all 0.
• An attempt to write bits 127:64 to the MSR indexed by bits 31:0 of the entry would cause a general-protection exception if executed via WRMSR with CPL = 0.
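The conditions above refer to three parts of each 128-bit entry in the VM-entry MSR-load area: the MSR index (bits 31:0), a reserved field (bits 63:32), and the MSR data (bits 127:64). The following sketch unpacks one entry and applies only the reserved-field check; the model-specific and WRMSR-fault conditions cannot be evaluated from the entry alone. The function names are illustrative, and little-endian layout is assumed.

```python
import struct

def parse_msr_load_entry(raw: bytes):
    """Split a 16-byte MSR-load entry into (index, reserved, data)."""
    if len(raw) != 16:
        raise ValueError("an MSR-load entry is 128 bits (16 bytes)")
    index, reserved, data = struct.unpack("<IIQ", raw)
    return index, reserved, data

def msr_entry_reserved_bits_ok(raw: bytes) -> bool:
    """Return True if bits 63:32 of the entry are all 0."""
    _, reserved, _ = parse_msr_load_entry(raw)
    return reserved == 0
```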
VM ENTRIES Section 22.5.1.1 provides details of vectored-event injection. In general, the event is delivered exactly as if it had been generated normally. If event delivery encounters a nested exception (for example, a general-protection exception because the vector indicates a descriptor beyond the IDT limit), the exception bitmap is consulted using the vector of that exception. If the bit is 0, the exception is delivered through the IDT. If the bit is 1, a VM exit occurs. Section 22.5.1.
• If the deliver-error-code bit (bit 11) is set in the VM-entry interruption-information field, the contents of the VM-entry exception error-code field are pushed on the stack as an error code would be pushed during delivery of an exception.
• DR6, DR7, and the IA32_DEBUGCTL MSR are not modified by event injection, even if the event has vector 1 (normal deliveries of debug exceptions, which have vector 1, do update these registers).
VM ENTRIES • The transition causes a last-branch record to be logged if the LBR bit is set in the IA32_DEBUGCTL MSR. This is true even for events such as debug exceptions, which normally clear the LBR bit before delivery. • The last-exception record MSRs (LERs) may be updated based on the setting of the LBR bit in the IA32_DEBUGCTL MSR.
• If the “virtualize APIC accesses” VM-execution control is 1 and event delivery generates an access to the APIC-access page, that access may cause an APIC-access VM exit (see Section 21.2) or, if the access is a VTPR access, be treated as specified in Section 21.5.3.1 If the event-delivery process does cause a VM exit, the processor state before the VM exit is determined just as it would be had the injected event occurred during normal execution in VMX non-root operation.
VM ENTRIES incurs an exception (including a debug exception made pending by VM entry; see Section 22.6.3). — Events are blocked by MOV SS if and only if bit 1 in the interruptibility-state field is 1. This may affect the treatment of pending debug exceptions; see Section 22.6.3. Such blocking is cleared after the guest executes one instruction or incurs an exception (including a debug exception made pending by VM entry).
VM ENTRIES cycle that is normally generated when that activity state is entered from the active state. If VM entry would end with the logical processor in the shutdown state and the logical processor is in SMX operation,1 an Intel® TXT shutdown condition occurs. The error code used is 0000H, indicating “legacy shutdown.” See Intel® Trusted Execution Technology Preliminary Architecture Specification. • Some activity states unconditionally block certain events.
VM ENTRIES • The VM entry is not vectoring and the activity-state field indicates either shutdown or wait-for-SIPI. If none of the above hold, the pending debug exceptions field specifies the debug exceptions that are pending for the guest. There are valid pending debug exceptions if either the BS bit (bit 14) or the enable-breakpoint bit (bit 12) is 1.
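The validity test for the pending debug exceptions field reduces to a two-bit mask. A minimal sketch, with illustrative constant and function names:

```python
PENDING_DBG_ENABLED_BREAKPOINT = 1 << 12  # enable-breakpoint bit
PENDING_DBG_BS = 1 << 14                  # BS (single-step) bit

def has_valid_pending_debug_exceptions(field: int) -> bool:
    """True if either the BS bit or the enable-breakpoint bit is 1."""
    return bool(field & (PENDING_DBG_BS | PENDING_DBG_ENABLED_BREAKPOINT))
```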
22.6.4 VMX-Preemption Timer
If the “activate VMX-preemption timer” VM-execution control is 1, VM entry starts the VMX-preemption timer with the unsigned value in the VMX-preemption timer-value field. If the “activate VMX-preemption timer” VM-execution control is 1 and the value in the VMX-preemption timer-value field is zero, a VM exit occurs before execution of any instruction following VM entry (if it is not blocked by activity state). The VM exit occurs with its normal priority after any event injection.
VM ENTRIES 22.6.7 VM Exits Induced by the TPR Shadow If the “use TPR shadow” and “virtualize APIC accesses” VM-execution controls are both 1, a VM exit occurs immediately after VM entry if the value of bits 3:0 of the TPR threshold VM-execution control field is greater than the value of bits 7:4 in byte 80H on the virtual-APIC page (see Section 20.6.8).1 The following items detail the treatment of these VM exits: • The VM exits are not blocked if RFLAGS.
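The TPR-shadow comparison described above can be written out as follows. This is a sketch only: the function name is hypothetical, the virtual-APIC page is modeled as a byte buffer, and the check of the two VM-execution controls is left to the caller.

```python
def tpr_below_threshold(tpr_threshold: int, virtual_apic_page: bytes) -> bool:
    """Compare bits 3:0 of the TPR threshold with bits 7:4 of byte 80H
    on the virtual-APIC page; True means the VM exit condition holds."""
    threshold = tpr_threshold & 0xF
    vtpr_class = (virtual_apic_page[0x80] >> 4) & 0xF
    return threshold > vtpr_class
```

With byte 80H equal to 50H, a threshold of 6 satisfies the condition and a threshold of 5 does not, because the comparison is strictly greater-than.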
VM ENTRIES 22.6.9 VM Entries and Advanced Debugging Features VM entries are not logged with last-branch records, do not produce branch-trace messages, and do not update the branch-trace store. 22.7 VM-ENTRY FAILURES DURING OR AFTER LOADING GUEST STATE VM-entry failures due to the checks identified in Section 22.3.1 and failures during the MSR loading identified in Section 22.4 are treated differently from those that occur earlier in VM entry. In these cases, the following steps take place: 1.
VM ENTRIES there are not also other errors. Different processors may give different exit qualifications for the same VMCS. • VM-entry failure due to MSR loading. The exit qualification is loaded to indicate which entry in the VM-entry MSR-load area caused the problem (1 for the first entry, 2 for the second, etc.). — All other VM-exit information fields are unmodified. 2. Processor state is loaded as would be done on a VM exit (see Section 23.5). If this results in [CR4.PAE & CR0.PG & ~IA32_EFER.
• A VM-entry failure occurs as described in Section 22.7. The basic exit reason is 41, for “VM-entry failure due to machine check.” The first option is not used if the machine check occurs after any guest state has been loaded.
CHAPTER 23 VM EXITS VM exits occur in response to certain instructions and events in VMX non-root operation. Section 21.1 through Section 21.3 detail the causes of VM exits. VM exits perform the following operation: 1. Information about the cause of the VM exit is recorded in the VM-exit information fields and VM-entry control fields are modified as described in Section 23.2. 2. Processor state is saved in the guest-state area (Section 23.3). 3. MSRs may be saved in the VM-exit MSR-store area (Section 23.
causes a VM exit directly if the “external-interrupt exiting” VM-execution control is 1. A start-up IPI (SIPI) that arrives while a logical processor is in the wait-for-SIPI activity state causes a VM exit directly. INIT signals that arrive while the processor is not in the wait-for-SIPI activity state cause VM exits directly.
VM EXITS — If the logical processor is in an inactive state (see Section 20.4.2) and not executing instructions, some events may be blocked but others may return the logical processor to the active state. Unblocked events may cause VM exits.1 If an unblocked event causes a VM exit directly, a return to the active state occurs only after the VM exit completes.2 The VM exit generates any special bus cycle that is normally generated when the active state is entered from that activity state.
VM EXITS exception record if a VM exit or triple fault occurs before an event handler is reached. • If the “virtual NMIs” VM-execution control is 1, VM entry injects an NMI, and delivery of the NMI causes a nested exception, double fault, task switch, or APIC access that causes a VM exit, virtual-NMI blocking is in effect before the VM exit commences.
VM EXITS control is 1. (Such VM exits can occur only from 64-bit mode and thus only on processors that support Intel 64 architecture.) — Trap-like VM exits due to execution of WRMSR when the “use MSR bitmaps” VM-execution control is 1, the value of ECX is 808H, bit 808H in write bitmap for low MSRs is 0, and the “virtualize x2APIC mode” VM-execution control is 1. See Section 21.1.3. — VM exits caused by TPR-shadow updates (see Section 21.5.3.
VM EXITS — For a debug exception, the exit qualification contains information about the debug exception. The information has the format given in Table 23-1. Table 23-1. Exit Qualification for Debug Exceptions Bit Position(s) Contents 3:0 B3 – B0. When set, each of these bits indicates that the corresponding breakpoint condition was met. Any of these bits may be set even if its corresponding enabling bit in DR7 is not set. 12:4 Reserved (cleared to 0). 13 BD.
VM EXITS Table 23-2. Exit Qualification for Task Switch (Contd.) Bit Position(s) Contents 63:32 Reserved (cleared to 0). These bits exist only on processors that support Intel 64 architecture. — For INVLPG, the exit qualification contains the linear-address operand of the instruction. • On processors that support Intel 64 architecture, bits 63:32 are cleared if the logical processor was not in 64-bit mode before the VM exit.
VM EXITS Table 23-3. Exit Qualification for Control-Register Accesses (Contd.
VM EXITS — For MOV DR, the exit qualification contains information about the instruction and has the format given in Table 23-4. Table 23-4.
Table 23-5. Exit Qualification for I/O Instructions (Contd.)
Bit Position(s)   Contents
5                 REP prefixed (0 = not REP; 1 = REP)
6                 Operand encoding (0 = DX, 1 = immediate)
15:7              Reserved (cleared to 0)
31:16             Port number (as specified in DX or in an immediate operand)
63:32             Reserved (cleared to 0). These bits exist only on processors that support Intel 64 architecture.
— For MWAIT, the exit qualification contains a value that indicates whether address-range monitoring hardware was armed.
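A VMM typically extracts the fields of Table 23-5 with shifts and masks. The sketch below decodes only the rows shown in this part of the table (bits 5, 6, and 31:16); the function name and the dictionary keys are illustrative.

```python
def decode_io_exit_qualification(qualification: int) -> dict:
    """Decode the REP, operand-encoding, and port fields of an I/O exit
    qualification. Lower-numbered fields are not handled here."""
    return {
        "rep_prefixed": bool((qualification >> 5) & 1),       # bit 5
        "immediate_operand": bool((qualification >> 6) & 1),  # bit 6
        "port": (qualification >> 16) & 0xFFFF,               # bits 31:16
    }
```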
VM EXITS Such a VM exit that set bits 15:12 of the exit qualification to 0000b (data read during instruction execution) or 0001b (data write during instruction execution) set bit 12—which distinguishes data read from data write—to that which would have been stored in bit 1—W/R—of the page-fault error code had the access caused a page fault instead of an APIC-access VM exit.
VM EXITS Table 23-7. Exit Qualification for EPT Violations (Contd.) Bit Position(s) Contents 3 The logical-AND of bit 0 in the EPT paging-structures entries used to translate the guest-physical address of the access causing the EPT violation (indicates that the guest-physical address was readable).
VM EXITS — VM exits due to attempts to execute INS or OUTS for which the relevant segment (ES for INS; DS for OUTS unless overridden by an instruction prefix) is usable. The field receives the value of the linear address generated by ES:(E)DI (for INS) or segment:(E)SI (for OUTS; the default segment is DS but can be overridden by a segment override prefix). (If the relevant segment is not usable, the value is undefined.
VM EXITS — Bit 11 is set to 1 if the VM exit is caused by a hardware exception that would have delivered an error code on the stack. If bit 11 is set to 1, the error code is placed in the VM-exit interruption error code (see below). — Bit 12 is undefined in any of the following cases: • If the “NMI exiting” VM-execution control is 1 and the “virtual NMIs” VM-execution control is 0. • If the VM exit sets the valid bit in the IDT-vectoring information field (see Section 23.2.3).
• A fault occurs during event delivery and causes a VM exit (because the bit associated with the fault is set to 1 in the exception bitmap).
• A task switch is invoked through a task gate in the IDT. Note that the VM exit occurs due to the task switch only after the initial checks of the task switch pass (see Section 21.6.2).
• Event delivery causes an APIC-access VM exit (see Section 21.2).
• An EPT violation or EPT misconfiguration that occurs during event delivery.
VM EXITS — Bit 11 is set to 1 if the VM exit occurred during delivery of a hardware exception that would have delivered an error code on the stack. If bit 11 is set to 1, the error code is placed in the IDT-vectoring error code (see below). — Bit 12 is undefined. — Bits 30:13 are always set to 0. — Bit 31 is always set to 1. For other VM exits, the field is marked invalid (by clearing bit 31) and the remainder of the field is undefined. • IDT-vectoring error code.
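The IDT-vectoring information field can be decoded along the following lines. Bits 11 and 31 are as described above; the vector (bits 7:0) and interruption type (bits 10:8) positions are taken from the general layout of the interruption-information fields and should be treated as assumptions of this sketch, as should the function name.

```python
def decode_idt_vectoring_info(field: int):
    """Return None if the field is invalid (bit 31 clear); otherwise
    return the vector, interruption type, and error-code-valid flag."""
    if not (field >> 31) & 1:
        return None  # remainder of the field is undefined
    return {
        "vector": field & 0xFF,                        # bits 7:0 (assumed)
        "type": (field >> 8) & 0x7,                    # bits 10:8 (assumed)
        "error_code_valid": bool((field >> 11) & 1),   # bit 11
    }
```

Software should consult the IDT-vectoring error code only when `error_code_valid` is true.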
VM EXITS — For VM exits due to attempts to effect a task switch via instruction execution. These are VM exits that produce an exit reason indicating task switch and either of the following: • An exit qualification indicating execution of CALL, IRET, or JMP instruction. • An exit qualification indicating a task gate in the IDT and an IDT-vectoring information field indicating that the task gate was encountered during delivery of a software interrupt, privileged software exception, or software exception.
VM EXITS Table 23-8. Format of the VM-Exit Instruction-Information Field as Used for INS and OUTS (Contd.) Bit Position(s) Content 9:7 Address size: 0: 16-bit 1: 32-bit 2: 64-bit (used only on processors that support Intel 64 architecture) Other values not used. 14:10 Undefined. 17:15 Segment register: 0: ES 1: CS 2: SS 3: DS 4: FS 5: GS Other values not used. Undefined for VM exits due to execution of INS. 31:18 Undefined.
VM EXITS Table 23-9. Format of the VM-Exit Instruction-Information Field as Used for LIDT, LGDT, SIDT, or SGDT (Contd.) Bit Position(s) Content 10 Cleared to 0. 11 Undefined. 13:12 Operand size: 1: 16-bit 2: 32-bit Other values not used. Undefined for VM exits from 64-bit mode. 14 Undefined. 17:15 Segment register: 0: ES 1: CS 2: SS 3: DS 4: FS 5: GS Other values not used.
Table 23-9. Format of the VM-Exit Instruction-Information Field as Used for LIDT, LGDT, SIDT, or SGDT (Contd.)
Bit Position(s)   Content
29:28             Instruction identity: 0: SGDT 1: SIDT 2: LGDT 3: LIDT
31:30             Undefined.
— For VM exits due to attempts to execute LLDT, LTR, SLDT, or STR, the field has the format given in Table 23-10.
Table 23-10.
VM EXITS Table 23-10. Format of the VM-Exit Instruction-Information Field as Used for LLDT, LTR, SLDT, and STR (Contd.) Bit Position(s) Content 9:7 Address size: 0: 16-bit 1: 32-bit 2: 64-bit (used only on processors that support Intel 64 architecture) Other values not used. Undefined for register instructions (bit 10 is set). 10 Mem/Reg (0 = memory; 1 = register). 14:11 Undefined. 17:15 Segment register: 0: ES 1: CS 2: SS 3: DS 4: FS 5: GS Other values not used.
Table 23-11. Format of the VM-Exit Instruction-Information Field as Used for VMCLEAR, VMPTRLD, VMPTRST, and VMXON
Bit Position(s)   Content
1:0               Scaling: 0: no scaling 1: scale by 2 2: scale by 4 3: scale by 8 (used only on processors that support Intel 64 architecture). Undefined for instructions with no index register (bit 22 is set).
6:2               Undefined.
9:7               Address size: 0: 16-bit 1: 32-bit 2: 64-bit (used only on processors that support Intel 64 architecture). Other values not used.
Table 23-11. Format of the VM-Exit Instruction-Information Field as Used for VMCLEAR, VMPTRLD, VMPTRST, and VMXON (Contd.)
Bit Position(s)   Content
26:23             BaseReg (encoded as IndexReg above). Undefined for instructions with no base register (bit 27 is set).
27                BaseReg invalid (0 = valid; 1 = invalid)
31:28             Undefined.
— For VM exits due to attempts to execute VMREAD or VMWRITE, the field has the format given in Table 23-12.
Table 23-12.
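The layout in Table 23-11 lends itself to a small decoder. The sketch below covers scaling, address size, segment register, and the base register with its invalid bit. It also assumes that IndexReg occupies bits 21:18, with bit 22 as its invalid flag; the rows for those bits are not reproduced here, so that placement is an assumption inferred from the references to bit 22 and to "IndexReg above". All names are illustrative.

```python
SEGMENT_NAMES = ("ES", "CS", "SS", "DS", "FS", "GS")
ADDRESS_SIZE_BITS = {0: 16, 1: 32, 2: 64}

def decode_vmx_instruction_info(info: int) -> dict:
    """Decode the instruction-information field for VMCLEAR, VMPTRLD,
    VMPTRST, and VMXON. Undefined sub-fields are reported as None."""
    index_invalid = bool((info >> 22) & 1)   # assumed position
    base_invalid = bool((info >> 27) & 1)
    segment = (info >> 15) & 0x7
    return {
        "scale": None if index_invalid else 1 << (info & 0x3),
        "address_bits": ADDRESS_SIZE_BITS.get((info >> 7) & 0x7),
        "segment": SEGMENT_NAMES[segment] if segment < len(SEGMENT_NAMES) else None,
        "index_reg": None if index_invalid else (info >> 18) & 0xF,
        "base_reg": None if base_invalid else (info >> 23) & 0xF,
    }
```

Returning `None` for undefined sub-fields keeps callers from acting on bits that the processor does not guarantee.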
VM EXITS Table 23-12. Format of the VM-Exit Instruction-Information Field as Used for VMREAD and VMWRITE (Contd.) Bit Position(s) Content 10 Mem/Reg (0 = memory; 1 = register). 14:11 Undefined. 17:15 Segment register: 0: ES 1: CS 2: SS 3: DS 4: FS 5: GS Other values not used. Undefined for register instructions (bit 10 is set).
VM EXITS Table 23-13. Format of the VM-Exit Instruction-Information Field as Used for INVEPT and INVVPID (Contd.) Bit Position(s) Content 9:7 Address size: 0: 16-bit 1: 32-bit 2: 64-bit (used only on processors that support Intel 64 architecture) Other values not used. 10 Cleared to 0. 14:11 Undefined. 17:15 Segment register: 0: ES 1: CS 2: SS 3: DS 4: FS 5: GS Other values not used.
VM EXITS • I/O RCX, I/O RSI, I/O RDI, I/O RIP. These fields are undefined except for SMM VM exits due to system-management interrupts (SMIs) that arrive immediately after retirement of I/O instructions. See Section 25.15.2.3. 23.3 SAVING GUEST STATE Each field in the guest-state area of the VMCS (see Section 20.4) is written with the corresponding component of processor state. On processors that support Intel 64 architecture, the full values of each natural-width field (see Section 20.10.
VM EXITS 23.3.2 Saving Segment Registers and Descriptor-Table Registers For each segment register (CS, SS, DS, ES, FS, GS, LDTR, or TR), the values saved for the base-address, segment-limit, and access rights are based on whether the register was unusable (see Section 20.4.1) before the VM exit: • If the register was unusable, the values saved into the following fields are undefined: (1) base address; (2) segment limit; and (3) bits 7:0 and bits 15:12 in the access-rights field.
VM EXITS — If the VM exit occurs due to the 1-setting of either the “interrupt-window exiting” VM-execution control or the “NMI-window exiting” VM-execution control, the value saved is that which would be in the register had the VM exit not occurred. — If the VM exit is due to an external interrupt, non-maskable interrupt (NMI), or hardware exception (as defined in Section 23.2.
VM EXITS — If the VM exit is caused directly by an event that would normally be delivered through the IDT, the value saved is that which would appear in the saved RFLAGS image (either that which would be saved on the stack had the event been delivered through a trap or interrupt gate1 or into the old task-state segment had the event been delivered through a task gate) had the event been delivered through the IDT. See below for VM exits due to task switches caused by task gates in the IDT.
VM EXITS • The activity-state field is saved with the logical processor’s activity state before the VM exit.1 See Section 23.1 for details of how events leading to a VM exit may affect the activity state. • The interruptibility-state field is saved to reflect the logical processor’s interruptibility before the VM exit. See Section 23.1 for details of how events leading to a VM exit may affect this state.
VM EXITS • IA32_DEBUGCTL.BTF = 0 and the cause of a pending debug exception was the execution of a single instruction. • IA32_DEBUGCTL.BTF = 1 and the cause of a pending debug exception was a taken branch. — Suppose that a VM exit is due to another reason (but not a debug exception) and occurs while there is MOV-SS blocking of debug exceptions. In this case, the value saved sets bits corresponding to the causes of any debug exceptions that were pending at the time of the VM exit.
VM EXITS — If the “enable EPT” VM-execution control is 0 or the logical processor was not using PAE paging at the time of the VM exit, the values saved are undefined. 23.4 SAVING MSRS After processor state is saved to the guest-state area, values of MSRs may be stored into the VM-exit MSR-store area (see Section 20.7.2).
VM EXITS A logical processor is in IA-32e mode after a VM exit only if the “host address-space size” VM-exit control is 1. If the logical processor was in IA-32e mode before the VM exit and this control is 0, a VMX abort occurs. See Section 23.7. In addition to loading host state, VM exits clear address-range monitoring (Section 23.5.6). After the state loading described in this section, VM exits may load MSRs from the VM-exit MSR-load area (see Section 23.6).
VM EXITS • The MSRs FS.base and GS.base are loaded from the base-address fields for FS and GS, respectively (see Section 23.5.2). • The LMA and LME bits in the IA32_EFER MSR are each loaded with the setting of the “host address-space size” VM-exit control. — If the “load IA32_PERF_GLOBAL_CTRL” VM-exit control is 1, the IA32_PERF_GLOBAL_CTRL MSR is loaded from the IA32_PERF_GLOBAL_CTRL field. — If the “load IA32_PAT” VM-exit control is 1, the IA32_PAT MSR is loaded from the IA32_PAT field.
VM EXITS • The type field and S bit are set as follows: — CS. Type set to 11 and S set to 1 (execute/read, accessed, non-conforming code segment). — SS, DS, ES, FS, and GS. Undefined if the segment is unusable; otherwise, type set to 3 and S set to 1 (read/write, accessed, expand-up data segment). — TR. Type set to 11 and S set to 0 (busy 32-bit task-state segment). • The DPL is set as follows: — CS, SS, and TR. Set to 0. The current privilege level (CPL) will be 0 after the VM exit completes.
VM EXITS 23.5.3 Loading Host RIP, RSP, and RFLAGS RIP and RSP are loaded from the RIP field and the RSP field, respectively. RFLAGS is cleared, except bit 1, which is always set. 23.5.4 Checking and Loading Host Page-Directory-Pointer-Table Entries If CR0.PG = 1 and CR4.PAE = 1, the logical processor uses the physical-address extension (PAE). If, in addition, IA32_EFER.LMA = 0, the logical processor uses PAE paging. See Section 3.
VM EXITS • There are no pending debug exceptions after a VM exit. Section 24.3 describes how the VMX architecture controls how a logical processor manages information in the TLBs and paging-structure caches. The following items detail how VM exits invalidate cached mappings: • If the “enable VPID” VM-execution control is 0, the logical processor invalidates VPID-tagged mappings and dual-tagged mappings associated with VPID 0000H; dual-tagged mappings for VPID 0000H are invalidated for all EPTPs.
VM EXITS If any MSR is being loaded in such a way that would architecturally require a TLB flush, the TLBs are updated so that, after VM exit, the logical processor does not use any translations that were cached before the transition. 23.7 VMX ABORTS A problem encountered during a VM exit leads to a VMX abort. A VMX abort takes a logical processor into a shutdown state as described below. A VMX abort does not modify the VMCS data in the VMCS region of any active VMCS.
VM EXITS After saving the VMX-abort indicator, operation of a logical processor experiencing a VMX abort depends on whether the logical processor is in SMX operation:1 • If the logical processor is in SMX operation, an Intel® TXT shutdown condition occurs. The error code used is 000DH, indicating “VMX abort.” See Intel® Trusted Execution Technology Measured Launched Environment Programming Guide.
The first option is not used if the machine check occurs after any host state has been loaded.
CHAPTER 24 SUPPORT FOR ADDRESS TRANSLATION The architecture for VMX operation includes two features that support address translation: virtual-processor identifiers (VPIDs) and extended page tables (EPT). VPIDs are a mechanism for managing translations of linear addresses. EPT defines a layer of address translation that augments the translation of linear addresses. Section 24.1 details the architecture of VPIDs. Section 24.2 provides the details of EPT. Section 24.
SUPPORT FOR ADDRESS TRANSLATION 24.2 EXTENDED PAGE TABLES (EPT) The extended page-table mechanism (EPT) is a feature that can be used to support the virtualization of physical memory. When EPT is in use, certain addresses that would normally be treated as physical addresses (and used to access memory) are instead treated as guest-physical addresses. Guest-physical addresses are translated by traversing a set of EPT paging structures to produce physical addresses that are used to access memory.
• If PAE paging is not being used, the MOV to CR3 instruction does not use the guest-physical address to access memory.1 Thus, the instruction does not cause that address to be translated through EPT. The address will be translated through EPT on the next memory access using a linear address.
• If PAE paging is being used, the MOV to CR3 instruction loads the four (4) page-directory-pointer-table entries (PDPTEs) from the guest-physical address.
• A 4-KByte naturally aligned EPT PML4 table is located at the physical address specified in bits 51:12 of the extended-page-table pointer (EPTP), a VM-execution control field (see Table 20-8 in Section 20.6.11). An EPT PML4 entry is selected from this table using a physical address defined as follows:
— Bits 63:52 are all 0.
— Bits 51:12 are from the EPTP.
— Bits 11:3 are bits 47:39 of the guest-physical address.
— Bits 2:0 are all 0.
• Bits 63:52 are all 0.
• Bits 51:12 are from the EPT PDE.
• Bits 11:3 are bits 20:12 of the guest-physical address.
• Bits 2:0 are all 0.
— The final physical address is computed as follows:
• Bits 63:52 are all 0.
• Bits 51:12 are from the EPT PTE (see Table 24-5).
• Bits 11:0 are from the original guest-physical address.
• If bit 7 of the EPT PDE is 1, the final physical address is computed as follows:
— Bits 63:52 are all 0.
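The address computations for the 4-KByte case can be sketched as follows. Each level takes bits 51:12 from the pointer to the current table and a 9-bit index from the guest-physical address, shifted left by 3 because entries are 8 bytes. The function names are illustrative. The shift values 39 and 12 correspond to the index ranges 47:39 and 20:12 given in the text; the intermediate values 30 and 21 (ranges 38:30 and 29:21) follow the same pattern and are stated here as an assumption.

```python
ADDR_MASK_51_12 = ((1 << 52) - 1) & ~0xFFF   # bits 51:12

PML4_SHIFT, PDPT_SHIFT, PD_SHIFT, PT_SHIFT = 39, 30, 21, 12

def ept_entry_address(table_pointer: int, gpa: int, index_shift: int) -> int:
    """Physical address of the entry selected at one level of the walk."""
    index = (gpa >> index_shift) & 0x1FF                  # 9-bit table index
    return (table_pointer & ADDR_MASK_51_12) | (index << 3)

def final_address_4k(pte: int, gpa: int) -> int:
    """Final physical address for a 4-KByte page mapping."""
    return (pte & ADDR_MASK_51_12) | (gpa & 0xFFF)
```

The low bits of the EPTP and of each entry (permissions and other attributes) are masked off before the index is merged in, so they never leak into the computed address.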
SUPPORT FOR ADDRESS TRANSLATION 24.2.3.1 EPT PML4 Entries An EPT PML4 entry is identified using bits 47:39 of the guest-physical address (see Section 24.2.2) and thus controls access to a 512-Gbyte region of the guest-physical address space. Table 24-1 shows the format of an EPT PML4 entry. Table 24-1.
SUPPORT FOR ADDRESS TRANSLATION Gbyte region of the guest-physical address space. Table 24-2 shows the format of an EPT PDPTE. Table 24-2.
SUPPORT FOR ADDRESS TRANSLATION Table 24-3.
Table 24-4. Format of an EPT Page-Directory Entry that Maps a 2-MByte Page
Bit Position(s)   Contents
7                 Must be 1 (otherwise, this entry references an EPT page table)
11:8              Ignored
20:12             Reserved (must be 0)
N–1:21            Physical address of the 2-MByte page referenced by this entry (see Note 1)
51:N              Reserved (must be 0)
63:52             Ignored
NOTES:
1. N is the physical-address width supported by the logical processor.
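The reserved-bit and address fields of Table 24-4 depend on the physical-address width N. The sketch below builds the reserved mask for a given N and extracts the 2-MByte page base; the function names are hypothetical, and N would normally be obtained from CPUID.

```python
def reserved_mask_2m_pde(phys_width: int) -> int:
    """Bits 20:12 and bits 51:N, which must be 0 in a 2-MByte EPT PDE."""
    low = ((1 << 21) - 1) & ~0xFFF                       # bits 20:12
    high = ((1 << 52) - 1) & ~((1 << phys_width) - 1)    # bits 51:N
    return low | high

def page_base_2m(pde: int, phys_width: int):
    """Return the 2-MByte page base (bits N-1:21), or None if a reserved
    bit is set. Raises ValueError if bit 7 is 0 (entry references a table)."""
    if not ((pde >> 7) & 1):
        raise ValueError("bit 7 is 0: entry references an EPT page table")
    if pde & reserved_mask_2m_pde(phys_width):
        return None
    return pde & ((1 << phys_width) - 1) & ~((1 << 21) - 1)
```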
SUPPORT FOR ADDRESS TRANSLATION Table 24-5. Format of an EPT Page-Table Entry (Contd.) Bit Position(s) Contents N–1:12 Physical address of the 4-KByte page referenced by this entry1 51:N Reserved (must be 0) 63:52 Ignored NOTES: 1. N is the physical-address width supported by the logical processor. Note that, if bits 2:0 of an EPT PTE are all 0, the entry is considered to be “not present”; the logical processor ignores bits 63:3 of such an entry and will not use it to map a 4-KByte page. 24.2.
SUPPORT FOR ADDRESS TRANSLATION VMX capability MSR IA32_VMX_EPT_VPID_CAP to determine whether this value is supported (see Appendix G.10). • The value of bits 2:0 of an EPT paging-structure entry is not 000b (the entry is present) and one of the following holds: — A reserved bit is set. This includes the setting of a bit in the range 51:12 that is beyond the logical processor’s physical-address width.1 See Section 24.2.3 for details of which bits are reserved in which EPT paging-structure entries.
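The distinction between a not-present entry and a present entry with a reserved bit set can be sketched as below. This is deliberately partial: it tests only the presence bits (2:0) and the address bits beyond the physical-address width (51:N). The other misconfiguration conditions and the per-level reserved bits of Section 24.2.3 are not modeled, and the names and return strings are illustrative.

```python
def classify_ept_entry(entry: int, phys_width: int) -> str:
    """Classify an EPT paging-structure entry using two of the checks
    described in the text."""
    if (entry & 0x7) == 0:
        # Bits 2:0 all 0: not present; bits 63:3 are ignored.
        return "not-present"
    beyond_width = ((1 << 52) - 1) & ~((1 << phys_width) - 1)  # bits 51:N
    if entry & beyond_width:
        return "misconfigured"
    return "present"
```

Note the ordering: presence is tested first, so a not-present entry is never reported as misconfigured regardless of its upper bits.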
SUPPORT FOR ADDRESS TRANSLATION For an access to a guest-physical address, determination of whether an EPT misconfiguration or an EPT violation occurs is based on an iterative process:1 1. An EPT paging-structure entry is read (initially, this is an EPT PML4 entry): a. If the entry is not present (bits 2:0 are all 0), an EPT violation occurs. b. If the entry is present but its contents are not configured properly (see Section 24.2.4.1), an EPT misconfiguration occurs. c.
• If the entry does reference another guest paging structure, an entry from that structure is accessed; step 1 is executed for that other entry.
• Otherwise, the entry is used to produce the ultimate guest-physical address (the translation of the original linear address); step 2 is executed.
2. Once the ultimate guest-physical address is determined, the privileges determined by the guest paging-structure entries are evaluated: a.
SUPPORT FOR ADDRESS TRANSLATION 24.2.5.2 Memory Type Used for Translated Guest-Physical Addresses The effective memory type of a memory access using a guest-physical address (an access that is translated using EPT) is the memory type that is used to access memory.
SUPPORT FOR ADDRESS TRANSLATION which a logical processor may create and use information cached from the paging structures. Section 24.3.1 describes the different kinds of information that may be cached. Section 24.3.2 specifies when such information may be cached and how it may be used. Section 24.3.3 details how software can invalidate cached information. 24.3.
SUPPORT FOR ADDRESS TRANSLATION — Dual-tagged translations. Each of these is a mapping from a linear page number to the physical page frame to which it translates, along with information about access privileges and memory typing. — Dual-tagged paging-structure-cache entries.
SUPPORT FOR ADDRESS TRANSLATION — No dual-tagged mappings are created with information derived from guest paging-structure entries that are not present or that set reserved bits. — No VPID-tagged mappings are created while EPT is in use. The following items detail the use of the various mappings: • If EPT is not in use (e.g.
SUPPORT FOR ADDRESS TRANSLATION caused the EPT violation. If that guest-physical address was the translation of a linear address, the EPT violation also invalidates any dual-tagged mappings for that linear address associated with the current VPID and the current EPTP. • If the “enable VPID” VM-execution control is 0, VM entries and VM exits invalidate VPID-tagged mappings and dual-tagged mappings associated with VPID 0000H. Dual-tagged mappings for VPID 0000H are invalidated for all EPTPs.
— All-context. If the INVEPT type is 2, the logical processor invalidates EPTP-tagged mappings and dual-tagged mappings associated with all EPTPs (and, for dual-tagged mappings, for all VPIDs).
See Chapter 5 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B for details of the INVEPT instruction. See Section 24.3.3.4 for guidelines regarding use of this instruction.
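The INVEPT operands described above can be sketched as plain C data. This is a minimal illustration, not a definitive implementation: the type and helper names are hypothetical, the single-context type value (1) and the 128-bit descriptor layout (EPTP in bits 63:0, bits 127:64 reserved) are taken from the architectural definition of INVEPT rather than from this page, and the instruction itself is not executed here.

```c
#include <stdint.h>

/* INVEPT types. Type 2 (all-context) is the one described in the text;
 * type 1 (single-context) is assumed from the INVEPT instruction reference. */
enum invept_type {
    INVEPT_SINGLE_CONTEXT = 1,
    INVEPT_ALL_CONTEXT    = 2
};

/* 128-bit INVEPT descriptor: EPTP in bits 63:0, bits 127:64 reserved (zero). */
struct invept_desc {
    uint64_t eptp;
    uint64_t reserved;
};

/* Hypothetical helper: build a descriptor for a given EPTP value. */
static struct invept_desc make_invept_desc(uint64_t eptp)
{
    struct invept_desc d = { eptp, 0 };
    return d;
}
```

For an all-context invalidation the EPTP value in the descriptor is not used to select mappings, so a VMM would typically pass a zeroed descriptor together with `INVEPT_ALL_CONTEXT`.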
— The INVVPID type is individual-address (0).
— The VPID in the INVVPID descriptor is the one assigned to the virtual processor whose execution is being emulated.
— The linear address in the INVVPID descriptor is that of the operand of the INVLPG instruction being emulated.
• Some instructions invalidate all entries in the TLBs and paging-structure caches, except for global translations. An example is the MOV to CR3 instruction. (See Section 3.
24.3.3.4 Guidelines for Use of the INVEPT Instruction
The following items provide guidelines for use of the INVEPT instruction to invalidate information cached from the EPT paging structures.
shootdown.” A discussion of TLB shootdown appears in Section 9 of the application note “TLBs, Paging-Structure Caches, and Their Invalidation.”
CHAPTER 25 SYSTEM MANAGEMENT
This chapter describes aspects of Intel 64 and IA-32 architecture used in system management mode (SMM). SMM provides an alternate operating environment that can be used to monitor and manage various system resources for more efficient energy usage, to control system hardware, and/or to run proprietary code. It was introduced into the IA-32 architecture in the Intel386 SL processor (a mobile specialized version of the Intel386 processor).
• All interrupts normally handled by the operating system are disabled upon entry into SMM.
• The RSM instruction can be executed only in SMM.
SMM is similar to real-address mode in that there are no privilege levels or address mapping. An SMM program can address up to 4 GBytes of memory and can execute all I/O and applicable system instructions. See Section 25.5 for more information about the SMM execution environment.
25.2 SYSTEM MANAGEMENT INTERRUPT (SMI)
The only way to enter SMM is by signaling an SMI through the SMI# pin on the processor or through an SMI message received through the APIC bus. The SMI is a nonmaskable external interrupt that operates independently from the processor’s interrupt- and exception-handling mechanism and the local APIC. The SMI takes precedence over an NMI and a maskable interrupt. SMM is non-reentrant; that is, the SMI is disabled while the processor is in SMM.
An SMI has a greater priority than debug exceptions and external interrupts. Thus, if an NMI, maskable hardware interrupt, or a debug exception occurs at an instruction boundary along with an SMI, only the SMI is handled. Subsequent SMI requests are not acknowledged while the processor is in SMM.
other inhibits. On these processors, NMIs will be inhibited if no action is taken in the SMM handler to uninhibit them (see Section 25.8). If the processor is in the HALT state when the SMI is received, the processor handles the return from SMM slightly differently (see Section 25.10). Also, the SMBASE address can be changed on a return from SMM (see Section 25.11).
25.4 SMRAM
While in SMM, the processor executes code and stores data in the SMRAM space.
25.4.1 SMRAM State Save Map
When an IA-32 processor that does not support Intel 64 architecture initially enters SMM, it writes its state to the state save area of the SMRAM. The state save area begins at [SMBASE + 8000H + 7FFFH] and extends down to [SMBASE + 8000H + 7E00H]. Table 25-1 shows the state save map. The offset in column 1 is relative to the SMBASE value plus 8000H. Reserved spaces should not be used by software.
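The address arithmetic for the state save map can be sketched as follows. This is an illustrative helper only; the function and macro names are hypothetical, and the SMBASE value used in the usage note is just an example value.

```c
#include <stdint.h>

/* Offsets in Table 25-1 are relative to SMBASE + 8000H. */
#define SMRAM_STATE_SAVE_BIAS 0x8000u

/* Hypothetical helper: physical address of a state-save field,
 * given the SMBASE value and the table offset (for example 7FD0H for EAX). */
static uint32_t smram_state_save_addr(uint32_t smbase, uint32_t table_offset)
{
    return smbase + SMRAM_STATE_SAVE_BIAS + table_offset;
}
```

With an example SMBASE of 30000H, the state save area would run from 3FFFFH (offset 7FFFH) down to 3FE00H (offset 7E00H).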
Table 25-1. SMRAM State Save Map (Contd.)

Offset (Added to SMBASE + 8000H)   Register                                      Writable?
7FDCH                              EBX                                           Yes
7FD8H                              EDX                                           Yes
7FD4H                              ECX                                           Yes
7FD0H                              EAX                                           Yes
7FCCH                              DR6                                           No
7FC8H                              DR7                                           No
7FC4H                              TR¹                                           No
7FC0H                              Reserved                                      No
7FBCH                              GS¹                                           No
7FB8H                              FS¹                                           No
7FB4H                              DS¹                                           No
7FB0H                              SS¹                                           No
7FACH                              CS¹                                           No
7FA8H                              ES¹                                           No
7FA4H                              I/O State Field, see Section 25.7             No
7FA0H                              I/O Memory Address Field, see Section 25.
If an SMI request is issued for the purpose of powering down the processor, the values of all reserved locations in the SMM state save must be saved to nonvolatile memory. The following state is not automatically saved and restored following an SMI and the RSM instruction, respectively:
• Debug registers DR0 through DR3.
• The state of the trap controller.
• The x87 FPU registers.
• The MTRRs.
• Control register CR2.
Additionally, the SMRAM state save map shown in Table 25-3 also applies to processors with the CPUID signatures listed in Table 25-2, irrespective of the value in CPUID.80000001:EDX[29]. Table 25-2.
Table 25-3. SMRAM State Save Map for Intel 64 Architecture (Contd.
Table 25-3. SMRAM State Save Map for Intel 64 Architecture (Contd.)

Offset (Added to SMBASE + 8000H)   Register                      Writable?
7E94H                              IDT Base (lower 32 bits)      No
7E90H                              GDT Limit                     No
7E8CH                              GDT Base (lower 32 bits)      No
7E8BH - 7E44H                      Reserved                      No
7E40H                              CR4                           No
7E3FH - 7DF0H                      Reserved                      No
7DE8H                              IO_EIP                        Yes
7DE7H - 7DDCH                      Reserved                      No
7DD8H                              IDT Base (Upper 32 bits)      No
7DD4H                              LDT Base (Upper 32 bits)      No
7DD0H                              GDT Base (Upper 32 bits)      No
7DCFH - 7C00H                      Reserved                      No

NOTE: 1.
maintains cache coherency, but incurs the overhead of two complete cache flushes. For Pentium 4, Intel Xeon, and P6 family processors, a combination of the first two methods of locating the SMRAM is recommended. Here the SMRAM is split between an overlapping and a dedicated region of memory. Upon entering SMM, the SMRAM space that is accessed overlaps video memory (typically located in low memory). This SMRAM section is designated as UC memory.
• The default operand and address sizes are set to 16 bits, which restricts the addressable SMRAM address space to the 1-MByte real-address mode limit for native real-address-mode code. However, operand-size and address-size override prefixes can be used to access the address space beyond 1 MByte. Table 25-4.
Maskable hardware interrupts, exceptions, NMI interrupts, SMI interrupts, A20M interrupts, single-step traps, breakpoint traps, and INIT operations are inhibited when the processor enters SMM. Maskable hardware interrupts, exceptions, single-step traps, and breakpoint traps can be enabled in SMM if the SMM execution environment provides and initializes an interrupt table and the necessary interrupt and exception handlers (see Section 25.6). 25.
• The SMBASE relocation feature affects the way the processor will return from an interrupt or exception generated while the SMI handler is executing. For example, if the SMBASE is relocated to above 1 MByte, but the exception handlers are below 1 MByte, a normal return to the SMI handler is not possible.
Note that the IO_SMI bit by itself is a strong indication, not a guarantee, that the SMI is synchronous. This is because an asynchronous SMI might coincidentally be taken after an I/O instruction. In such a case, the IO_SMI bit would still be set in the SMM state save map. Information characterizing the I/O instruction is saved in two locations in the SMM State Save Map (Table 25-5). Note that the IO_SMI bit also serves as a valid bit for the rest of the I/O information fields.
25.8 NMI HANDLING WHILE IN SMM
NMI interrupts are blocked upon entry to the SMI handler. If an NMI request occurs during the SMI handler, it is latched and serviced after the processor exits SMM. Only one NMI request will be latched during the SMI handler. If an NMI request is pending when the processor executes the RSM instruction, the NMI is serviced before the next instruction of the interrupted code sequence. This assumes that NMIs were not blocked before the SMI occurred.
[Figure 25-2. SMM Revision Identifier (register offset 7EFCH): bits 15:0, SMM revision identifier; bit 16, I/O instruction restart; bit 17, SMBASE relocation; bits 31:18, reserved.]
The upper word of the SMM revision identifier refers to the extensions available. If the I/O instruction restart flag (bit 16) is set, the processor supports the I/O instruction restart (see Section 25.12); if the SMBASE relocation flag (bit 17) is set, SMRAM base address relocation is supported (see Section 25.11).
25.
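The layout of the SMM revision identifier field can be decoded with a few bit tests. The following is a minimal sketch; the macro and function names are hypothetical, and the input is assumed to be the 32-bit value already read from offset 7EFCH of the state save area.

```c
#include <stdbool.h>
#include <stdint.h>

#define SMM_REV_IO_RESTART   (1u << 16)  /* bit 16: I/O instruction restart supported */
#define SMM_REV_SMBASE_RELOC (1u << 17)  /* bit 17: SMBASE relocation supported */

/* Bits 15:0 hold the SMM revision identifier itself. */
static uint16_t smm_revision(uint32_t field)
{
    return (uint16_t)(field & 0xFFFFu);
}

static bool smm_has_io_restart(uint32_t field)
{
    return (field & SMM_REV_IO_RESTART) != 0;
}

static bool smm_has_smbase_reloc(uint32_t field)
{
    return (field & SMM_REV_SMBASE_RELOC) != 0;
}
```

An SMI handler would normally test these flags before writing the I/O instruction restart field or the SMBASE field in the state save area.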
These options are summarized in Table 25-7. Note that if the processor was not in a HALT state when the SMI was received (the auto HALT restart flag is cleared), setting the flag to 1 will cause unpredictable behavior when the RSM instruction is executed.

Table 25-7. Auto HALT Restart Flag Values

Value of Flag After Entry to SMM   Value of Flag When Exiting SMM   Action of Processor When Exiting SMM
0                                  0                                Returns to next instruction in interrupted program or task.
0                                  1                                Unpredictable.
[Figure 25-4. SMBASE Relocation Field (register offset 7EF8H): bits 31:0, SMM base.]
In multiple-processor systems, initialization software must adjust the SMBASE value for each processor so that the SMRAM state save areas for each processor do not overlap. (For Pentium and Intel486 processors, the SMBASE values must be aligned on a 32-KByte boundary or the processor will enter shutdown state during the execution of an RSM instruction.
returning from the SMI handler, the I/O instruction restart mechanism can be used to re-execute the I/O instruction that caused the SMI. The I/O instruction restart field (at offset 7F00H in the SMM state-save area, see Figure 25-5) controls I/O instruction restart. When an RSM instruction is executed, if this field contains the value FFH, then the EIP register is modified to point to the I/O instruction that received the SMI request.
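Requesting a restart therefore amounts to a single byte store into the state save area. The sketch below models the state save area as an ordinary byte buffer whose index 0 corresponds to SMBASE + 8000H; the names are hypothetical, and a real handler would first confirm that the processor supports I/O instruction restart.

```c
#include <stdint.h>

#define SMM_IO_RESTART_OFFSET 0x7F00u  /* offset of the I/O instruction restart field */
#define SMM_IO_RESTART_ENABLE 0xFFu    /* value that requests re-execution on RSM */

/* state_save_base is assumed to point at the byte for SMBASE + 8000H,
 * so that the table offsets can be used directly as indexes. */
static void smm_request_io_restart(volatile uint8_t *state_save_base)
{
    state_save_base[SMM_IO_RESTART_OFFSET] = SMM_IO_RESTART_ENABLE;
}
```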
instruction restart field is set to FFH prior to returning from the second SMI handler, the EIP will point to an address different from the originally interrupted I/O instruction, which will likely lead to a program error.
25.14.1 Default Treatment of SMI Delivery
Ordinary SMI delivery saves processor state into SMRAM and then loads state based on architectural definitions. Under the default treatment, processors that support VMX operation perform SMI delivery as follows:
enter SMM;
save the following internal to the processor: CR4.
Note that processors that do not support SMI recognition while there is blocking by STI or by MOV SS need not save the state of such blocking. If the logical processor supports the 1-setting of the “enable EPT” VM-execution control and the logical processor was in VMX non-root operation at the time of an SMI, it saves the value of that control into bit 0 of the 32-bit field at offset SMBASE + 8000H + 7EE0H (SMBASE + FEE0H; see Table 25-3).
SS.RPL ← SS.DPL;
FI;
restore current VMCS pointer;
FI;
leave SMM;
IF logical processor will be in VMX operation or in SMX operation after RSM
THEN block A20M and leave A20M mode;
FI;
FI;
RSM unblocks SMIs. It restores the state of blocking by NMI (see Table 20-3 in Section 20.4.2) as follows:
• If the RSM is not to VMX non-root operation or if the “virtual NMIs” VM-execution control will be 0, the state of NMI blocking is restored normally.
25.15 DUAL-MONITOR TREATMENT OF SMIs AND SMM
Dual-monitor treatment is activated through the cooperation of the executive monitor (the VMM that operates outside of SMM to provide basic virtualization) and the SMM monitor (the VMM that operates inside SMM, while in VMX operation, to support system-management functions). Control is transferred to the SMM monitor through VM exits; VM entries are used to return from SMM. The dual-monitor treatment may not be supported by all processors.
Unlike other VM exits, SMM VM exits can begin in VMX root operation. SMM VM exits result from the arrival of an SMI outside SMM or from execution of VMCALL in VMX root operation outside SMM. Execution of VMCALL in VMX root operation causes an SMM VM exit only if the valid bit is set in the IA32_SMM_MONITOR_CTL MSR (see Section 25.15.5). Execution of VMCALL in VMX root operation causes an SMM VM exit even under the default treatment.
— SMM VM exits are the only VM exits that may occur in VMX root operation. Because the SMM monitor may need to know whether it was invoked from VMX root or VMX non-root operation, this information is stored in bit 29 of the exit-reason field (see Table 20-13 in Section 20.9.1). The bit is set by SMM VM exits from VMX root operation.
— If the SMM VM exit occurred in VMX non-root operation and an MTF VM exit was pending, bit 28 of the exit-reason field is set; otherwise, it is cleared.
processors that support Intel 64 architecture, bits 63:32 are clear if the logical processor was not in 64-bit mode before the VM exit.
• I/O RCX, I/O RSI, I/O RDI, and I/O RIP. For an SMM VM exit due to an SMI that arrives immediately after the retirement of an I/O instruction, these fields receive the values that were in RCX, RSI, RDI, and RIP, respectively, before the I/O instruction executed. Thus, the value saved for I/O RIP addresses the I/O instruction. 25.15.2.
25.15.4 VM Entries that Return from SMM
The SMM monitor returns from SMM using a VM entry with the “entry to SMM” VM-entry control clear. VM entries that return from SMM reverse the effects of an SMM VM exit (see Section 25.15.2). VM entries that return from SMM may differ from other VM entries in that they do not necessarily enter VMX non-root operation.
• If the executive-VMCS pointer field does not contain the VMXON pointer (the VM entry enters VMX non-root operation), the checks are performed on the VM-execution control fields in the executive VMCS (the VMCS referenced by the executive-VMCS pointer field in the current VMCS). These checks are performed after checking the executive-VMCS pointer field itself (for proper alignment).
25.15.4.5 Loading Guest State
VM entries that return from SMM load the SMBASE register from the SMBASE field. VM entries that return from SMM invalidate VPID-tagged mappings and dual-tagged mappings associated with all VPIDs (dual-tagged mappings are invalidated for all EPTPs); see Section 24.3. (Note that ordinary VM entries are required to perform such invalidation only for VPID 0000H and are not required to do even that if the “enable VPID” VM-execution control is 1; see Section 22.3.
executive-VMCS pointer field does not contain the VMXON pointer (the VM entry enters VMX non-root operation). In this case, determination is based on the VM-execution control fields in the VMCS that is current after the VM entry. This is the VMCS referenced by the value of the executive-VMCS pointer field at the time of the VM entry (see Section 25.15.4.7). This VMCS also controls the delivery of such VM exits.
establishes its content. This code is also responsible for enabling the dual-monitor treatment. SMM code enables the dual-monitor treatment and determines the location of MSEG by writing to IA32_SMM_MONITOR_CTL MSR (index 9BH). The MSR has the following format:
• Bit 0 is the register’s valid bit. The SMM monitor may be invoked using VMCALL only if this bit is 1. Because VMCALL is used to activate the dual-monitor treatment (see Section 25.15.
Table 25-10. Format of MSEG Header (Contd.)

Byte Offset   Field
16            CS selector
20            EIP offset
24            ESP offset
28            CR3 offset

To ensure proper behavior in VMX operation, software should maintain the MSEG header in writeback cacheable memory. Future implementations may allow or require a different memory type. Software should consult the VMX capability MSR IA32_VMX_BASIC (see Appendix G.1).
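The MSEG header layout can be expressed as a C structure with compile-time offset checks. This is a sketch under stated assumptions: the structure and field names are hypothetical, the four fields at offsets 16 through 28 come from the portion of Table 25-10 shown above, and the four fields at offsets 0 through 12 belong to the earlier part of the table that is not reproduced here, so their names should be treated as assumptions.

```c
#include <stddef.h>
#include <stdint.h>

struct mseg_header {
    uint32_t revision_id;          /* offset 0  (assumed, earlier part of the table) */
    uint32_t smm_monitor_features; /* offset 4  (assumed, earlier part of the table) */
    uint32_t gdtr_limit;           /* offset 8  (assumed, earlier part of the table) */
    uint32_t gdtr_base_offset;     /* offset 12 (assumed, earlier part of the table) */
    uint32_t cs_selector;          /* offset 16 */
    uint32_t eip_offset;           /* offset 20 */
    uint32_t esp_offset;           /* offset 24 */
    uint32_t cr3_offset;           /* offset 28 */
};

/* Catch accidental padding or reordering at compile time. */
_Static_assert(offsetof(struct mseg_header, cs_selector) == 16, "CS selector at byte 16");
_Static_assert(offsetof(struct mseg_header, cr3_offset) == 28, "CR3 offset at byte 28");
```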
25.15.6 Activating the Dual-Monitor Treatment
The dual-monitor treatment may be enabled by SMM code as described in Section 25.15.5. The dual-monitor treatment is activated only if it is enabled and only by the executive monitor. The executive monitor activates the dual-monitor treatment by executing VMCALL in VMX root operation. When VMCALL activates the dual-monitor treatment, it causes an SMM VM exit.
processor’s physical-address width. On processors that do not support Intel 64 architecture, the address of the last byte in the VM-exit MSR-store area should not set any bits in the range 63:32. The address of this last byte is VM-exit MSR-store address + (MSR count * 16) – 1. (The arithmetic used for the computation uses more bits than the processor’s physical-address width.) If any of these checks fail, subsequent checks are skipped and VMCALL fails.
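The last-byte computation and the 32-bit range check can be modeled directly. The helper names below are hypothetical; the arithmetic is done in 64 bits to mirror the statement that the computation uses more bits than the physical-address width, and the MSR count is assumed to be nonzero.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address of the last byte of an MSR-store area: base + (count * 16) - 1.
 * Each entry occupies 16 bytes. Assumes count > 0. */
static uint64_t msr_area_last_byte(uint64_t base, uint32_t count)
{
    return base + (uint64_t)count * 16u - 1u;
}

/* Check for processors without Intel 64 support:
 * the last byte must not set any of bits 63:32. */
static bool msr_area_fits_32bit(uint64_t base, uint32_t count)
{
    return (msr_area_last_byte(base, count) >> 32) == 0;
}
```

For example, an area at FFFFFFF0H with one entry ends at FFFFFFFFH and passes the check, while the same base with two entries ends at 10000000FH and fails it.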
25.15.6.4 Loading Host State
The VMCS that is current during an SMM VM exit that activates the dual-monitor treatment was established by the executive monitor. It does not contain the VM-exit controls and host state required to initialize the SMM monitor. For this reason, such SMM VM exits do not load processor state as described in Section 23.5.
• CS.Type is set to 11 (execute/read, accessed, non-conforming code segment).
• For SS, DS, FS, and GS, the Type is set to 3 (read/write, accessed, expand-up data segment).
• The S bits for all registers are set to 1.
• The DPL for each register is set to 0.
• The P bits for all registers are set to 1.
• On processors that support Intel 64 architecture, CS.L is loaded with the value of the IA-32e mode SMM feature bit.
• CS.
25.15.6.5 Loading MSRs
The VM-exit MSR-load area is not used by SMM VM exits that activate the dual-monitor treatment. No MSRs are loaded from that area.
25.15.7 Deactivating the Dual-Monitor Treatment
An SMM monitor may deactivate the dual-monitor treatment and return the processor to default treatment of SMIs and SMM (see Section 25.14). It does this by executing a VM entry with the “deactivate dual-monitor treatment” VM-entry control set to 1. As noted in Section 22.2.1.
CHAPTER 26 VIRTUAL-MACHINE MONITOR PROGRAMMING CONSIDERATIONS
26.1 VMX SYSTEM PROGRAMMING OVERVIEW
The Virtual Machine Monitor (VMM) is a software class used to manage virtual machines (VMs). This chapter describes programming considerations for VMMs. Each VM behaves like a complete physical machine and can run an operating system (OS) and applications. The VMM software layer runs at the most privileged level and has complete ownership of the underlying system hardware.
• By using the similarity between real-mode and virtual-8086 mode to support real-mode guest execution in a virtual-8086 container. The virtual-8086 container may be implemented as a virtual-8086 container task within a monitor that emulates real-mode guest state and instructions, or by running the guest VM as the virtual-8086 container (by entering the guest with RFLAGS.VM set).
Before entering VMX operation, the host VMM allocates a VMXON region. A VMM can host several virtual machines and have many VMCSs active under its management. A unique VMCS region is required for each virtual machine; a VMXON region is required for the VMM itself. A VMM determines the VMCS region size by reading the IA32_VMX_BASIC MSR; it creates VMCS regions of this size using a 4-KByte-aligned area of physical memory.
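Extracting the region size from the capability MSR can be sketched as follows. The helper names are hypothetical and the MSR value is passed in as a plain 64-bit integer (reading it requires RDMSR at privilege level 0). The bit positions used, bits 44:32 for the region size in bytes and bits 30:0 for the VMCS revision identifier, are taken from the architectural definition of IA32_VMX_BASIC rather than from this page.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits 44:32 of IA32_VMX_BASIC: number of bytes to allocate for a VMXON or VMCS region. */
static uint32_t vmx_basic_region_size(uint64_t vmx_basic)
{
    return (uint32_t)((vmx_basic >> 32) & 0x1FFFu);
}

/* Bits 30:0 of IA32_VMX_BASIC: VMCS revision identifier. */
static uint32_t vmx_basic_revision(uint64_t vmx_basic)
{
    return (uint32_t)(vmx_basic & 0x7FFFFFFFu);
}

/* VMXON and VMCS regions must start on a 4-KByte boundary. */
static bool is_4k_aligned(uint64_t phys_addr)
{
    return (phys_addr & 0xFFFu) == 0;
}
```

A VMM would typically write the revision identifier into the first 4 bytes of each region before using it with VMXON or VMPTRLD.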
[Figure: two panels. (a) VMX Operation and VMX Transitions: processor operation timeline marked with VMXON, VM Entry, VM Exit, and VMXOFF. (b) State of VMCS and VMX Operation: timeline marked with VMLAUNCH A, VMRESUME A, VMPTRLD B, VMLAUNCH B, VMRESUME B, VMCLEAR B, and VMXOFF. Legend: Outside VMX Operation, VMX Root Operation, VMX Non-Root Operation.]
26.4 USING VMX INSTRUCTIONS
VMX instructions are allowed only in VMX root operation. An attempt to execute a VMX instruction in VMX non-root operation causes a VM exit. Processors perform various checks while executing any VMX instruction. They follow well-defined error handling on failures.
• Enable VMX operation by setting CR4.VMXE = 1. Ensure the resultant CR4 value supports all the CR4 fixed bits reported in the IA32_VMX_CR4_FIXED0 and IA32_VMX_CR4_FIXED1 MSRs.
• Ensure that the IA32_FEATURE_CONTROL MSR (MSR index 3AH) has been properly programmed and that its lock bit is set (Bit 0 = 1). This MSR is generally configured by the BIOS using WRMSR.
• Execute VMXON with the physical address of the VMXON region as the operand.
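The checks in the steps above can be sketched as pure functions over register values. This is an illustration with hypothetical helper names; it assumes the usual interpretation of the fixed-bit MSRs (a bit set in FIXED0 must be 1 in CR4, a bit clear in FIXED1 must be 0 in CR4) and that CR4.VMXE is bit 13. The register and MSR values are passed in as integers because reading them requires privileged instructions.

```c
#include <stdbool.h>
#include <stdint.h>

#define CR4_VMXE             (1ull << 13) /* CR4.VMXE */
#define FEATURE_CONTROL_LOCK (1ull << 0)  /* IA32_FEATURE_CONTROL lock bit */

/* True if cr4 honors both fixed-bit MSRs:
 * every bit set in fixed0 is set in cr4, and no bit clear in fixed1 is set in cr4. */
static bool cr4_satisfies_fixed_bits(uint64_t cr4, uint64_t fixed0, uint64_t fixed1)
{
    return (cr4 & fixed0) == fixed0 && (cr4 & ~fixed1) == 0;
}

/* True if the IA32_FEATURE_CONTROL lock bit (bit 0) is set. */
static bool feature_control_locked(uint64_t feature_control)
{
    return (feature_control & FEATURE_CONTROL_LOCK) != 0;
}
```

A VMM would run both checks, with `CR4_VMXE` included in the candidate CR4 value, before attempting VMXON.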
is 1, the logical processor supports the capability MSRs IA32_VMX_TRUE_PINBASED_CTLS, IA32_VMX_TRUE_PROCBASED_CTLS, IA32_VMX_TRUE_EXIT_CTLS, and IA32_VMX_TRUE_ENTRY_CTLS. These capability MSRs report, respectively, on the allowed settings of all of the pin-based VM-execution controls, the primary processor-based VM-execution controls, the VM-exit controls, and the VM-entry controls.
IA32_VMX_TRUE_PROCBASED_CTLS, IA32_VMX_TRUE_EXIT_CTLS, and IA32_VMX_TRUE_ENTRY_CTLS.
c. Set the VMX controls as follows:
i) If the relevant VMX capability MSR reports that a control has a single setting, use that setting.
ii) If (1) the relevant VMX capability MSR reports that a control can be set to 0 or 1; and (2) the control’s meaning is known to the VMM; then set the control based on functionality desired.
ii) If (1) the relevant VMX capability MSR just read reports that a control can be set to 0 or 1; and (2) the control’s meaning is known to the VMM; then set the control based on functionality desired.
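Rules (i) and (ii) can be sketched as a small adjustment routine. The function names are hypothetical. The sketch assumes the standard layout of the VMX control capability MSRs: bits 31:0 report the allowed 0-settings (a 1 there means the control must be 1) and bits 63:32 report the allowed 1-settings (a 0 there means the control must be 0). Handling of controls whose meaning is unknown to the VMM is outside the scope of this example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Force single-setting controls to their required value and keep the
 * VMM's desired value for controls that may be either 0 or 1. */
static uint32_t vmx_adjust_controls(uint32_t desired, uint64_t capability)
{
    uint32_t must_be_one = (uint32_t)capability;         /* bits 31:0  */
    uint32_t may_be_one  = (uint32_t)(capability >> 32); /* bits 63:32 */
    return (desired | must_be_one) & may_be_one;
}

/* True if the control at position bit (0..31) can be set to either 0 or 1. */
static bool vmx_control_is_flexible(uint64_t capability, unsigned bit)
{
    bool must_be_one = ((capability >> bit) & 1u) != 0;
    bool may_be_one  = ((capability >> (32u + bit)) & 1u) != 0;
    return !must_be_one && may_be_one;
}
```

The result of `vmx_adjust_controls` is the value a VMM would VMWRITE into the corresponding control field.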
upon subsequent VM exits from the guest. Host-state fields include control registers (CR0, CR3 and CR4), selector fields for the segment registers (CS, SS, DS, ES, FS, GS and TR), and base-address fields (for FS, GS, TR, GDTR and IDTR; RSP, RIP and the MSRs that control fast system calls). Chapter 22 describes the host-state consistency checking done by the processor for VM entries.
26.7 HANDLING OF VM EXITS
This section provides examples of software steps involved in a VMM’s handling of VM-exit conditions:
• Determine the exit reason through a VMREAD of the exit-reason field in the working-VMCS. Appendix I describes exit reasons and their encodings.
• VMREAD the exit-qualification from the VMCS if the exit-reason field provides a valid qualification.
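Once the exit-reason field has been read, it can be split into its parts as sketched below. The structure and function names are hypothetical. Bits 28 and 29 are used as described for SMM VM exits in Chapter 25; the basic exit reason in bits 15:0 and the VM-entry failure flag in bit 31 follow the architectural layout of the field and are assumptions relative to this section.

```c
#include <stdbool.h>
#include <stdint.h>

struct vm_exit_reason {
    uint16_t basic;          /* bits 15:0: basic exit reason */
    bool     pending_mtf;    /* bit 28: MTF VM exit was pending (SMM VM exits) */
    bool     from_vmx_root;  /* bit 29: VM exit from VMX root operation (SMM VM exits) */
    bool     entry_failure;  /* bit 31: VM-entry failure */
};

/* Decode a raw 32-bit exit-reason value obtained with VMREAD. */
static struct vm_exit_reason decode_exit_reason(uint32_t raw)
{
    struct vm_exit_reason r;
    r.basic         = (uint16_t)(raw & 0xFFFFu);
    r.pending_mtf   = (raw & (1u << 28)) != 0;
    r.from_vmx_root = (raw & (1u << 29)) != 0;
    r.entry_failure = (raw & (1u << 31)) != 0;
    return r;
}
```

A VMM's exit handler would typically dispatch on `basic` and consult the other flags only for the exit reasons where they are defined.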
VMWRITE) the contents of the VM-exit interruption-information field (which is valid, since the VM exit was caused by an exception) to the VM-entry interruption-information field (which, if valid, will cause the exception to be delivered as part of the next VM entry).
A VMM can reflect a double-fault exception to guest software by setting the VM-entry interruption-information and VM-entry exception error-code fields as follows:
• Set bits 7:0 (vector) of the VM-entry interruption-information field to 8 (#DF).
• Set bits 10:8 (interruption type) of the VM-entry interruption-information field to 3 (hardware exception).
• Set bit 11 (deliver error code) of the VM-entry interruption-information field to 1.
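The field settings listed above can be combined into a single 32-bit value. The helper below is a sketch with hypothetical names; in addition to the three settings from the list, it sets bit 31 (valid), which is assumed from the architectural format of the VM-entry interruption-information field.

```c
#include <stdbool.h>
#include <stdint.h>

#define INTR_TYPE_HW_EXCEPTION 3u  /* interruption type 3: hardware exception */
#define VECTOR_DF              8u  /* #DF, double fault */

/* Build a VM-entry interruption-information value:
 * bits 7:0 vector, bits 10:8 type, bit 11 deliver error code, bit 31 valid. */
static uint32_t vm_entry_intr_info(uint8_t vector, uint8_t type, bool deliver_error_code)
{
    return (uint32_t)vector
         | ((uint32_t)(type & 7u) << 8)
         | (deliver_error_code ? (1u << 11) : 0u)
         | (1u << 31);
}
```

For a double fault, `vm_entry_intr_info(VECTOR_DF, INTR_TYPE_HW_EXCEPTION, true)` yields 80000B08H; the VMM would VMWRITE this value to the VM-entry interruption-information field and write the error code to the VM-entry exception error-code field.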
an execution of the IRET instruction that removed virtual-NMI blocking. In particular, it provides this indication if the following are both true:
— Bit 31 (valid) in the IDT-vectoring information field is 0.
— The value of bits 7:0 (vector) of the VM-exit interruption-information field is not 8 (the VM exit is not due to a double-fault exception).
26.8 MULTI-PROCESSOR CONSIDERATIONS
The most common VMM design will be the symmetric VMM. This type of VMM runs the same VMM binary on all logical processors. Like a symmetric operating system, the symmetric VMM is written to ensure all critical data is updated by only one processor at a time, I/O devices are accessed sequentially, and so forth. Asymmetric VMM designs are possible.
26.8.2 Moving a VMCS Between Processors
An MP-aware VMM is free to assign any logical processor to a VM. However, moving a guest VMCS to another logical processor is slower than resuming that guest VMCS on the same logical processor. Certain VMX performance features (such as caching of portions of the VMCS in the processor) are optimized for a guest VMCS that runs on the same logical processor.
the virtualized state. If the VM is moved during execution, writes to the index should be redone so subsequent data reads/writes go to the right location.
26.8.4 External Data Structures
Certain fields in the VMCS point to external data structures (for example: the MSR bitmap, the I/O bitmaps).
or not available is referred to as a 32-bit VMM. The types of guest operation such VMMs support are summarized in Table 26-1.

Table 26-1. Operating Modes for Host and Guest Environments

Capability        Guest Operation in IA-32e Mode   Guest Operation Not Requiring IA-32e Mode
IA-32e mode VMM   Yes                              Yes
32-bit VMM        Not supported                    Yes

A VM exit may occur to an IA-32e mode guest in either 64-bit sub-mode or compatibility sub-mode of IA-32e mode.
these fields write 64 bits. When outside of 64-bit mode, reads of these fields return the low 32 bits and writes to these fields write the low 32 bits and zero the upper 32 bits. Should a non-IA-32e mode host require access to the upper 32 bits of these fields, a separate VMCS encoding is used when issuing VMREAD/VMWRITE instructions.
On a VMM teardown, VMX operation should be exited before deactivating IA-32e mode if the latter is required.
26.9.4 IA-32e Mode Guests
A 32-bit guest can be launched by either IA-32e-mode hosts or non-IA-32e-mode hosts. A 64-bit guest can be launched only by an IA-32e-mode host. In addition to the steps outlined in Section 26.
A VMM should save/restore the extended (full 64-bit) contents of the guest general-purpose registers, the new general-purpose registers (R8-R15) and the SIMD registers introduced in 64-bit mode should it need to modify these upon VM exit.
26.9.5 32-Bit Guests
To launch or resume a 32-bit guest, VMM writers can follow the steps outlined in Section 26.6, making sure that the “IA-32e-mode guest” VM-entry control bit is set to 0.
to proceed. This level of protection may be utilized by VMMs to selectively allow guest access to some MSRs while virtualizing others.
• Default MSR protection: If the use-MSR-bitmap control is not set, an attempt by a guest to access any MSR causes a VM exit. This also occurs for any attempt to access an MSR outside the ranges identified above (even if the use-MSR-bitmap control is set).
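The bitmap lookup can be sketched as follows, assuming the use-MSR-bitmap control is set. The function name is hypothetical, and the layout is an assumption based on the architectural definition of the 4-KByte MSR bitmap rather than on the text visible here: read bitmaps for MSRs 00000000H to 00001FFFH and C0000000H to C0001FFFH at byte offsets 0 and 1024, followed by the corresponding write bitmaps at byte offsets 2048 and 3072.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a guest RDMSR (is_write == false) or WRMSR (is_write == true)
 * of the given MSR would cause a VM exit, given a 4-KByte MSR bitmap. */
static bool msr_access_exits(const uint8_t *bitmap, uint32_t msr, bool is_write)
{
    uint32_t base;

    if (msr <= 0x00001FFFu) {
        base = 0;                       /* low MSR range */
    } else if (msr >= 0xC0000000u && msr <= 0xC0001FFFu) {
        base = 1024;                    /* high MSR range */
        msr &= 0x1FFFu;
    } else {
        return true;                    /* outside both ranges: always exits */
    }
    if (is_write)
        base += 2048;                   /* write bitmaps follow the read bitmaps */

    return ((bitmap[base + msr / 8] >> (msr % 8)) & 1u) != 0;
}
```

A VMM that wants to pass an MSR through to the guest clears the corresponding read and write bits; setting a bit causes that access to be intercepted.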
26.10.4 Handling Special-Case MSRs and Instructions
A number of instructions make use of designated MSRs in their operation. The VMM may need to consider saving the states of those MSRs. Instructions that merit such consideration include SYSENTER/SYSEXIT, SYSCALL/SYSRET, SWAPGS.
26.10.4.1 Handling IA32_EFER MSR
The IA32_EFER MSR includes bit fields that allow system software to enable processor features.
uniformly. Further, even if the host intends to support fast system calls during a VM-exit, some of the MSR values (such as the setting of the SCE bit in IA32_EFER) may not require modification as they may already be set to the appropriate value in the guest. For performance reasons, a VMM may perform lazy save, load, and restore of these MSR values on certain VM exits when it is determined that this is acceptable.
26.11 HANDLING ACCESSES TO CONTROL REGISTERS
Bit fields in control registers (CR0, CR4) control various aspects of processor operation. The VMM must prevent guests from modifying bits in CR0 or CR4 that are reserved at the time the VMM is written. Guest/host masks should be used by the VMM to cause VM exits when a guest attempts to modify reserved bits.
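The effect of a guest/host mask and its companion read shadow can be modeled with two small functions. The names are hypothetical, and the semantics are assumed from the architectural definition of the CR0/CR4 guest/host masks: bits set in the mask are owned by the host, guest reads of those bits return the read-shadow value, and a guest write causes a VM exit if it would give any host-owned bit a value different from the read shadow.

```c
#include <stdbool.h>
#include <stdint.h>

/* Value the guest observes when it reads CR0 or CR4. */
static uint64_t cr_guest_visible(uint64_t real, uint64_t mask, uint64_t read_shadow)
{
    return (real & ~mask) | (read_shadow & mask);
}

/* True if a guest MOV to CR0/CR4 with new_value would cause a VM exit. */
static bool cr_write_causes_exit(uint64_t new_value, uint64_t mask, uint64_t read_shadow)
{
    return ((new_value ^ read_shadow) & mask) != 0;
}
```

To protect reserved bits, a VMM would set those bits in the mask and keep them at 0 in the read shadow, so that any guest attempt to set one of them is intercepted.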
CHAPTER 27 VIRTUALIZATION OF SYSTEM RESOURCES
27.1 OVERVIEW
When a VMM is hosting multiple guest environments (VMs), it must monitor potential interactions between software components using the same system resources. These interactions can require the virtualization of resources. This chapter describes the virtualization of system resources. These include: debugging facilities, address translation, physical memory, and microcode update facilities. 27.
• Debug registers such as DR7 and the IA32_DEBUGCTL MSR may be explicitly modified by the guest (through MOV-DR or WRMSR instructions) or modified implicitly by the processor as part of generating debug exceptions. The current values of DR7 and the IA32_DEBUGCTL MSR are saved to the guest-state area of the VMCS on every VM exit. Pending debug exceptions are debug exceptions that are recognized by the processor but not yet delivered. See Section 22.6.
27.3 MEMORY VIRTUALIZATION
VMMs must control physical memory to ensure VM isolation and to remap guest physical addresses in host physical address space for virtualization. Memory virtualization allows the VMM to enforce control of physical memory and yet support guest OSs’ expectation to manage memory address translation. 27.3.
present. The VMM may handle these VM exits by invoking appropriate virtual device emulation code.
27.3.3 Virtualizing Virtual Memory by Brute Force
VMX provides the hardware features required to fully virtualize guest virtual memory accesses. VMX allows the VMM to trap guest accesses to the PAT (Page Attribute Table) MSR and the MTRRs (Memory Type Range Registers). This control allows the VMM to virtualize the specific memory type of guest memory.
inconsistencies can be solved using techniques analogous to those used by the processor and its TLB. This section describes an alternative approach that allows guest software to freely access page directories and page tables. Traps occur on CR3 accesses and executions of INVLPG. They also occur when necessary to ensure that guest modifications to the translation structures actually take effect.
translation to take effect, guest software should flush any older translations from the TLB either by executing INVLPG or by loading CR3. Because both these operations will cause a trap to the VMM, the VMM will gain control and can remove from the active page-table hierarchy the translations indicated by guest software (the translation of a specific linear address for INVLPG or all translations for a load of CR3).
[Figure 27-1: diagram of the "virtual TLB" relating the active page-table hierarchy (walked by the processor to refill the TLB on a TLB miss, setting accessed and dirty bits) to the guest page-table hierarchy (used by the VMM to refill the active hierarchy on a page fault, setting accessed and dirty bits). INVLPG, MOV to CR3, and task switches act on both. Legend: PD = page directory, PT = page table, F = page frame.]
When guest software first enables paging, the VMM creates an aligned 4-KByte active page directory that is invalid (all entries marked not-present). This invalid directory is analogous to an empty TLB. 27.3.5.2 Response to Page Faults Page faults can occur for a variety of reasons. In some cases, the page fault alerts the VMM to an inconsistency between the active and guest page-table hierarchy.
b. If the active PDE contains a page base address (if PS = 1), then set the page base address in the active PDE to be the physical page base address that corresponds to the guest address in the guest PDE.
c. Set the P, U/S, and PS bits in the active PDE to be identical to those in the guest PDE.
d. Set the PWT, PCD, and G bits according to the policy of the VMM.
e. Set A = 1 in the guest PDE.
f.
8. Consult the active PTE, which can be located using the next 10 bits of the faulting address (bits 21–12) and the physical page-table base address in the active PDE. The active PTE is the source of the fault if it is marked not-present or if its R/W and U/S bits are inconsistent with the attempted guest access (the guest privilege level and the value of CR0.WP should also be taken into account).
9.
R/W in the active PTE as in the guest PTE, set D = 1 in the guest PTE and reexecute the faulting instruction.
14. If none of the above cases apply, then raise a page fault to the guest operating system.
27.3.5.3 Response to Uses of INVLPG
Operating systems can use INVLPG to flush entries from the TLB. This instruction takes a linear address as an operand, and software expects any cached translations for the address to be flushed.
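The following is a minimal Python model of the virtual-TLB bookkeeping described in this section, offered as an illustration rather than as VMM code. It assumes 32-bit paging with 4-KByte pages only (PS = 1 large pages are not handled). The names split_linear, new_active_directory, handle_invlpg, and the active_pts dictionary are hypothetical and do not appear in this manual.

```python
PRESENT = 0x1  # P flag, bit 0 of a 32-bit PDE or PTE


def split_linear(addr):
    """Split a 32-bit linear address into (PDE index, PTE index, page offset).

    Bits 31-22 index the page directory, bits 21-12 index the page table,
    and bits 11-0 are the offset within the 4-KByte page.
    """
    return (addr >> 22) & 0x3FF, (addr >> 12) & 0x3FF, addr & 0xFFF


def new_active_directory():
    """Model of the invalid active page directory: 1024 not-present entries."""
    return [0] * 1024


def handle_invlpg(active_pd, active_pts, addr):
    """Drop the active translation for one linear address.

    active_pts is a hypothetical dict mapping a PDE index to a list of
    1024 active PTE values. Returns True if a present PTE was invalidated.
    """
    pde_i, pte_i, _ = split_linear(addr)
    if not (active_pd[pde_i] & PRESENT):
        return False  # nothing cached for this 4-MByte region
    table = active_pts.get(pde_i)
    if table is None:
        return False
    was_present = bool(table[pte_i] & PRESENT)
    table[pte_i] &= ~PRESENT  # mark not-present, like flushing one TLB entry
    return was_present
```

A later guest access to the same address then faults into the VMM, which can rebuild the active entry from the guest page tables.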
the BIOS boot process. This is sufficient to boot the BIOS and operating system. As a microcode update more current than the system BIOS may be available, system software should provide another mechanism for invoking the microcode update facility. The implications of the microcode update mechanism on the design of the VMM are described in this section. NOTE Microcode updates must not be performed during VMX non-root operation.
the entire guest memory buffer (which contains the microcode update image) will not cause a page fault when accessed. If the VMM loads the microcode update, then the VMM must have access to the current set of microcode updates. These updates could be part of the VMM image or could be contained in a separate microcode update image database (for example: a database file on disk or in memory).
CHAPTER 28 HANDLING BOUNDARY CONDITIONS IN A VIRTUAL MACHINE MONITOR 28.1 OVERVIEW This chapter describes what a VMM must consider when handling exceptions, interrupts, error conditions, and transitions between activity states. 28.2 INTERRUPT HANDLING IN VMX OPERATION The following bullets summarize VMX support for handling interrupts: • Control of Processor Exceptions. The VMM can get control on specific guest exceptions through the exception-bitmap in the guest controlling-VMCS.
• Control of Other Events. There is a pin-based VM-execution control that controls system behavior (exit or no-exit) for NMI events. Most VMM usages will need handling of NMI external events in the VMM and hence will specify host control of these events. Some processors also support a pin-based VM-execution control called “virtual NMIs.” When this control is set, NMIs cause VM exits, but the processor tracks guest readiness for virtual NMIs.
• Interrupt-window Exiting. The interrupt-window exiting control bit in the VM-execution controls (Section 20.6.2) causes VM exits when guest RFLAGS.IF is 1 and no other conditions block external interrupts. If the control is 1, a VM exit occurs at the beginning of any instruction at which RFLAGS.IF = 1 and on which the interruptibility state of the guest would allow delivery of an interrupt.
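The interrupt-window condition can be sketched as a small Python predicate. This is an illustrative model only: it assumes that RFLAGS.IF is bit 9 and that blocking by STI and blocking by MOV SS are bits 0 and 1 of the guest interruptibility-state field. The function name interrupt_window_open is hypothetical.

```python
RFLAGS_IF = 1 << 9           # interrupt enable flag in RFLAGS
BLOCKING_BY_STI = 1 << 0     # assumed bit 0 of the interruptibility state
BLOCKING_BY_MOV_SS = 1 << 1  # assumed bit 1 of the interruptibility state


def interrupt_window_open(rflags, interruptibility_state):
    """Return True when an external interrupt could be delivered to the guest.

    The window is open when RFLAGS.IF = 1 and neither STI blocking nor
    MOV SS blocking is in effect.
    """
    if not (rflags & RFLAGS_IF):
        return False
    blocked = interruptibility_state & (BLOCKING_BY_STI | BLOCKING_BY_MOV_SS)
    return not blocked
```

A VMM holding a pending virtual interrupt could evaluate such a predicate before injecting it, and request interrupt-window exiting when the result is False.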
With host control of external interrupts, the VMM (or the host OS in a hosted VMM model) manages the physical interrupt controllers in the platform and the interrupts generated through them. The VMM exposes software-emulated virtual interrupt controller devices (such as PIC and APIC) to each guest virtual machine instance. 28.3.
[Figure: diagram of interrupt delivery between host and guest. Labels: VM (device driver B, device driver C, guest IDT, guest IDTR, guest vectors P and Q, virtual interrupts); host Virtual Machine Monitor (virtual device C emulation, monitor handler, device driver A, host IDT, host IDTR, host vectors X and Y); hardware (device A, device B, platform interrupts).]
Figure 28-1.
Interrupt Controllers (APIC), and Message Signaled Interrupts (MSI). The following sections provide information on the virtualization of each of these mechanisms. 28.3.2.1 PIC Virtualization Typical PIC-enabled platform implementations support dual 8259 interrupt controllers cascaded as master and slave controllers. They support up to 15 possible interrupt inputs.
page-table virtualization) to trap guest accesses to the page frame hosting the virtual local APIC registers.
need to emulate various redirection table entry settings such as delivery mode, destination mode, delivery status, polarity, masking, and trigger mode programmed by the guest and track remote-IRR state on guest EOI writes to various virtual local APICs. 28.3.2.5 Virtualization of Message Signaled Interrupts The PCI Local Bus Specification (Rev. 2.2) introduces the concept of message signaled interrupts (MSI).
28.3.3.2 Processor Treatment of External Interrupt Interrupts are automatically masked by hardware in the processor on VM exit by clearing RFLAGS.IF. The exit-reason field in VMCS is set to 1 to indicate an external interrupt as the exit reason.
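A VM-exit handler typically begins by checking the exit reason. The sketch below is a minimal Python illustration, assuming that the basic exit reason occupies bits 15:0 of the 32-bit exit-reason field and that the upper bits carry other status. The function names are hypothetical.

```python
EXIT_REASON_EXTERNAL_INTERRUPT = 1  # value given in the text above


def basic_exit_reason(exit_reason):
    """Extract the basic exit reason (assumed to be bits 15:0)."""
    return exit_reason & 0xFFFF


def is_external_interrupt_exit(exit_reason):
    """Return True if the VM exit was caused by an external interrupt."""
    return basic_exit_reason(exit_reason) == EXIT_REASON_EXTERNAL_INTERRUPT
```

Masking before comparing keeps the check correct even when high-order status bits are set in the field.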
Once the physical interrupt source is masked and the platform EOI generated, the VMM can map the host vector to its corresponding guest vector to inject the virtual interrupt into the assigned VM. The guest software does EOI write sequences to its virtual interrupt controller after completing interrupt processing.
5. Update the virtual interrupt controller state. When the above checks have passed, before generating the virtual interrupt to the guest, the VMM updates the virtual interrupt controller state (Local-APIC, IO-APIC and/or PIC) to reflect assertion of the virtual interrupt. This involves updating the various interrupt capture registers, and priority registers as done by the respective hardware interrupt controllers.
28.4.2 Machine Check Considerations
The following sequence determines how machine checks are handled during VMXON, VMXOFF, VM entries, and VM exits:
• VMXOFF and VMXON: If a machine check occurs during VMXOFF or VMXON and CR4.MCE = 1, a machine-check exception (#MC) is generated. If CR4.MCE = 0, the processor goes to shutdown state.
• VM entry: If a machine check occurs during VM entry, one of the following two treatments must occur: a.
NOTE The state saved in the guest-state area on VM exits due to machine-check exceptions should be considered suspect. A VMM should consult the RIPV and EIPV bits in the IA32_MCG_STATUS MSR before resuming a guest that caused a VM exit due to a machine-check exception. 28.
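The check recommended in the note above can be sketched as follows. This is a minimal Python illustration that assumes RIPV is bit 0 and EIPV is bit 1 of IA32_MCG_STATUS. The function names and the resume policy (resume only when RIPV is set) are hypothetical simplifications, not a rule stated in this manual.

```python
MCG_STATUS_RIPV = 1 << 0  # restart IP valid (assumed bit 0)
MCG_STATUS_EIPV = 1 << 1  # error IP valid (assumed bit 1)


def decode_mcg_status(value):
    """Return the RIPV and EIPV flags from an IA32_MCG_STATUS value."""
    return {
        "ripv": bool(value & MCG_STATUS_RIPV),
        "eipv": bool(value & MCG_STATUS_EIPV),
    }


def may_resume_guest(value):
    """Hypothetical policy: consider resuming only if the restart IP is valid."""
    return decode_mcg_status(value)["ripv"]
```

A real VMM would combine this with the error-reporting bank contents before deciding whether the guest state can be trusted.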
APPENDIX A PERFORMANCE-MONITORING EVENTS This appendix lists the performance-monitoring events that can be monitored with the Intel 64 or IA-32 processors. The ability to monitor performance events and the events that can be monitored in these processors are mostly model-specific, except for architectural performance events, described in Section A.1. Non-architectural performance events (i.e. model-specific events) are listed for each generation of microarchitecture: • • • • • • • Section A.
Table A-1. Architectural Performance Events Event Num.
Table A-2. Non-Architectural Performance Events In the Processor Core for Intel Core i7 Processors
Event Num.  Umask Value  Event Mask Mnemonic  Description  Comment
05H  01H  MISALIGN_MEM_REF.LOAD  Counts the number of misaligned load references
05H  02H  MISALIGN_MEM_REF.STORE  Counts the number of misaligned store references
05H  03H  MISALIGN_MEM_REF.ANY  Counts the number of misaligned memory references
06H  01H  STORE_BLOCKS.
08H  10H  DTLB_LOAD_MISSES.STLB_HIT  Number of cache load STLB hits
08H  20H  DTLB_LOAD_MISSES.PDE_MISS  Number of DTLB cache load misses where the low part of the linear to physical address translation was missed.
08H  40H  DTLB_LOAD_MISSES.
0EH  01H  UOPS_ISSUED.ANY
0EH  01H  UOPS_ISSUED.STALLED_CYCLES  Counts the number of cycles no Uops issued by the Register Allocation Table to the Reservation Station, i.e. the UOPs issued from the front end to the back end. (Comment: set “invert=1, cmask = 1”)
0EH  02H  UOPS_ISSUED.
0FH  20H  MEM_UNCORE_RETIRED.LOCAL_DRAM  Counts number of memory load instructions retired where the memory reference missed the L1, L2 and LLC caches and required a local socket memory reference. This includes locally homed cachelines that were in a modified state in another socket.
10H  01H  FP_COMP_OPS_EXE.
12H  04H  SIMD_INT_128.PACK  Counts number of 128 bit SIMD integer pack operations.
12H  08H  SIMD_INT_128.UNPACK  Counts number of 128 bit SIMD integer unpack operations.
12H  10H  SIMD_INT_128.PACKED_LOGICAL  Counts number of 128 bit SIMD integer logical operations.
12H  20H  SIMD_INT_128.
14H  02H  ARITH.MUL
17H  01H  INST_QUEUE_WRITES  Counts the number of instructions written into the instruction queue every cycle.
18H  01H  INST_DECODED.DEC0  Counts number of instructions that require decoder 0 to be decoded.
24H  01H  L2_RQSTS.LD_HIT  Counts number of loads that hit the L2 cache. L2 loads include both L1D demand misses as well as L1D prefetches. L2 loads can be rejected for various reasons. Only non rejected loads are counted.
24H  02H  L2_RQSTS.LD_MISS  Counts the number of loads that miss the L2 cache.
24H  30H  L2_RQSTS.IFETCHES  Counts all instruction fetches. L2 instruction fetches include both L1I demand misses as well as L1I instruction prefetches.
24H  40H  L2_RQSTS.PREFETCH_HIT  Counts L2 prefetch hits for both code and data.
24H  80H  L2_RQSTS.PREFETCH_MISS  Counts L2 prefetch misses for both code and data.
26H  10H  L2_DATA_RQSTS.PREFETCH.I_STATE  Counts number of L2 prefetch data loads where the cache line to be loaded is in the I (invalid) state, i.e. a cache miss.
26H  20H  L2_DATA_RQSTS.PREFETCH.S_STATE  Counts number of L2 prefetch data loads where the cache line to be loaded is in the S (shared) state.
27H  08H  L2_WRITE.RFO.M_STATE  Counts number of L2 store RFO requests where the cache line to be loaded is in the M (modified) state. The L1D prefetcher does not issue a RFO prefetch. (Comment: This is a demand RFO request)
27H  0EH  L2_WRITE.RFO.
28H  02H  L1D_WB_L2.S_STATE  Counts number of L1 writebacks to the L2 where the cache line to be written is in the S state.
28H  04H  L1D_WB_L2.E_STATE  Counts number of L1 writebacks to the L2 where the cache line to be written is in the E (exclusive) state.
28H  08H  L1D_WB_L2.
3CH  01H  CPU_CLK_UNHALTED.REF_P  Increments at the frequency of TSC when not halted. (Comment: see Table A-1)
3DH  01H  UOPS_DECODED.DEC0  Counts micro-ops decoded by decoder 0.
40H  01H  L1D_CACHE_LD.I_STATE
40H  02H  L1D_CACHE_LD.
42H  02H  L1D_CACHE_LOCK.S_STATE  Counts L1 data cache retired load locks that hit the target cache line in the shared state. (Comment: Counter 0, 1 only)
42H  04H  L1D_CACHE_LOCK.E_STATE  Counts L1 data cache retired load locks that hit the target cache line in the exclusive state. (Comment: Counter 0, 1 only)
42H  08H  L1D_CACHE_LOCK.
49H  40H  DTLB_MISSES.PDP_MISS  Number of DTLB misses where the high part of the linear to physical address translation was missed.
49H  80H  DTLB_MISSES.LARGE_WALK_COMPLETED  Counts number of completed large page walks due to misses in the STLB.
4BH  01H  SSE_MEM_EXEC.
4EH  04H  L1D_PREFETCH.TRIGGERS  Counts number of prefetch requests triggered by the Finite State Machine and pushed into the prefetch FIFO. Some of the prefetch requests are dropped due to overwrites or competition between the IP index prefetcher and streamer prefetcher. The prefetch FIFO contains 4 entries.
4FH  02H  EPT.
60H  01H  OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_DATA  Counts weighted cycles of offcore demand data read requests. Does not include L2 prefetch requests. (Comment: counter 0)
60H  02H  OFFCORE_REQUESTS_OUTSTANDING.DEMAND.READ_CODE  Counts weighted cycles of offcore demand code read requests.
80H  04H  L1I.CYCLES_STALLED  Cycle counts for which an instruction fetch stalls due to a L1I cache miss, ITLB miss or ITLB fault.
81H  01H  IFU_IVC.FULL  Instruction Fetch unit victim cache full.
81H  02H  IFU_IVC.L1I_EVICTION  L1 Instruction cache evictions.
82H  01H  LARGE_ITLB.HIT  Counts number of large ITLB hits.
87H  02H  ILD_STALL.MRU  Instruction Length Decoder stall cycles due to Branch Prediction Unit (BPU) Most Recently Used (MRU) bypass.
87H  04H  ILD_STALL.IQ_FULL  Stall cycles due to a full instruction queue.
87H  08H  ILD_STALL.REGEN  Counts the number of regen stalls.
87H  0FH  ILD_STALL.
88H  7FH  BR_INST_EXEC.ANY  Counts all near executed branches (not necessarily retired). This includes only instructions and not micro-op branches. Frequent branching is not necessarily a major performance issue. However, frequent branch mispredictions may be a problem.
89H  01H  BR_MISP_EXEC.
89H  7FH  BR_MISP_EXEC.ANY  Counts the number of mispredicted near branch instructions that were executed, but not necessarily retired.
A2H  01H  RESOURCE_STALLS.ANY  Counts the number of Allocator resource related stalls. Includes register renaming buffer entries, memory buffer entries.
A2H  20H  RESOURCE_STALLS.FPCW  Counts the number of cycles while execution was stalled due to writing the floating-point unit (FPU) control word.
A2H  40H  RESOURCE_STALLS.MXCSR  Stalls due to the MXCSR register rename occurring too close to a previous MXCSR rename. The MXCSR provides control and status for the XMM registers.
B0H  02H  OFFCORE_REQUESTS.DEMAND.READ_CODE  Counts number of offcore demand code read requests. Does not count L2 prefetch requests.
B0H  04H  OFFCORE_REQUESTS.DEMAND.RFO  Counts number of offcore demand RFO requests. Does not count L2 prefetch requests.
B0H  08H  OFFCORE_REQUESTS.ANY.
B1H  10H  UOPS_EXECUTED.PORT4_CORE  Counts number of Uops executed that were issued on port 4. Port 4 handles the value to be stored for the store Uops issued on port 3. This is a core count only and cannot be collected per thread.
B1H  20H  UOPS_EXECUTED.PORT5  Counts number of Uops executed that were issued on port 5.
BAH  01H  PIC_ACCESSES.TPR_READS  Counts number of TPR reads
BAH  02H  PIC_ACCESSES.TPR_WRITES  Counts number of TPR writes
C0H  01H  INST_RETIRED.ANY_P  See Table A-1. Notes: INST_RETIRED.ANY is counted by a designated fixed counter. INST_RETIRED.ANY_P is counted by a programmable counter and is an architectural performance event.
C3H  04H  MACHINE_CLEARS.SMC  Counts the number of times that a program writes to a code section. Self-modifying code causes a severe penalty in all Intel 64 and IA-32 processors. The modified cache line is written back to the L2 and L3 caches.
C3H  10H  MACHINE_CLEARS.
CBH  01H  MEM_LOAD_RETIRED.L1D_HIT  Counts number of retired loads that hit the L1 data cache.
CBH  02H  MEM_LOAD_RETIRED.L2_HIT  Counts number of retired loads that hit the L2 data cache.
CBH  04H  MEM_LOAD_RETIRED.LLC_UNSHARED_HIT  Counts number of retired loads that hit their own, unshared lines in the LLC cache.
CCH  02H  FP_MMX_TRANS.TO_MMX  Counts the first MMX instruction following a floating-point instruction. You can use this event to estimate the penalties for the transitions between floating-point and MMX technology states.
CCH  03H  FP_MMX_TRANS.
D2H  01H  RAT_STALLS.FLAGS  Counts the number of cycles during which execution stalled due to several reasons, one of which is a partial flag register stall.
D2H  0FH  RAT_STALLS.ANY
D4H  01H  SEG_RENAME_STALLS  Counts the number of stall cycles due to the lack of renaming resources for the ES, DS, FS, and GS segment registers.
E6H  01H  BACLEAR.CLEAR
E6H  02H  BACLEAR.BAD_TARGET  Counts number of Branch Address Calculator clears (BACLEAR) asserted due to conditional branch instructions in which there was a target hit but the direction was wrong. Each BACLEAR asserted by the BAC generates approximately an 8 cycle bubble in the instruction fetch pipeline.
F0H  04H  L2_TRANSACTIONS.IFETCH  Counts L2 instruction fetch operations due to HW prefetch or demand ifetch.
F0H  08H  L2_TRANSACTIONS.PREFETCH
F0H  10H  L2_TRANSACTIONS.L1D_WB  Counts L1D writeback operations to the L2.
F0H  20H  L2_TRANSACTIONS.
F3H  04H  L2_HW_PREFETCH.DATA_TRIGGER  Count L2 HW data prefetcher triggered
F3H  08H  L2_HW_PREFETCH.CODE_TRIGGER  Count L2 HW code prefetcher triggered
F3H  10H  L2_HW_PREFETCH.DCA_TRIGGER  Count L2 HW DCA prefetcher triggered
F3H  20H  L2_HW_PREFETCH.KICK_START  Count L2 HW prefetcher kick started
F4H  01H  SQ_MISC.
F7H  04H  FP_ASSIST.INPUT  Counts number of floating point micro-code assist when the input value (one of the source operands to an FP instruction) is invalid.
F8H  01H  SEGMENT_REG_LOADS  Counts number of segment register loads
FDH  01H  SIMD_INT_64.PACKED_MPY  Counts number of SIMD integer 64 bit packed multiply operations.
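The event number and umask columns of Table A-2 are the values software programs into a performance event select register. The sketch below is a minimal Python illustration, assuming the usual IA32_PERFEVTSELx layout (event select in bits 7:0, unit mask in bits 15:8, USR bit 16, OS bit 17, edge detect bit 18, enable bit 22, invert bit 23, counter mask in bits 31:24). The function name perfevtsel and its parameter names are hypothetical.

```python
def perfevtsel(event, umask, usr_mode=True, os_mode=True, edge=False,
               inv=False, cmask=0, enable=True):
    """Compose a 32-bit event-select value from Table A-2 style fields."""
    for name, field in (("event", event), ("umask", umask), ("cmask", cmask)):
        if not 0 <= field <= 0xFF:
            raise ValueError(f"{name} must fit in 8 bits")
    value = event | (umask << 8) | (cmask << 24)
    if usr_mode:
        value |= 1 << 16  # count while running at privilege levels 1-3
    if os_mode:
        value |= 1 << 17  # count while running at privilege level 0
    if edge:
        value |= 1 << 18  # edge detect
    if enable:
        value |= 1 << 22  # enable the counter
    if inv:
        value |= 1 << 23  # invert the counter-mask comparison
    return value


# UOPS_ISSUED.STALLED_CYCLES: event 0EH, umask 01H, invert = 1, cmask = 1
stalled_cycles = perfevtsel(0x0E, 0x01, inv=True, cmask=1)
```

With invert = 1 and cmask = 1 the counter increments on cycles in which fewer than one uop was issued, which is how the table derives a stalled-cycles event from UOPS_ISSUED.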
Table A-3. Non-Architectural Performance Events In the Processor Uncore for Intel Core i7 Processors
Event Num.  Umask Value  Event Mask Mnemonic  Description
00H  04H  UNC_GQ_CYCLES_FULL.PEER_PROBE_TRACKER  Uncore cycles Global Queue peer probe tracker is full. The peer probe tracker queue tracks snoops from the IOH and remote sockets.
01H  01H  UNC_GQ_CYCLES_NOT_EMPTY.READ_TRACKER  Uncore cycles where Global Queue read tracker has at least one valid entry.
03H  04H  UNC_GQ_ALLOC.RT_TO_LLC_RESP  Counts the number of GQ read tracker entries that are allocated in the read tracker queue that hit or miss the LLC. The GQ read tracker LLC hit occupancy count is divided by this count to obtain the average LLC hit latency.
03H  08H  UNC_GQ_ALLOC.
03H  40H  UNC_GQ_ALLOC.PEER_PROBE_TRACKER
04H  01H  UNC_GQ_DATA.FROM_QPI  Cycles Global Queue Quickpath Interface input data port is busy importing data from the Quickpath Interface. Each cycle the input port can transfer 8 or 16 bytes of data.
04H  02H  UNC_GQ_DATA.
05H  02H  UNC_GQ_DATA.TO_LLC
05H  04H  UNC_GQ_DATA.TO_CORES  Cycles GQ Core output data port is busy sending data to the Cores. Each cycle the output port can transfer 32 bytes of data.
06H  01H  UNC_SNP_RESP_TO_LOCAL_HOME.I_STATE  Number of snoop responses to the local home that LLC does not have the referenced cache line.
07H  02H  UNC_SNP_RESP_TO_REMOTE_HOME.S_STATE  Number of snoop responses to a remote home that LLC has the referenced line cached in the S state.
07H  04H  UNC_SNP_RESP_TO_REMOTE_HOME.FWD_S_STATE  Number of responses to code or data read snoops to a remote home that the LLC has the referenced cache line in the E state.
09H  01H  UNC_LLC_MISS.READ  Number of code read, data read and RFO requests that miss the LLC.
09H  02H  UNC_LLC_MISS.WRITE  Number of writeback requests that miss the LLC. Should always be zero as writebacks from the cores will always result in LLC hits due to the inclusive property of the LLC.
09H  04H  UNC_LLC_MISS.
0BH  10H  UNC_LLC_LINES_OUT.F_STATE  Counts the number of LLC lines victimized that were in the F state.
0BH  1FH  UNC_LLC_LINES_OUT.ANY  Counts the number of LLC lines victimized in any state.
20H  10H  UNC_QHL_REQUESTS.LOCAL_READS  Counts number of Quickpath Home Logic read requests from the local socket.
25H  01H  UNC_QHL_CONFLICT_CYCLES.IOH  Counts cycles the Quickpath Home Logic IOH Tracker contains two or more requests with an address conflict. A max of 3 requests can be in conflict.
25H  02H  UNC_QHL_CONFLICT_CYCLES.  Counts cycles the Quickpath Home
27H  10H  UNC_QMC_NORMAL_FULL.WRITE.CH1  Counts cycles all the entries in the DRAM channel 1 medium or low priority queue are occupied with write requests.
27H  20H  UNC_QMC_NORMAL_FULL.WRITE.CH2  Uncore cycles all the entries in the DRAM channel 2 medium or low priority queue are occupied with write requests.
29H  02H  UNC_QMC_BUSY.READ.CH1  Counts cycles where Quickpath Memory Controller has at least 1 outstanding read request to DRAM channel 1.
29H  04H  UNC_QMC_BUSY.READ.CH2  Counts cycles where Quickpath Memory Controller has at least 1 outstanding read request to DRAM channel 2.
29H  08H  UNC_QMC_BUSY.
2CH  04H  UNC_QMC_NORMAL_READS.CH2  Counts the number of Quickpath Memory Controller channel 2 medium and low priority read requests. The QMC channel 2 normal read occupancy divided by this count provides the average QMC channel 2 read latency.
2CH  07H  UNC_QMC_NORMAL_READS.  Counts the number of Quickpath
2FH  01H  UNC_QMC_WRITES.FULL.CH0  Counts number of full cache line writes to DRAM channel 0.
2FH  02H  UNC_QMC_WRITES.FULL.CH1  Counts number of full cache line writes to DRAM channel 1.
2FH  04H  UNC_QMC_WRITES.FULL.CH2  Counts number of full cache line writes to DRAM channel 2.
2FH  07H  UNC_QMC_WRITES.
31H  02H  UNC_QMC_PRIORITY_UPDATES.CH1  Counts number of DRAM channel 1 priority updates. A priority update occurs when an ISOC high or critical request is received by the QHL and there is a matching request with normal priority that has already been issued to the QMC.
40H  01H  UNC_QPI_TX_STALLED_SINGLE_FLIT.HOME.LINK_0
40H  02H  UNC_QPI_TX_STALLED_SINGLE_FLIT.SNOOP.LINK_0  Counts cycles the Quickpath outbound link 0 SNOOP virtual channel is stalled due to lack of a VNA and VN0 credit.
40H  20H  UNC_QPI_TX_STALLED_SINGLE_FLIT.NDR.LINK_1  Counts cycles the Quickpath outbound link 1 non-data response virtual channel is stalled due to lack of a VNA and VN0 credit. Note that this event does not filter out when a flit would not have been selected for arbitration because another virtual channel is getting arbitrated.
41H  04H  UNC_QPI_TX_STALLED_MULTI_FLIT.NCS.LINK_0  Counts cycles the Quickpath outbound link 0 Non-Coherent Standard virtual channel is stalled due to lack of VNA and VN0 credits.
41H  07H  UNC_QPI_TX_STALLED_MULTI_FLIT.LINK_0  Counts cycles the Quickpath outbound link 0 virtual channels are stalled due to lack of VNA and VN0 credits. Note that this event does not filter out when a flit would not have been selected for arbitration because another virtual channel is getting arbitrated.
60H  02H  UNC_DRAM_OPEN.CH1  Counts number of DRAM Channel 1 open commands issued either for read or write. To read or write data, the referenced DRAM page must first be opened.
60H  04H  UNC_DRAM_OPEN.CH2  Counts number of DRAM Channel 2 open commands issued either for read or write.
62H  02H  UNC_DRAM_PAGE_MISS.CH1  Counts the number of precharges (PRE) that were issued to DRAM channel 1 because there was a page miss. A page miss refers to a situation in which a page is currently open and another page from the same bank needs to be opened. The new page experiences a page miss.
63H | 20H | UNC_DRAM_READ_CAS.AUTOPRE_CH2 | Counts the number of times a read CAS command was issued on DRAM channel 2 where the command issued used the auto-precharge (auto page close) mode.
64H | 01H | UNC_DRAM_WRITE_CAS.CH0 | Counts the number of times a write CAS command was issued on DRAM channel 0.
64H | 02H | UNC_DRAM_WRITE_CAS.
65H | 02H | UNC_DRAM_REFRESH.CH1 | Counts number of DRAM channel 1 refresh commands. DRAM loses data content over time. In order to keep correct data content, the data values have to be refreshed periodically.
65H | 04H | UNC_DRAM_REFRESH.CH2 | Counts number of DRAM channel 2 refresh commands. DRAM loses data content over time.
AND INTEL® CORE™2 EXTREME PROCESSORS QX 9000 SERIES
Processors based on the Enhanced Intel Core microarchitecture support the architectural and non-architectural performance-monitoring events listed in Table A-1 and Table A-6. They also support the non-architectural performance-monitoring events listed in Table A-4.
Table A-4. Non-Architectural Performance Events for Processors based on Enhanced Intel Core Microarchitecture
Event Num.
Table A-5. Fixed-Function Performance Counter and Pre-defined Performance Events
Fixed-Function Performance Counter | Address | Event Mask Mnemonic | Description
MSR_PERF_FIXED_CTR0 | 309H | Instr_Retired.Any
MSR_PERF_FIXED_CTR1 | 30AH | CPU_CLK_UNHALTED.CORE | This event counts the number of core cycles while the core is not in a halt state. The core enters the halt state when it is running the HLT instruction. This event is a component in many key event ratios.
behavior where applicable. Software must use a general-purpose performance counter to count events listed in Table A-6.
Table A-6. Non-Architectural Performance Events in Processors Based on Intel Core Microarchitecture
Event Num | Umask Value | Event Name | Definition | Description and Comment
03H | 02H | LOAD_BLOCK.STA | Loads blocked by a preceding store with unknown address | This event indicates that loads are blocked by preceding stores.
• The load’s data size is one or two bytes and it is not aligned to the store.
• The load’s data size is four or eight bytes and the load is misaligned.
• The load is from bytes written by the preceding store, the store is misaligned, and the load is not aligned on the beginning of the store.
03H | 20H | LOAD_BLOCK.L1D | Loads blocked by the L1 data cache | This event indicates that loads are blocked due to one or more reasons. Some triggers for this event are:
• The number of L1 data cache misses exceeds the maximum number of outstanding misses supported by the processor.
04H | 08H | STORE_BLOCK.SNOOP | A store is blocked due to a conflict with an external or internal snoop. | This event counts the number of cycles the store port was used for snooping the L1 data cache and a store was stalled by the snoop. The store is typically resubmitted one cycle later.
07H | 02H | SSE_PRE_EXEC.L2 | Streaming SIMD Extensions (SSE) PrefetchT1 and PrefetchT2 instructions executed
07H | 03H | SSE_PRE_EXEC.STORES | Streaming SIMD Extensions (SSE) Weakly | This event counts the number of times SSE non-temporal store instructions are executed.
08H | 08H | DTLB_MISSES.MISS_ST | TLB misses due to store operations | This event counts the number of Data Translation Lookaside Buffer (DTLB) misses due to store operations. This count includes misses detected as a result of speculative accesses.
0CH | 02H | PAGE_WALKS.CYCLES | Duration of page-walks in core cycles | This event counts the duration of page-walks in core cycles. The paging mode in use typically affects the duration of page walks. Page walk duration divided by number of page walks is the average duration of page-walks.
14H | 00H | CYCLES_DIV_BUSY | Cycles the divider is busy | This event counts the number of cycles the divider is busy executing divide or square root operations. The divide can be integer, X87 or Streaming SIMD Extensions (SSE). The square root operation can be either X87 or SSE. Use IA32_PMC0 only.
19H | 02H | DELAYED_BYPASS.LOAD | Delayed bypass to load operation | This event counts the number of delayed bypass penalty cycles that a load operation incurred.
26H | See Table 18-11 and Table 18-13 | L2_LINES_OUT.(Core, Prefetch) | L2 cache lines evicted | This event counts the number of L2 cache lines evicted.
27H | See Table 18-11 and Table 18-13 | L2_M_LINES_OUT.
28H
29H
2AH | See Table 18-11 and Table 18-14 | L2_ST.(Core, Cache Line State) | L2 store requests
See Table 18-11 and Table 18-14 | L2_LOCK.(Core, Cache Line State)
See Table 18-11, Table 18-13, and Table 18-14 | L2_RQSTS.(Core, Prefetch, Cache Line State)
2EH | 41H | L2_RQSTS.SELF.DEMAND.
30H | See Table 18-11, Table 18-13, and Table 18-14 | L2_REJECT_BUSQ.(Core, Prefetch, Cache Line State) | Rejected L2 cache requests | This event indicates that a pending L2 cache request that requires a bus transaction is delayed from moving to the bus queue.
3BH | C0H | THERMAL_TRIP | Number of thermal trips | This event counts the number of thermal trips. A thermal trip occurs whenever the processor temperature exceeds the thermal trip threshold temperature. Following a thermal trip, the processor automatically reduces frequency and voltage.
3CH | 02H | CPU_CLK_UNHALTED.NO_OTHER | Bus cycles when core is active and the other is halted | This event counts the number of bus cycles during which the core remains non-halted and the other core on the processor is halted.
43H | 02H | L1D_ALL_CACHE_REF | L1 Data cacheable reads and writes | This event counts the number of data reads and writes from cacheable memory, including locked operations. This event is a sum of:
• L1D_CACHE_LD.MESI
• L1D_CACHE_ST.MESI
• L1D_CACHE_LOCK.
49H | 02H | L1D_SPLIT.STORES | Cache line split stores to the L1 data cache | This event counts the number of store operations that span two cache lines.
4BH | 00H | SSE_PRE_MISS.
4BH
4BH
60H | See Table 18-11 and Table 18-12 | BUS_REQUEST_OUTSTANDING.(Core and Bus Agents) | Outstanding cacheable data read bus requests duration | This event counts the number of pending full cache line read transactions on the bus occurring in each cycle.
62H | See Table 18-12 | BUS_DRDY_CLOCKS.(Bus Agents) | Bus cycles when data is sent on the bus | This event counts the number of bus cycles during which the DRDY (Data Ready) signal is asserted on the bus. The DRDY signal is asserted when data is sent on the bus.
66H | See Table 18-11 and Table 18-12. | BUS_TRANS_RFO.(Core and Bus Agents) | RFO bus transactions | This event counts the number of Read For Ownership (RFO) bus transactions, due to store operations that miss the L1 data cache and the L2 cache. It also counts RFO bus transactions due to locked operations.
67H | See Table 18-11 and Table 18-12.
6CH | See Table 18-11 and Table 18-12 | BUS_TRANS_IO.(Core and Bus Agents) | IO bus transactions | This event counts the number of completed I/O bus transactions as a result of IN and OUT instructions. The count does not include memory mapped IO.
6DH | See Table 18-11 and Table 18-12 | BUS_TRANS_DEF.
77H | See Table 18-11 and Table 18-15 | EXT_SNOOP.(Bus Agents, Snoop Response) | External snoops | This event counts the snoop responses to bus transactions.
78H | See Table 18-11 and Table 18-16 | CMP_SNOOP.(Core, Snoop Type) | L1 data cache snooped by other core
7BH | See Table 18-12 | BUS_HITM_DRV. | HITM signal asserted | This event counts the number of bus cycles during which the processor drives the HITM# pin to signal HITM snoop response.
7DH | See Table 18-11 | BUSQ_EMPTY.
82H | 02H | ITLB.SMALL_MISS | ITLB small page misses | This event counts the number of instruction fetches from small pages that miss the ITLB.
82H | 10H | ITLB.LARGE_MISS | ITLB large page misses | This event counts the number of instruction fetches from large pages that miss the ITLB.
82H | 40H | ITLB.
8DH | 00H | BR_IND_EXEC | Indirect branch instructions executed | This event counts the number of indirect branch instructions that were executed.
98H | 00H | BR_TKN_BUBBLE_2 | Branch predicted taken with bubble 2 | The events BR_TKN_BUBBLE_1 and BR_TKN_BUBBLE_2 together count the number of times a taken branch prediction incurred a one-cycle penalty. The penalty is incurred when:
• Too many taken branches are placed together.
A1H | 10H | RS_UOPS_DISPATCHED.PORT4 | Cycles micro-ops dispatched for execution on port 4 | This event counts the number of cycles for which micro-ops are dispatched for execution. Each cycle, at most one micro-op can be dispatched on the port. Use IA32_PMC0 only.
A1H | 20H | RS_UOPS_DISPATCHED. | Cycles micro-
ABH | 02H | ESP.ADDITIONS | ESP register automatic additions | This event counts the number of ESP additions performed automatically by the decoder. A high count of this event is good, since each automatic addition performed by the decoder saves a micro-op from the execution units.
C0H | 00H | INST_RETIRED.ANY_P | Instructions retired | This event counts the number of instructions that retire execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction.
This event does not count:
• floating-point computational operations that cause traps or assists.
• floating-point loads and stores.
C2H | 04H | UOPS_RETIRED.MACRO_FUSION | Retired instruction pairs fused into one micro-op | This event counts the number of times CMP or TEST instructions were fused with a conditional branch instruction into one micro-op. It counts fusion by retired micro-ops only.
Some instructions are decoded into longer sequences such as repeat instructions, floating point transcendental instructions, and assists. In some cases micro-op sequences are fused or whole instructions are fused into one micro-op.
C4H | 02H | BR_INST_RETIRED.MISPRED_NOT_TAKEN | Retired branch instructions that were mispredicted not-taken
C4H | 04H | BR_INST_RETIRED.PRED_TAKEN | Retired branch instructions that were predicted taken | This event counts the number of branch instructions retired that were correctly predicted to be taken.
C4H | 08H | BR_INST_RETIRED.
C7H | 04H | SIMD_INST_RETIRED.PACKED_DOUBLE | Retired SSE2 packed-double instructions | This event counts the number of SSE2 packed-double instructions retired.
C7H | 08H | SIMD_INST_RETIRED.
CAH | 01H | SIMD_COMP_INST_RETIRED.PACKED_SINGLE | Retired computational SSE packed-single instructions | This event counts the number of computational SSE packed-single instructions retired. Computational instructions perform arithmetic computations (for example: add, multiply and divide).
CBH | 01H | MEM_LOAD_RETIRED.L1D_MISS | Retired loads that miss the L1 data cache (precise event) | This event counts the number of retired load operations that missed the L1 data cache.
CBH | 04H | MEM_LOAD_RETIRED.L2_MISS | Retired loads that miss the L2 cache (precise event) | This event counts the number of retired load operations that missed the L2 cache. This event counts loads from cacheable memory only. It does not count loads by software prefetches.
CBH | 10H | MEM_LOAD_RETIRED.DTLB_MISS | Retired loads that miss the DTLB (precise event) | This event counts the number of retired loads that missed the DTLB. The DTLB miss is not counted if the load operation causes a fault. This event counts loads from cacheable memory only.
D2H | 01H | RAT_STALLS.ROB_READ_PORT | ROB read port stalls cycles | This event counts the number of cycles when ROB read port stalls occurred, which did not allow new micro-ops to enter the out-of-order pipeline.
D4H | 01H | SEG_RENAME_STALLS.ES | Segment rename stalls ES | This event counts the number of stalls due to the lack of renaming resources for the ES segment register.
D5H | 02H | SEG_REG_RENAMES.DS | Segment renames - DS | This event counts the number of times the DS segment register is renamed.
D5H | 04H | SEG_REG_RENAMES.FS | Segment renames - FS | This event counts the number of times the FS segment register is renamed.
D5H | 08H | SEG_REG_RENAMES.
DCH | 04 | RESOURCE_STALLS.
E4H | 00H | BOGUS_BR | Bogus branches | This event counts the number of byte sequences that were mistakenly detected as taken branch instructions. This results in a BACLEAR event. This occurs mainly after task switches.
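The Event Num and Umask Value columns of Table A-6 are the two fields software writes into an event-select register when programming a general-purpose counter. The following is a minimal sketch of that packing, assuming the IA32_PERFEVTSELx layout (event select in bits 7:0, unit mask in bits 15:8, USR in bit 16, OS in bit 17, EN in bit 22). The function name and its parameters are hypothetical, and the sketch only computes the value; writing the MSR is platform-specific and not shown.

```python
# Bit positions assumed from the IA32_PERFEVTSELx register layout.
USR_BIT = 1 << 16  # count while running at privilege levels 1-3
OS_BIT = 1 << 17   # count while running at privilege level 0
EN_BIT = 1 << 22   # enable the counter


def perfevtsel_value(event_num, umask, usr=True, os=True, enable=True):
    """Pack an event number and unit mask into an event-select value.

    Hypothetical helper for illustration; it performs no MSR access.
    """
    if not 0 <= event_num <= 0xFF:
        raise ValueError("event number must fit in 8 bits")
    if not 0 <= umask <= 0xFF:
        raise ValueError("unit mask must fit in 8 bits")
    value = event_num | (umask << 8)
    if usr:
        value |= USR_BIT
    if os:
        value |= OS_BIT
    if enable:
        value |= EN_BIT
    return value


# INST_RETIRED.ANY_P: Event Num C0H, Umask Value 00H
print(hex(perfevtsel_value(0xC0, 0x00)))
# MEM_LOAD_RETIRED.L2_MISS: Event Num CBH, Umask Value 04H
print(hex(perfevtsel_value(0xCB, 0x04)))
```

Events whose Umask Value column refers to Table 18-11 through Table 18-16 need the unit mask assembled from those tables first; the helper above simply takes the finished 8-bit value.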
A.5 PERFORMANCE MONITORING EVENTS FOR INTEL® ATOM™ PROCESSORS
Processors based on the Intel Atom microarchitecture support the architectural and non-architectural performance-monitoring events listed in Table A-1 and Table A-6. They also support the non-architectural performance-monitoring events listed in Table A-7.
Table A-7. Non-Architectural Performance Events for Intel Atom Processors
Event Num.
07H | 08H | PREFETCH.PREFETCHNTA | Streaming SIMD Extensions (SSE) Prefetch NTA instructions executed | This event counts the number of times the SSE instruction prefetchNTA is executed. This instruction prefetches the data to the L1 data cache.
08H | 07H | DATA_TLB_MISSES.
0CH | 03H | PAGE_WALKS.CYCLES | Duration of page-walks in core cycles | This event counts the duration of page-walks in core cycles. The paging mode in use typically affects the duration of page walks. Page walk duration divided by number of page walks is the average duration of page-walks.
12H | 81H | MUL.AR | Multiply operations retired | This event counts the number of multiply operations retired. This includes integer as well as floating point multiply operations.
13H | 01H | DIV.S | Divide operations executed | This event counts the number of divide operations executed.
25H | See Table 18-11 | L2_M_LINES_IN | L2 cache line modifications | This event counts whenever a modified cache line is written back from the L1 data cache to the L2 cache. This event can count occurrences for this core or both cores.
29H | See Table 18-11, Table 18-13 and Table 18-14 | L2_LD | L2 cache reads | This event counts L2 cache read requests coming from the L1 data cache and L2 prefetchers. This event can count occurrences for this core or both cores. This event can count occurrences - for this core or both cores.
2EH | 41H | L2_RQSTS.SELF.DEMAND.I_STATE | L2 cache demand requests from this core that missed the L2 | This event counts all completed L2 cache demand requests from this core that miss the L2 cache. This includes L1 data cache reads, writes, and locked accesses, L1 data prefetch requests, and instruction fetches.
EIST transitions are commonly initiated by the OS, but can also be initiated internally by hardware. For example: CxE states are C-states (C1,C2,C3…) which not only place the CPU into a sleep state by turning off the clock and other components, but also lower the voltage (which reduces the leakage power consumption).
3CH | 01H | CPU_CLK_UNHALTED.BUS | Bus cycles when core is not halted | This event counts the number of bus cycles while the core is not in the halt state. This event can give you a measurement of the elapsed time while the core was not in the halt state, by dividing the event count by the bus frequency.
61H | See Table 18-12 | BUS_BNR_DRV | Number of Bus Not Ready signals asserted | This event counts the number of Bus Not Ready (BNR) signals that the processor asserts on the bus to suspend additional bus requests by other bus agents.
64H | See Table 18-11 | BUS_DATA_RCV | Bus cycles while processor receives data | This event counts the number of cycles during which the processor is busy receiving data.
6AH | See Table 18-11 and Table 18-12 | BUS_TRANS_PWR | Partial write bus transaction. | This event counts partial write bus transactions.
70H | See Table 18-11 and Table 18-12 | BUS_TRANS_ANY | All bus transactions | This event counts all bus transactions.
7FH | See Table 18-11 | BUS_IO_WAIT | IO requests waiting in the bus queue | This event counts the number of core cycles during which IO requests wait in the bus queue. This event counts IO requests from the core.
80H | 03H | ICACHE.ACCESSES | Instruction fetches | This event counts all instruction fetches, including uncacheable fetches.
B3H | 01H | SIMD_UOP_TYPE_EXEC.MUL.S | SIMD packed multiply micro-ops executed | This event counts the number of SIMD packed multiply micro-ops executed.
B3H | 81H | SIMD_UOP_TYPE_EXEC.MUL.AR | SIMD packed multiply micro-ops retired | This event counts the number of SIMD packed multiply micro-ops retired.
B3H | 02H | SIMD_UOP_TYPE_EXEC.SHIFT. | SIMD packed
C0H | 00H | INST_RETIRED.ANY_P | Instructions retired (precise event). | This event counts the number of instructions that retire execution. For instructions that consist of multiple micro-ops, this event counts the retirement of the last micro-op of the instruction.
C4H | 02H | BR_INST_RETIRED.MISPRED_NOT_TAKEN | Retired branch instructions that were mispredicted not-taken
C4H | 04H | BR_INST_RETIRED.PRED_TAKEN | Retired branch instructions that were predicted taken | This event counts the number of branch instructions retired that were correctly predicted to be taken.
C4H | 08H | BR_INST_RETIRED. | Retired branch
To determine the branch misprediction ratio, divide the BR_INST_RETIRED.MISPRED event count by the BR_INST_RETIRED.ANY event count. To determine the number of mispredicted branches per instruction, divide the number of mispredicted branches by the INST_RETIRED.ANY event count.
Using the Profile-Guided Optimization (PGO) features of the Intel® C++ compiler may help reduce branch mispredictions. See the compiler documentation for more information on this feature. To determine the branch misprediction ratio, divide the BR_INST_RETIRED.MISPRED event count by the BR_INST_RETIRED.ANY event count.
C7H | 02H | SIMD_INST_RETIRED.SCALAR_SINGLE | Retired Streaming SIMD Extensions (SSE) scalar-single instructions | This event counts the number of SSE scalar-single instructions retired.
C7H | 04H | SIMD_INST_RETIRED.PACKED_D | Retired Streaming SIMD | This event counts the number of SSE2 packed-double instructions retired.
CAH | 01H | SIMD_COMP_INST_RETIRED.PACKED_SINGLE | Retired computational Streaming SIMD Extensions (SSE) packed-single instructions. | This event counts the number of computational SSE packed-single instructions retired. Computational instructions perform arithmetic computations, like add, multiply and divide.
CBH | 04H | MEM_LOAD_RETIRED.DTLB_MISS | Retired loads that miss the DTLB (precise event) | This event counts the number of retired loads that missed the DTLB. The DTLB miss is not counted if the load operation causes a fault.
CDH | 00H | SIMD_ASSIST | SIMD assists invoked | This event counts the number of SIMD assists invoked.
E4H | 01H | BOGUS_BR | Bogus branches | This event counts the number of byte sequences that were mistakenly detected as taken branch instructions. This results in a BACLEAR event and the BTB is flushed. This occurs mainly after task switches.
E6H | 01H | BACLEARS.ANY | BACLEARS asserted
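Several Table A-7 comments describe ratios derived from raw counts: average page-walk duration, elapsed unhalted time, and the branch misprediction ratios. The following is a minimal sketch of that arithmetic. The function names are hypothetical, the counts are assumed to have been read from the counters already, and returning None for a zero denominator is a choice made here rather than anything the tables specify.

```python
def _ratio(numerator, denominator):
    """Divide two counts; None when the denominator is zero (assumed policy)."""
    if denominator == 0:
        return None
    return numerator / denominator


def average_page_walk_cycles(page_walk_cycles, page_walk_count):
    """PAGE_WALKS.CYCLES count divided by the number of page walks."""
    return _ratio(page_walk_cycles, page_walk_count)


def unhalted_seconds(bus_cycles, bus_frequency_hz):
    """CPU_CLK_UNHALTED.BUS count divided by the bus frequency in Hz."""
    return _ratio(bus_cycles, bus_frequency_hz)


def branch_misprediction_ratio(mispredicted, all_branches):
    """BR_INST_RETIRED.MISPRED count divided by BR_INST_RETIRED.ANY count."""
    return _ratio(mispredicted, all_branches)


def mispredictions_per_instruction(mispredicted, instructions_retired):
    """Mispredicted branches divided by the INST_RETIRED.ANY count."""
    return _ratio(mispredicted, instructions_retired)


# Made-up sample counts, for illustration only.
print(average_page_walk_cycles(5000, 200))
print(unhalted_seconds(400_000_000, 200_000_000))
print(branch_misprediction_ratio(50, 1000))
print(mispredictions_per_instruction(50, 10_000))
```

All counts fed into one ratio should come from the same measurement interval; otherwise the result is not meaningful.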
A.6 PERFORMANCE MONITORING EVENTS FOR INTEL® CORE™ SOLO AND INTEL® CORE™ DUO PROCESSORS
Table A-8 lists non-architectural performance events for Intel Core Duo processors. If a non-architectural event requires qualification in core specificity, it is indicated in the comment column. Table A-8 also applies to Intel Core Solo processors; bits in the unit mask corresponding to core-specificity are reserved and should be 00B.
Table A-8.
Table A-8. Non-Architectural Performance Events in Intel Core Solo and Intel Core Duo Processors (Contd.)
Event Num. | Event Mask Mnemonic | Umask Value | Description | Comment
12H | Mul | 00H | Multiply operations (a speculative count, including FP and integer multiplies). | IA32_PMC1 only.
13H | Div | 00H | Divide operations (a speculative count, including FP and integer divisions). | IA32_PMC1 only.
14H | Cycles_Div_Busy | 00H | Cycles the divider is busy | IA32_PMC0 only.
2EH | L2_Rqsts | Requires MESI qualification | L2 cache reference requests
30H | L2_Reject_Cycles | Requires MESI qualification | Cycles L2 is busy and rejecting new requests.
PERFORMANCE-MONITORING EVENTS Table A-8. Non-Architectural Performance Events in Intel Core Solo and Intel Core Duo Processors (Contd.) Event Num. Event Mask Mnemonic Umask Value 66H Bus_Trans_RFO See comment. Completed read for ownership (RFO) transactions Requires agent specificity 68H Bus_Trans_Ifetch See comment. Completed instruction fetch transactions Requires core specificity 69H Bus_Trans_Inval See comment. Completed invalidate transactions 6AH Bus_Trans_Pwr See comment.
PERFORMANCE-MONITORING EVENTS Table A-8. Non-Architectural Performance Events in Intel Core Solo and Intel Core Duo Processors (Contd.) Event Num. Event Mask Mnemonic Umask Value 7EH Bus_Snoop_Stall 00H Number of bus cycles while bus snoop is stalled 80H ICache_Reads 00H Number of instruction fetches from ICache, streaming buffers (both cacheable and uncacheable fetches) 81H ICache_Misses 00H Number of instruction fetch misses from ICache, streaming buffers.
A.7 PENTIUM 4 AND INTEL XEON PROCESSOR PERFORMANCE-MONITORING EVENTS
Tables A-9, A-10, and A-11 list performance-monitoring events that can be counted or sampled on processors based on Intel NetBurst microarchitecture. Table A-9 lists the non-retirement events, and Table A-10 lists the at-retirement events. Tables A-12, A-13, and A-14 describe three sets of parameters that are available for three of the at-retirement counting events defined in Table A-10.
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value ESCR Event Mask Description ESCR[24:9] Bit CCCR Select 0: DD Both logical processors are in deliver mode. 1: DB Logical processor 0 is in deliver mode and logical processor 1 is in build mode.
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value Event Specific Notes Description If only one logical processor is available from a physical processor package, the event mask should be interpreted as logical processor 1 is halted. Event mask bit 2 was previously known as “DELIVER”, bit 5 was previously known as “BUILD”.
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value Description ESCR[24:9] ESCR Event Mask Bit CCCR Select 0: HIT ITLB hit 1: MISS ITLB miss 2: HIT_UC Uncacheable ITLB hit 03H CCCR[15:13] Event Specific Notes All page references regardless of the page size are looked up as actual 4-KByte pages.
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value memory_ complete Description This event counts the completion of a load split, store split, uncacheable (UC) split, or UC load. Specify one or more mask bits to select the operations to be counted.
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value store_port_replay Description This event counts replayed events at the store port. Specify one or more mask bits to select the cause of the replay.
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value Description 2: Currently this event causes both over and undercounting by as much as a factor of two due to an erratum. 3: It is possible for a transaction that is started as a prefetch to change the transaction's internal status, making it no longer a prefetch.
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value Description Bits 13 and 14 form a bit field to specify the source agent of the request. Bit 15 affects read operation only. The event is triggered by evaluating the logical expression: (((Request type) OR Bit 5 OR Bit 6) OR (Memory type)) AND (Source agent).
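The trigger expression above can be read as a plain boolean function. The following is a minimal sketch, not processor-accurate code: the function name and its five boolean parameters are hypothetical stand-ins for "the corresponding mask condition matched", and the way each condition is derived from the event mask bits is not modeled.

```python
def ioq_event_triggered(request_type: bool, bit5: bool, bit6: bool,
                        memory_type: bool, source_agent: bool) -> bool:
    """Evaluate (((Request type) OR Bit 5 OR Bit 6) OR (Memory type)) AND (Source agent).

    Each argument is an assumed, already-evaluated condition; the names are
    illustrative and do not come from the manual.
    """
    qualifies = (request_type or bit5 or bit6) or memory_type
    return bool(qualifies and source_agent)
```

Because the source-agent term is ANDed with everything else, no request is counted unless the source-agent condition also holds.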
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Event Specific Notes Parameter Value Description 1: If PREFETCH bit is cleared, sectors fetched using prefetch are excluded in the counts. If PREFETCH bit is set, all sectors or chunks read are counted. 2: Specify the edge trigger in CCCR to avoid double counting.
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value Description 4b:For Pentium 4 and Xeon Processors with CPUID Model field encoding less than 2, this event is triggered by evaluating the logical expression [((Request type) or Bit 5 or Bit 6) or (Memory type)] and (Source agent).
Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value IOQ_active_entries Description This event counts the number of entries (clipped at 15) in the IOQ that are active. An allocated entry can be a sector (64 bytes) or a chunk of 8 bytes. The event must be programmed in conjunction with IOQ_allocation.
Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Event Specific Notes Parameter Value Description 1: Specify desired mask bits in ESCR0 and ESCR1. 2: See the ioq_allocation event for descriptions of the mask bits. 3: Edge triggering should not be used when counting cycles.
Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value Description 5b: For Pentium 4 and Xeon Processors with CPUID MODEL field encoding less than 2, this event is triggered by evaluating the logical expression [((Request type) or Bit 5 or Bit 6) or (Memory type)] and (Source agent).
Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value Description Asserted for two processor clock cycles for partial writes and 4 processor clocks (usually in consecutive bus clocks) for full line writes. 1: DRDY_OWN Count when this processor reads data from the bus - includes loads and some PIC transactions.
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value Description 4: DBSY_OWN Count when some agent reserves the bus for use in the next bus cycle to drive data that this processor will sample. Asserted for two processor clock cycles for full line writes and not at all for partial line writes.
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters CCCR Select Event Specific Notes Parameter Value Description 11: MEM_TYPE0 12: MEM_TYPE1 13: MEM_TYPE2 Memory type encodings (bit 11-13) are: 07H CCCR[15:13] 0 – UC 1 – WC 4 – WT 5 – WP 6 – WB 1: Specify edge trigger in CCCR to avoid double counting.
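As an illustration of the memory type encodings listed above, the sketch below places an encoding into event mask bits 11-13 and a select value into CCCR[15:13]. It assumes MEM_TYPE0 (bit 11) is the least significant bit of the encoding; the helper names and constants are hypothetical and only show the bit arithmetic.

```python
# Memory type encodings as listed in the table (event mask bits 11-13).
MEM_TYPE_ENCODINGS = {"UC": 0, "WC": 1, "WT": 4, "WP": 5, "WB": 6}

MEM_TYPE_SHIFT = 11     # assumption: MEM_TYPE0 (bit 11) is the low bit of the encoding
CCCR_SELECT_SHIFT = 13  # CCCR select occupies CCCR[15:13]


def mem_type_mask_bits(mem_type: str) -> int:
    """Return the event mask bits (11-13) for a named memory type."""
    return MEM_TYPE_ENCODINGS[mem_type] << MEM_TYPE_SHIFT


def cccr_select_field(select: int) -> int:
    """Return a 3-bit CCCR select value positioned at CCCR[15:13]."""
    if not 0 <= select <= 0x7:
        raise ValueError("CCCR select is a 3-bit field")
    return select << CCCR_SELECT_SHIFT
```

For example, `mem_type_mask_bits("WB")` sets mask bits 12 and 13, and `cccr_select_field(0x07)` sets all three bits of CCCR[15:13].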
Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value Description 6b: This event may undercount read requests of 16-byte operands from WC or UC addresses. 6c: This event may undercount WC partial requests originated from store operands that are dwords.
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value Description 4: This event can be used to estimate the latency of a transaction from allocation to de-allocation in the BSQ. The latency observed by BSQ_allocation includes the latency of FSB, plus additional overhead.
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value ESCR restrictions MSR_FIRM_ESCR0 MSR_FIRM_ESCR1 Counter numbers per ESCR ESCR0: 8, 9 ESCR Event Select 34H ESCR1: 10, 11 ESCR Event Mask CCCR Select Event Specific Notes Description ESCR[31:25] ESCR[24:9] 15: ALL Count assists for SSE/SSE2/SSE3 μops.
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value Description 3: Enabling the DAZ mode prevents SSE/SSE2/SSE3 operations from needing assists in the first situation. Enabling the FTZ mode prevents SSE/SSE2/SSE3 operations from needing assists in the second situation.
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value Counter numbers per ESCR ESCR0: 8, 9 ESCR Event Select 0CH ESCR1: 10, 11 ESCR Event Mask CCCR Select Description ESCR[31:25] ESCR[24:9] Bit 15: ALL Count all μops operating on packed double-precision operands.
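The field positions quoted in these tables (event select in ESCR[31:25], event mask in ESCR[24:9]) can be combined as in the sketch below. The function name is hypothetical, and the remaining ESCR bits are left at zero because this excerpt does not describe them.

```python
def escr_value(event_select: int, event_mask: int) -> int:
    """Pack an event select (ESCR[31:25]) and event mask (ESCR[24:9]) into one value.

    Only these two fields are set; every other ESCR bit is left clear.
    """
    if not 0 <= event_select < (1 << 7):
        raise ValueError("event select is a 7-bit field")
    if not 0 <= event_mask < (1 << 16):
        raise ValueError("event mask is a 16-bit field")
    return (event_select << 25) | (event_mask << 9)


# Event select 0CH with mask bit 15 (ALL), as in the entry above.
ALL_BIT = 1 << 15
packed_dp = escr_value(0x0C, ALL_BIT)
```

Mask bit 15 lands at ESCR bit 24, the top of the event mask field, so `packed_dp` has only bits 24, 27, and 28 set.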
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value Counter numbers per ESCR ESCR0: 8, 9 ESCR Event Select 0EH ESCR1: 10, 11 ESCR Event Mask CCCR Select Description ESCR[31:25] ESCR[24:9] Bit 15: ALL Count all μops operating on scalar double-precision operands.
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value TC_misc Description This event counts miscellaneous events detected by the TC. The counter will count twice for each occurrence.
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value ESCR Event Mask CCCR Select Description ESCR[24:9] Bit 0: CISC A TC to MS transfer occurred. 0H CCCR[15:13] uop_queue_ writes This event counts the number of valid uops written to the uop queue. Specify one or more mask bits to select the source type of writes.
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters CCCR Select Parameter Value Description 3: RETURN Return branches. 4: INDIRECT Returns, indirect calls, or indirect jumps. 02H CCCR[15:13] Event Specific Notes This event may overcount conditional branches if: • Mispredictions cause the trace cache and delivery engine to build new traces.
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value resource_stall Description This event monitors the occurrence or latency of stalls in the Allocator.
Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value Event Specific Notes Description This event is useful for detecting the subset of 64K aliasing cases that are more costly (i.e., 64K aliasing cases involving stores) as long as there are no significant contributions due to write combining buffer full or hit-modified conditions.
PERFORMANCE-MONITORING EVENTS Table A-9. Performance Monitoring Events Supported by Intel NetBurst Microarchitecture for Non-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value snoop Description This event can be configured to count snoop hit modified bus traffic using sub-event mask bits 2, 6 and 7.
PERFORMANCE-MONITORING EVENTS Table A-10. Performance Monitoring Events For Intel NetBurst Microarchitecture for At-Retirement Counting Event Name Event Parameters Parameter Value front_end_event Description This event counts the retirement of tagged μops, which are specified through the front-end tagging mechanism. The event mask specifies bogus or non-bogus μops.
PERFORMANCE-MONITORING EVENTS Table A-10. Performance Monitoring Events For Intel NetBurst Microarchitecture for At-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value Description ESCR[24:9] ESCR Event Mask Bit CCCR Select 0: NBOGUS0 The marked μops are not bogus. 1: NBOGUS1 The marked μops are not bogus. 2: NBOGUS2 The marked μops are not bogus. 3: NBOGUS3 The marked μops are not bogus. 4: BOGUS0 The marked μops are bogus. 5: BOGUS1 The marked μops are bogus.
PERFORMANCE-MONITORING EVENTS Table A-10. Performance Monitoring Events For Intel NetBurst Microarchitecture for At-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value Description ESCR[24:9] ESCR Event Mask Bit CCCR Select 0: NBOGUS The marked μops are not bogus. 1: BOGUS The marked μops are bogus. 05H CCCR[15:13] Event Specific Notes Supports counting tagged μops with additional MSRs.
Table A-10. Performance Monitoring Events For Intel NetBurst Microarchitecture for At-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value Event Specific Notes Description 1: The event count may vary depending on the microarchitectural states of the processor when the event detection is enabled. 2: The event may count more than once for some instructions with complex uop flows that were interrupted before retirement.
PERFORMANCE-MONITORING EVENTS Table A-10. Performance Monitoring Events For Intel NetBurst Microarchitecture for At-Retirement Counting (Contd.) Event Name Event Parameters ESCR Event Select Parameter Value 02H Description ESCR[31:25] ESCR[24:9] ESCR Event Mask Bit CCCR Select 1: TAGLOADS The μop is a load operation. 2: TAGSTORES The μop is a store operation.
PERFORMANCE-MONITORING EVENTS Table A-10. Performance Monitoring Events For Intel NetBurst Microarchitecture for At-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value mispred_branch_ retired Description This event represents the retirement of mispredicted branch instructions.
Table A-10. Performance Monitoring Events For Intel NetBurst Microarchitecture for At-Retirement Counting (Contd.) Event Name Event Parameters Parameter Value machine_clear Description This event increments according to the mask bit specified while the entire pipeline of the machine is cleared. Specify one of the mask bits to select the cause.
PERFORMANCE-MONITORING EVENTS Table A-11. Intel NetBurst Microarchitecture Model-Specific Performance Monitoring Events (For Model Encoding 3, 4 or 6) Event Name Event Parameters Parameter Value instr_completed Description This event counts instructions that have completed and retired during a clock cycle.
PERFORMANCE-MONITORING EVENTS Table A-12. List of Metrics Available for Front_end Tagging (For Front_end Event Only) Front-end metric1 MSR_ TC_PRECISE_EVEN T MSR Bit field Additional MSR Event mask value for Front_end_event memory_loads None Set TAGLOADS bit in ESCR corresponding to event Uop_Type. NBOGUS memory_stores None Set TAGSTORES bit in the ESCR corresponding to event Uop_Type. NBOGUS NOTES: 1.
PERFORMANCE-MONITORING EVENTS Table A-13. List of Metrics Available for Execution Tagging (For Execution Event Only) (Contd.) Execution metric Upstream ESCR TagValue in Upstream ESCR Event mask value for execution_event 64_bit_MMX_retired Set ALL bit in event mask, TagUop bit in ESCR of 64_bit_MMX_uop. 1 NBOGUS0 X87_FP_retired Set ALL bit in event mask, TagUop bit in ESCR of x87_FP_uop.
PERFORMANCE-MONITORING EVENTS Table A-14. List of Metrics Available for Replay Tagging (For Replay Event Only) (Contd.) Event Mask Value for Replay_event IA32_PEBS_ ENABLE Field to Set MSR_PEBS_ MATRIX_VERT Bit Field to Set split_load_retired Bit 10, Bit 24, Bit 25 Bit 0 Select load_port_replay event with the MSR_SAAT_ESCR1 MSR and set the SPLIT_LD mask bit.
PERFORMANCE-MONITORING EVENTS Table A-15. Event Mask Qualification for Logical Processors (Contd.) Event Type Event Name Event Masks, ESCR[24:9] At Retirement machine_clear Bit At Retirement At Retirement At Retirement At Retirement At Retirement At Retirement A-182 Vol.
Table A-15. Event Mask Qualification for Logical Processors (Contd.) Event Type Event Name Event Masks, ESCR[24:9] TS or TI
At Retirement uops_retired Bit 0: NBOGUS TS 1: BOGUS TS
At Retirement instr_completed Bit 0: NBOGUS TS 1: BOGUS TS
A.8 PERFORMANCE MONITORING EVENTS FOR INTEL® PENTIUM® M PROCESSORS
The Pentium M processor’s performance-monitoring events are based on monitoring events for the P6 family of processors.
PERFORMANCE-MONITORING EVENTS Table A-16. Performance Monitoring Events on Intel® Pentium® M Processors (Contd.) Name Hex Values Descriptions BR_BAC_MISSP_EXEC 8AH Branch instructions executed that were mispredicted at front end (BAC). BR_CND_EXEC 8BH Conditional branch instructions that were executed. BR_CND_MISSP_EXEC 8CH Conditional branch instructions executed that were mispredicted. BR_IND_EXEC 8DH Indirect branch instructions executed.
PERFORMANCE-MONITORING EVENTS Table A-16. Performance Monitoring Events on Intel® Pentium® M Processors (Contd.) Name Hex Values Descriptions EMON_PREF_RQSTS_UP F0H Number of upward prefetches issued EMON_PREF_RQSTS_DN F8H Number of downward prefetches issued Prefetcher A number of P6 family processor performance monitoring events are modified for the Pentium M processor.
Table A-17. Performance Monitoring Events Modified on Intel® Pentium® M Processors (Contd.)
PERFORMANCE-MONITORING EVENTS Table A-18. Events That Can Be Counted with the P6 Family PerformanceMonitoring Counters Unit Data Cache Unit (DCU) Event Mnemonic Event Num. Name Unit Mask 43H 00H DATA_MEM_REFS Description Comments All loads from any memory type. All stores to any memory type. Each part of a split is counted separately. The internal logic counts not only memory loads and stores, but also internal retries.
PERFORMANCE-MONITORING EVENTS Table A-18. Events That Can Be Counted with the P6 Family PerformanceMonitoring Counters (Contd.) Unit Event Mnemonic Event Num. Name Unit Mask 48H 00H DCU_MISS_ OUTSTANDING Description Comments Weighted number of cycles while a DCU miss is outstanding, incremented by the number of outstanding cache misses at any particular time. An access that also misses the L2 is short-changed by 2 cycles (i.e., if counts N cycles, should be N+2 cycles).
PERFORMANCE-MONITORING EVENTS Table A-18. Events That Can Be Counted with the P6 Family PerformanceMonitoring Counters (Contd.) Unit 1 L2 Cache Event Mnemonic Event Num. Name Unit Mask Description 28H MESI 0FH Number of L2 instruction fetches. L2_IFETCH Comments This event indicates that a normal instruction fetch was received by the L2. The count includes only L2 cacheable instruction fetches; it does not include UC instruction fetches. It does not include ITLB miss accesses.
PERFORMANCE-MONITORING EVENTS Table A-18. Events That Can Be Counted with the P6 Family PerformanceMonitoring Counters (Contd.) Unit Event Mnemonic Event Num. Name Unit Mask Description it indicates that the DCU sent a read-forownership request to the L2. It also includes Invalid to Modified requests sent by the DCU to the L2. It includes only L2 cacheable memory accesses; it does not include I/O accesses, other nonmemory accesses, or memory accesses such as UC/WT memory accesses.
Table A-18. Events That Can Be Counted with the P6 Family Performance-Monitoring Counters (Contd.) Unit Event Num. Mnemonic Event Name Unit Mask Description Comments
External Bus Logic (EBL)2
62H BUS_DRDY_CLOCKS 00H (Self) 20H (Any) Number of clocks during which DRDY# is asserted. Utilization of the external system data bus during data transfers. Unit Mask = 00H counts bus clocks when the processor is driving DRDY#.
63H BUS_LOCK_CLOCKS
60H BUS_REQ_OUTSTANDING
PERFORMANCE-MONITORING EVENTS Table A-18. Events That Can Be Counted with the P6 Family PerformanceMonitoring Counters (Contd.) Unit Event Mnemonic Event Num. Name Unit Mask 68H 00H (Self) BUS_TRAN_ IFETCH 20H (Any) 69H BUS_TRAN_INVA L 00H (Self) Description Number of completed instruction fetch transactions. Number of completed invalidate transactions. 20H (Any) 6AH BUS_TRAN_PWR 00H (Self) 20H (Any) 6BH BUS_TRANS_P 00H (Self) Number of completed partial write transactions.
PERFORMANCE-MONITORING EVENTS Table A-18. Events That Can Be Counted with the P6 Family PerformanceMonitoring Counters (Contd.) Unit Event Mnemonic Event Num. Name Unit Mask Description 6FH 00H (Self) Number of completed memory transactions. BUS_TRAN_MEM Comments 20H (Any) 64H BUS_DATA_RCV 00H (Self) Number of bus clock cycles during which this processor is receiving data. 61H BUS_BNR_DRV 00H (Self) Number of bus clock cycles during which this processor is driving the BNR# pin.
Table A-18. Events That Can Be Counted with the P6 Family Performance-Monitoring Counters (Contd.) Unit Event Num. Mnemonic Event Name Unit Mask Description Comments • If the PC bit is clear, the processor toggles the BPMi pins when the counter overflows. • If the clock ratio is not 2:1 or 3:1, the BPMi pins will not function for these performance-monitoring counter events.
Table A-18. Events That Can Be Counted with the P6 Family Performance-Monitoring Counters (Contd.) Unit Event Num. Mnemonic Event Name Unit Mask Description Comments • If the PC bit is clear, the processor toggles the BPMi pins when the counter overflows. • If the clock ratio is not 2:1 or 3:1, the BPMi pins will not function for these performance-monitoring counter events.
PERFORMANCE-MONITORING EVENTS Table A-18. Events That Can Be Counted with the P6 Family PerformanceMonitoring Counters (Contd.) Unit Event Mnemonic Event Num. Name Unit Mask 10H 00H FP_COMP_OPS_ EXE Description Comments Number of computational floating-point operations executed. Counter 0 only. The number of FADD, FSUB, FCOM, FMULs, integer MULs and IMULs, FDIVs, FPREMs, FSQRTS, integer DIVs, and IDIVs. This number does not include the number of cycles, but the number of operations.
PERFORMANCE-MONITORING EVENTS Table A-18. Events That Can Be Counted with the P6 Family PerformanceMonitoring Counters (Contd.) Unit Event Mnemonic Event Num. Name Unit Mask 14H 00H CYCLES_DIV_ BUSY Description Comments Number of cycles during which the divider is busy, and cannot accept new divides. Counter 0 only. This includes integer and FP divides, FPREM, FPSQRT, etc. and is speculative. Memory Ordering 03H LD_BLOCKS 00H Number of load operations delayed due to store buffer blocks.
PERFORMANCE-MONITORING EVENTS Table A-18. Events That Can Be Counted with the P6 Family PerformanceMonitoring Counters (Contd.) Unit Event Mnemonic Event Num. Name Unit Mask 05H 00H MISALIGN_ MEM_REF Description Comments Number of misaligned data memory references. MISALIGN_MEM_ REF is only an approximation to the true number of misaligned memory references. Incremented by 1 every cycle, during which either the processor’s load or store pipeline dispatches a misaligned μop.
PERFORMANCE-MONITORING EVENTS Table A-18. Events That Can Be Counted with the P6 Family PerformanceMonitoring Counters (Contd.) Unit Instruction Decoding and Retirement Event Mnemonic Event Num. Name Unit Mask C0H 00H INST_RETIRED Description Comments Number of instructions retired. A hardware interrupt received during/after the last iteration of the REP STOS flow causes the counter to undercount by 1 instruction.
PERFORMANCE-MONITORING EVENTS Table A-18. Events That Can Be Counted with the P6 Family PerformanceMonitoring Counters (Contd.) Unit Event Mnemonic Event Num. Name Unit Mask Interrupts C8H HW_INT_RX 00H Number of hardware interrupts received. C6H CYCLES_INT_ MASKED 00H Number of processor cycles for which interrupts are disabled. C7H CYCLES_INT_ PENDING_ AND_MASKED 00H Number of processor cycles for which interrupts are disabled and interrupts are pending.
PERFORMANCE-MONITORING EVENTS Table A-18. Events That Can Be Counted with the P6 Family PerformanceMonitoring Counters (Contd.) Unit Event Mnemonic Event Num. Name Unit Mask Stalls A2H 00H RESOURCE_ STALLS Description Comments Incremented by 1 during every cycle for which there is a resource related stall. Includes register renaming buffer entries, memory buffer entries. Does not include stalls due to bus queue full, too many cache misses, etc.
PERFORMANCE-MONITORING EVENTS Table A-18. Events That Can Be Counted with the P6 Family PerformanceMonitoring Counters (Contd.) Unit Event Mnemonic Event Num. Name Unit Mask MMX Unit B0H 00H MMX_INSTR_ EXEC Description Comments Number of MMX Instructions Executed. Available in Intel Celeron, Pentium II and Pentium II Xeon processors only. Does not account for MOVQ and MOVD stores from register to memory. B1H MMX_SAT_ INSTR_EXEC 00H Number of MMX Saturating Instructions Executed.
Table A-18. Events That Can Be Counted with the P6 Family Performance-Monitoring Counters (Contd.) Unit Event Num. Mnemonic Event Name Unit Mask Description Comments Segment Register Renaming D4H SEG_RENAME_STALLS Number of Segment Register Renaming Stalls: Available in Pentium II and Pentium III processors only.
A.10 PENTIUM PROCESSOR PERFORMANCE-MONITORING EVENTS
Table A-19 lists the events that can be counted with the performance-monitoring counters for the Pentium processor. The Event Number column gives the hexadecimal code that identifies the event and that is entered in the ES0 or ES1 (event select) fields of the CESR MSR. The Mnemonic Event Name column gives the name of the event, and the Description and Comments columns give detailed descriptions of the events.
PERFORMANCE-MONITORING EVENTS Table A-19. Events That Can Be Counted with Pentium Processor Performance-Monitoring Counters (Contd.) Event Num. Mnemonic Event Name Description Comments 0BH MISALIGNED DATA MEMORY OR I/O REFERENCES Number of memory or I/O reads or writes that are misaligned A 2- or 4-byte access is misaligned when it crosses a 4-byte boundary; an 8-byte access is misaligned when it crosses an 8-byte boundary.
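The misalignment rule in the comment above can be expressed as a boundary-crossing check. This is a minimal sketch under the stated rule only: the function name is hypothetical, and access sizes other than 2, 4, and 8 bytes are rejected because the table entry does not cover them.

```python
def is_misaligned(address: int, size: int) -> bool:
    """Apply the rule: 2- and 4-byte accesses are misaligned when they cross a
    4-byte boundary; 8-byte accesses are misaligned when they cross an 8-byte
    boundary."""
    if size in (2, 4):
        boundary = 4
    elif size == 8:
        boundary = 8
    else:
        raise ValueError("rule is only stated for 2-, 4-, and 8-byte accesses")
    first_block = address // boundary
    last_block = (address + size - 1) // boundary
    # The access crosses a boundary when its first and last bytes fall in
    # different boundary-sized blocks.
    return first_block != last_block
```

Under this rule a 2-byte access at an odd address is not necessarily counted: it is misaligned only when its two bytes straddle a 4-byte boundary.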
PERFORMANCE-MONITORING EVENTS Table A-19. Events That Can Be Counted with Pentium Processor Performance-Monitoring Counters (Contd.) Event Num. 16H Mnemonic Event Name INSTRUCTIONS_ EXECUTED Description Comments Number of instructions executed (up to two per clock) Invocations of a fault handler are considered instructions. All hardware and software interrupts and exceptions will also cause the count to be incremented.
Table A-19. Events That Can Be Counted with Pentium Processor Performance-Monitoring Counters (Contd.) Event Num. Mnemonic Event Name Description Comments 22H FLOPS Number of floating-point operations that occur Number of floating-point adds, subtracts, multiplies, divides, remainders, and square roots are counted. The transcendental instructions consist of multiple adds and multiplies and will signal this event multiple times.
PERFORMANCE-MONITORING EVENTS Table A-19. Events That Can Be Counted with Pentium Processor Performance-Monitoring Counters (Contd.) Event Num. Mnemonic Event Name Description Comments 26H BREAKPOINT MATCH ON DR3 REGISTER Number of matches on register DR3 breakpoint See comment for 23H event.
PERFORMANCE-MONITORING EVENTS Table A-19. Events That Can Be Counted with Pentium Processor Performance-Monitoring Counters (Contd.) Event Num. Mnemonic Event Name Description Comments 2FH SATURATING_ MMX_ INSTRUCTIONS_ EXECUTED (Counter 0) Number of saturating MMX instructions executed, independently of whether they actually saturated.
PERFORMANCE-MONITORING EVENTS Table A-19. Events That Can Be Counted with Pentium Processor Performance-Monitoring Counters (Contd.) Event Num. Mnemonic Event Name Description Comments 32H TAKEN_BRANCHES (Counter 1) Number of taken branches 33H D1_STARVATION_ AND_FIFO_IS_ EMPTY (Counter 0) Number of times D1 stage cannot issue ANY instructions since the FIFO buffer is empty The D1 stage can issue 0, 1, or 2 instructions per clock if those are available in an instructions FIFO buffer.
Table A-19. Events That Can Be Counted with Pentium Processor Performance-Monitoring Counters (Contd.) Event Num. Mnemonic Event Name Description Comments 35H PIPELINE_FLUSHES_DUE_TO_WRONG_BRANCH_PREDICTIONS (Counter 0) Number of pipeline flushes due to wrong branch predictions resolved in either the E-stage or the WB-stage The count includes any pipeline flush due to a branch that the pipeline did not follow correctly.
PERFORMANCE-MONITORING EVENTS Table A-19. Events That Can Be Counted with Pentium Processor Performance-Monitoring Counters (Contd.) Event Num. Mnemonic Event Name Description Comments 37H MISPREDICTED_ OR_ UNPREDICTED_ RETURNS (Counter 1) Number of returns predicted incorrectly or not predicted at all The count is the difference between the total number of executed returns and the number of returns that were correctly predicted.
APPENDIX B MODEL-SPECIFIC REGISTERS (MSRS) This appendix lists the MSRs provided in the Intel Core 2 processor family, Intel Atom, Intel Core Duo, Intel Core Solo, Pentium 4 and Intel Xeon processors, P6 family processors, and Pentium processors in Tables B-6, B-11, and B-12, respectively. All MSRs listed can be read with the RDMSR instruction and written with the WRMSR instruction. Register addresses are given in both hexadecimal and decimal.
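The data convention of RDMSR and WRMSR can be illustrated with a small sketch. RDMSR returns the 64-bit MSR contents in EDX (high 32 bits) and EAX (low 32 bits), and WRMSR takes its value the same way. The helper names below are hypothetical and the code only models the register packing; it does not execute any privileged instruction.

```python
from typing import Tuple

MASK32 = 0xFFFFFFFF


def combine_edx_eax(edx: int, eax: int) -> int:
    """Join the two 32-bit halves returned by RDMSR into one 64-bit value."""
    return ((edx & MASK32) << 32) | (eax & MASK32)


def split_for_wrmsr(value: int) -> Tuple[int, int]:
    """Split a 64-bit value into the (EDX, EAX) pair that WRMSR expects."""
    return (value >> 32) & MASK32, value & MASK32


def format_msr_address(address: int) -> str:
    """Render an MSR address in the two notations used by the tables."""
    return f"{address:X}H ({address})"
```

For instance, `format_msr_address(0x1A0)` renders the address of IA32_MISC_ENABLE in both the hexadecimal and decimal forms that the tables list.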
MODEL-SPECIFIC REGISTERS (MSRS) Table B-1. CPUID Signature (Contd.)Values of DisplayFamily_DisplayModel (Contd.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR 0H 0 IA32_P5_MC_ADDR (P5_MC_ADDR) See Appendix B.9, “MSRs in Pentium Processors.” Pentium Processor (05_01H) 1H 1 IA32_P5_MC_TYPE (P5_MC_TYPE) See Appendix B.9, “MSRs in Pentium Processors.” DF_DM = 05_01H 6H 6 IA32_MONITOR_FILTER_SIZE See Section 7.11.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR 11 APIC Global Enable (R/W) (MAXPHYWID - 1):12 APIC Base (R/W) 63:MAXPHYWID Reserved 3AH 58 IA32_FEATURE_CONTROL Control Features in Intel 64 Processor. (R/W) 0 Lock bit (R/WO): (1 = locked). When set, locks this MSR If CPUID.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR 2 Enable VMX outside SMX operation (R/WL): This bit enables VMX for a system executive that does not require SMX. If CPUID.01H:ECX[bit 5 or bit 6] = 1 BIOS must set this bit only when the CPUID function 1 returns VMX feature flag set (ECX bit 5).
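A minimal sketch of how software might interpret the two IA32_FEATURE_CONTROL bits described above (bit 0, the lock bit, and bit 2, enable VMX outside SMX operation). The function names are hypothetical. The rule that VMX is usable only when the MSR is locked with the enable bit set is stated here as an assumption drawn from the VMXON behavior, not from this table.

```python
IA32_FEATURE_CONTROL = 0x3A        # MSR address (58 decimal)
LOCK_BIT = 1 << 0                  # bit 0: 1 = locked
VMX_OUTSIDE_SMX_BIT = 1 << 2       # bit 2: enable VMX outside SMX operation


def vmx_outside_smx_usable(msr_value: int) -> bool:
    """Assumed rule: the MSR must be locked and bit 2 must be set."""
    locked = bool(msr_value & LOCK_BIT)
    enabled = bool(msr_value & VMX_OUTSIDE_SMX_BIT)
    return locked and enabled


def still_writable(msr_value: int) -> bool:
    """While the lock bit is clear, the MSR can still be configured."""
    return not (msr_value & LOCK_BIT)
```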
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR 8BH 139 IA32_BIOS_SIGN_ID (BIOS_SIGN/BBL_CR_D3) BIOS Update Signature (RO) 06_01H Returns the microcode update signature following the execution of CPUID.01H. A processor may prevent writing to this MSR when loading guest states on VM entries or saving guest states on VM exits.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex E7H Decimal 231 Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description IA32_MPERF Maximum Qualified Performance Clock Counter (R/Write to clear) 63:0 C0_MCNT: C0 Maximum Frequency Clock Count. Introduced as Architectural MSR If CPUID.06H: ECX[0] = 1 Increments at fixed interval (relative to TSC freq.) when the logical processor is in C0.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description 7:0 Event Select: Selects a performance event logic unit. 15:8 UMask: Qualifies the microarchitectural condition to detect on the selected event logic. 16 USR: Counts while the privilege level is not ring 0. 17 OS: Counts while the privilege level is ring 0.
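The four fields listed above can be packed into an event-select value as in the following sketch. Only the fields shown on this page (Event Select, UMask, USR, OS) are modeled; the function name and the sample event numbers are hypothetical, and any further control bits of the register are left at zero.

```python
def encode_perfevtsel(event_select: int, umask: int,
                      usr: bool = False, os_mode: bool = False) -> int:
    """Pack Event Select (7:0), UMask (15:8), USR (16) and OS (17)."""
    if not 0 <= event_select <= 0xFF:
        raise ValueError("event_select must fit in bits 7:0")
    if not 0 <= umask <= 0xFF:
        raise ValueError("umask must fit in bits 15:8")
    value = event_select | (umask << 8)
    if usr:
        value |= 1 << 16   # count while the privilege level is not ring 0
    if os_mode:
        value |= 1 << 17   # count while the privilege level is ring 0
    return value


def decode_perfevtsel(value: int) -> dict:
    """Unpack the same four fields from a register value."""
    return {
        "event_select": value & 0xFF,
        "umask": (value >> 8) & 0xFF,
        "usr": bool(value & (1 << 16)),
        "os": bool(value & (1 << 17)),
    }
```

Setting both USR and OS counts the event at all privilege levels; setting only one restricts counting to user or kernel code respectively.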
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR 06_0EH 188H-197H 392-407 Reserved 198H 408 IA32_PERF_STATUS (RO) 15:0 Current performance State Value 63:16 Reserved 199H 409 IA32_PERF_CTL (R/W) 15:0 Target performance State Value 31:16 Reserved 32 IDA Engage.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex 19BH Decimal 411 Architectural MSR Name and bit fields (Former MSR Name) IA32_THERM_INTERRUPT MSR/Bit Description Thermal Interrupt Control (R/W) Introduced as Architectural MSR 0F_0H Enables and disables the generation of an interrupt on temperature transitions detected with the processor’s thermal sensors and thermal monitor. See Section 13.5.2, “Thermal Monitor.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR 0 Thermal Status (RO) 1 Thermal Status Log (R/W) 2 PROCHOT# or FORCEPR# event (RO) 3 PROCHOT# or FORCEPR# log (R/WC0) 4 Critical Temperature Status (RO) 5 Critical Temperature Status log (R/WC0) 6 Thermal Threshold #1 Status (RO) If CPUID.
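The status and log bits listed above lend themselves to a table-driven decoder. The sketch below covers only bits 0 through 6 as they appear on this page; the dictionary keys and the function name are hypothetical labels chosen for illustration.

```python
THERM_STATUS_BITS = {
    0: "thermal_status",
    1: "thermal_status_log",
    2: "prochot_or_forcepr_event",
    3: "prochot_or_forcepr_log",
    4: "critical_temperature_status",
    5: "critical_temperature_status_log",
    6: "thermal_threshold1_status",
}


def decode_therm_status(value: int) -> dict:
    """Map each documented bit position to a named boolean flag."""
    return {name: bool((value >> bit) & 1)
            for bit, name in THERM_STATUS_BITS.items()}
```

A status bit reflects the current condition, while the matching log bit records that the condition occurred at some point since the log was last cleared.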
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR 1A0H 416 IA32_MISC_ENABLE Enable Misc. Processor Features. (R/W) Allows a variety of processor functions to be enabled and disabled. 0 Fast-Strings Enable. 0F_0H When set, the fast-strings feature (for REP MOVS and REP STOS) is enabled (default); when clear, fast-strings are disabled.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description 11 Branch Trace Storage Unavailable. (RO) 1= 0= 12 Reserved 16 Enhanced Intel SpeedStep Technology Enable. (R/W) 06_0DH 0 = Enhanced Intel SpeedStep Technology disabled 1 = Enhanced Intel SpeedStep Technology enabled 17 Reserved 18 ENABLE MONITOR FSM.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR If the SSE3 feature flag ECX[0] is not set (CPUID.01H:ECX[bit 0] = 0), the OS must not attempt to alter this bit. BIOS must leave it in the default state. Writing this bit when the SSE3 feature flag is set to 0 may generate a #GP exception. 21:19 22 Limit CPUID Maxval.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) 23 MSR/Bit Description xTPR Message Disable. (R/W) When set to 1, xTPR messages are disabled. xTPR messages are optional messages that allow the processor to inform the chipset of its priority. 33:24 Reserved 34 XD Bit Disable.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR 23:16 Temperature Target. The minimum temperature at which PROCHOT# will be asserted. The value is in degrees C. 63:24 Reserved. 1D9H 473
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR 10 BTS_OFF_USR: When set, BTS or BTM is skipped if CPL > 0. 06_0FH 11 FREEZE_LBRS_ON_PMI: When set, the LBR stack is frozen on a PMI request. If CPUID.01H: ECX[15] = 1 12 FREEZE_PERFMON_ON_PMI: If CPUID.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR 2:0 Default Memory Type 9:3 Reserved 10 Fixed Range MTRR Enable 11 MTRR Enable 63:12 Reserved 309H 777 IA32_FIXED_CTR0 (MSR_PERF_FIXED_CTR0) Fixed-Function Performance Counter 0 (R/W): Counts Instr_Retired.Any If CPUID.
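The default-memory-type fields at the start of this page (bits 2:0, bit 10 and bit 11) can be extracted as in the sketch below. The field widths are taken exactly as listed here; the function name and dictionary keys are hypothetical.

```python
def decode_mtrr_def_type(value: int) -> dict:
    """Extract the fields listed for the default memory type register."""
    return {
        "default_memory_type": value & 0x7,            # bits 2:0
        "fixed_range_enable": bool(value & (1 << 10)),  # bit 10
        "mtrr_enable": bool(value & (1 << 11)),         # bit 11
    }
```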
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex 38EH Decimal 910
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex 38FH Decimal 911 Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR 1 Ovf_PMC1: Overflow status of IA32_PMC1 If CPUID.0AH: EAX[7:0] > 0 31:2 Reserved 32 Ovf_FixedCtr0: Overflow status of IA32_FIXED_CTR0 If CPUID.0AH: EAX[7:0] > 1 33 Ovf_FixedCtr1: Overflow status of IA32_FIXED_CTR1 If CPUID.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex 390H 3F1H Decimal 912 1009 Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR IA32_PERF_GLOBAL_OVF_CTRL (MSR_PERF_GLOBAL_OVF_CTRL) Global Performance Counter Overflow Control (R/W) If CPUID.0AH: EAX[7:0] > 0 0 Set 1 to Clear Ovf_PMC0 bit If CPUID.0AH: EAX[7:0] > 0 1 Set 1 to Clear Ovf_PMC1 bit If CPUID.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR 480H 1152 IA32_VMX_BASIC Reporting Register of Basic VMX Capabilities. (R/O) If CPUID.01H:ECX.[bit 5] = 1 See Appendix G.1, “Basic VMX Information” 481H 1153 IA32_VMX_PINBASED_CTLS Capability Reporting Register of Pin-based VM-execution Controls. (R/O) If CPUID.01H:ECX.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR 486H 1158 IA32_VMX_CR0_FIXED0 Capability Reporting Register of CR0 Bits Fixed to 0. (R/O) If CPUID.01H:ECX.[bit 5] = 1 See Appendix G.7, “VMX-Fixed Bits in CR0” 487H 1159 IA32_VMX_CR0_FIXED1 Capability Reporting Register of CR0 Bits Fixed to 1. (R/O) If CPUID.01H:ECX.
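A sketch of how the two CR0 capability registers might be applied to a candidate CR0 value. It assumes the interpretation given in the referenced Appendix G.7: a bit that reads 1 in the FIXED0 register must be 1 in CR0, and a bit that reads 0 in the FIXED1 register must be 0 in CR0. That interpretation is not spelled out in this table, and the function name and sample values are hypothetical.

```python
def cr0_allowed_in_vmx(cr0: int, fixed0: int, fixed1: int) -> bool:
    """Check a CR0 value against the two capability masks (assumed semantics)."""
    required_ones_present = (cr0 & fixed0) == fixed0   # bits forced to 1
    forbidden_ones_absent = (cr0 & ~fixed1) == 0       # bits forced to 0
    return required_ones_present and forbidden_ones_absent


# Sample mask values chosen for illustration only.
SAMPLE_FIXED0 = 0x80000021
SAMPLE_FIXED1 = 0xFFFFFFFF
```

The same check would apply to the CR4 pair of capability registers under the same assumption.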
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description 48CH 1164 IA32_VMX_EPT_VPID_CAP Capability Reporting Register of EPT and VPID. (R/O) See Appendix G.10, “VPID and EPT Capabilities” 48DH 1165 IA32_VMX_TRUE_PINBASED_CTLS Capability Reporting Register of Pin-based VM-execution Flex Controls. (R/O) See Appendix G.3.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex 600H Decimal 1536 Architectural MSR Name and bit fields (Former MSR Name) IA32_DS_AREA MSR/Bit Description DS Save Area. (R/W) Introduced as Architectural MSR 0F_0H Points to the linear address of the first byte of the DS buffer management area, which is used to manage the BTS and PEBS buffers. See Section 18.18.4, “Debug Store (DS) Mechanism.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR 80FH 2063 IA32_EXT_XAPIC_SIVR x2APIC Spurious Interrupt Vector Register. (R/W) If ( CPUID.01H:ECX.[ bit 21] = 1 ) 810H 2064 IA32_EXT_XAPIC_ISR0 x2APIC In-Service Register Bits 31:0. (R/O) If ( CPUID.01H:ECX.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR 81BH 2075 IA32_EXT_XAPIC_TMR3 x2APIC Trigger Mode Register Bits 127:96. (R/O) If ( CPUID.01H:ECX.[ bit 21] = 1 ) 81CH 2076 IA32_EXT_XAPIC_TMR4 x2APIC Trigger Mode Register Bits 159:128 (R/O) If ( CPUID.01H:ECX.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR 827H 2087 IA32_EXT_XAPIC_IRR7 x2APIC Interrupt Request Register Bits 255:224. (R/O) If ( CPUID.01H:ECX.[ bit 21] = 1 ) 828H 2088 IA32_EXT_XAPIC_ESR x2APIC Error Status Register. (R/W) If ( CPUID.01H:ECX.
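The x2APIC MSR addresses in these rows follow a regular pattern, sketched below. The mapping assumes the usual relation between a register's offset in the memory-mapped xAPIC register page and its x2APIC MSR address (800H plus the offset divided by 16). The xAPIC offsets used in the examples come from the xAPIC register map rather than from this table, and the function name is hypothetical.

```python
X2APIC_MSR_BASE = 0x800


def x2apic_msr_address(xapic_offset: int) -> int:
    """Translate a 16-byte-aligned xAPIC register offset to an MSR address."""
    if xapic_offset % 16 != 0 or not 0 <= xapic_offset <= 0x3F0:
        raise ValueError("offset must be 16-byte aligned within the APIC page")
    return X2APIC_MSR_BASE + (xapic_offset >> 4)
```

Under that assumption, offset 0F0H maps to 80FH (IA32_EXT_XAPIC_SIVR) and offset 280H maps to 828H (IA32_EXT_XAPIC_ESR), matching the addresses listed in the table.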
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR 83EH 2110 IA32_EXT_XAPIC_DIV_CONF x2APIC Divide Configuration Register. (R/W) If ( CPUID.01H:ECX.[ bit 21] = 1 ) 83FH 2111 IA32_EXT_XAPIC_SELF_IPI x2APIC Self IPI Register. (W/O) If ( CPUID.01H:ECX.[ bit 21] = 1 ) IA32_EFER Extended Feature Enables. If ( CPUID.80000001.EDX.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-2. IA-32 Architectural MSRs (Contd.) Register Address Hex Decimal Architectural MSR Name and bit fields (Former MSR Name) MSR/Bit Description Introduced as Architectural MSR C000_ 0100H IA32_FS_BASE Map of BASE Address of FS. (R/W) If CPUID.80000001 .EDX.[bit 29] = 1 C000_ 0101H IA32_GS_BASE Map of BASE Address of GS. (R/W) If CPUID.80000001 .EDX.[bit 29] = 1 C000_ 0102H IA32_KERNEL_GS_BASE Swap Target of BASE Address of GS. (R/W) If CPUID.80000001 .
MODEL-SPECIFIC REGISTERS (MSRS) field in an MSR governs only a core independently. “Shared” means the MSR or the bit field in an MSR address governs the operation of both processor cores.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-3. MSRs in Processors Based on Intel Core Microarchitecture Register Address Hex Dec Register Name Shared/Unique Bit Description 0H 0 IA32_P5_MC_ADDR Unique See Appendix B.9, “MSRs in Pentium Processors.” 1H 1 IA32_P5_MC_TYPE Unique See Appendix B.9, “MSRs in Pentium Processors.” 6H 6 IA32_MONITOR_FILTER_SIZE Unique See Section 7.11.5, “Monitor/Mwait Address Range Determination.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-3. MSRs in Processors Based on Intel Core Microarchitecture (Contd.) Register Address Hex Dec Register Name Shared/Unique Bit Description 3 MCERR# Drive Enable. (R/W) 1 = Enabled; 0 = Disabled Note: Not all processors implement R/W. 4 Address Parity Enable. (R/W) 1 = Enabled; 0 = Disabled Note: Not all processors implement R/W. 5 Reserved 6 Reserved 7 BINIT# Driver Enable. (R/W) 1 = Enabled; 0 = Disabled Note: Not all processors implement R/W.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-3. MSRs in Processors Based on Intel Core Microarchitecture (Contd.) Register Address Hex Dec Register Name Shared/Unique Bit Description 3AH 58 IA32_FEATURE_CONTROL Unique Control Features in Intel 64 Processor. (R/W) see Table B-2 40H 64 MSR_LASTBRANCH_0_FROM_IP Unique Last Branch Record 0 From IP. (R/W) One of four pairs of last branch record registers on the last branch record stack.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-3. MSRs in Processors Based on Intel Core Microarchitecture (Contd.) Register Address Hex Dec Register Name Shared/Unique Bit Description 63H 99 MSR_LASTBRANCH_3_TO_LIP Unique Last Branch Record 3 To IP. (R/W) See description of MSR_LASTBRANCH_0_TO_LIP. 79H 121 IA32_BIOS_UPDT_TRIG Unique BIOS Update Trigger Register. (R/W) see Table B-2 8BH 139 IA32_BIOS_SIGN_ID Unique BIOS Update Signature ID.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-3. MSRs in Processors Based on Intel Core Microarchitecture (Contd.) Register Address Hex Dec Register Name Shared/Unique Bit Description 2:0
• 101B: 100 MHz (FSB 400)
• 001B: 133 MHz (FSB 533)
• 011B: 167 MHz (FSB 667)
• 010B: 200 MHz (FSB 800)
• 000B: 267 MHz (FSB 1067)
• 100B: 333 MHz (FSB 1333)
• 110B: 400 MHz (FSB 1600)
133.33 MHz should be utilized if performing calculation with System Bus Speed when encoding is 001B. 166.
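The bus-frequency encodings in bits 2:0 can be decoded with a lookup table, as in this sketch. The table holds the nominal values from the list above; the function name and the tuple layout are hypothetical, and encodings that the list does not mention are rejected rather than guessed.

```python
# Encoding in bits 2:0 -> (nominal bus clock in MHz, FSB rating)
FSB_ENCODINGS = {
    0b101: (100, 400),
    0b001: (133, 533),
    0b011: (167, 667),
    0b010: (200, 800),
    0b000: (267, 1067),
    0b100: (333, 1333),
    0b110: (400, 1600),
}


def decode_fsb(msr_value: int):
    """Return (MHz, FSB rating) for the encoding held in bits 2:0."""
    code = msr_value & 0x7
    if code not in FSB_ENCODINGS:
        raise ValueError(f"undocumented bus frequency encoding {code:03b}B")
    return FSB_ENCODINGS[code]
```

As the note above states for encoding 001B, calculations that need precision should use the exact frequency (133.33 MHz) rather than the rounded nominal value.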
MODEL-SPECIFIC REGISTERS (MSRS) Table B-3. MSRs in Processors Based on Intel Core Microarchitecture (Contd.) Register Address Hex Register Name Shared/ Unique Bit Description Dec 8 L2 Enabled. (R/W) 1 = L2 cache has been initialized 0 = Disabled (default) Until this bit is set the processor will not respond to the WBINVD instruction or the assertion of the FLUSH# input. 22:9 Reserved. 23 L2 Not Present. (RO) 0 = L2 Present 1 = L2 Not Present 63:24 Reserved.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-3. MSRs in Processors Based on Intel Core Microarchitecture (Contd.) Register Address Hex Register Name Shared/ Unique Bit Description Dec 2 MCIP. When set, bit indicates that a machine check has been generated. If a second machine check is detected while this bit is still set, the processor enters a shutdown state. Software should write this bit to 0 after processing a machine check exception. 63:3 Reserved.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-3. MSRs in Processors Based on Intel Core Microarchitecture (Contd.) Register Address Hex Dec Register Name Shared/Unique Bit Description 19AH 410 IA32_CLOCK_MODULATION Unique Clock Modulation. (R/W) see Table B-2 IA32_CLOCK_MODULATION MSR was originally named IA32_THERM_CONTROL MSR. 19BH 411 IA32_THERM_INTERRUPT Unique Thermal Interrupt Control. 19CH 412 IA32_THERM_STATUS Unique 19DH 413 MSR_THERM2_CTL Unique
MODEL-SPECIFIC REGISTERS (MSRS) Table B-3. MSRs in Processors Based on Intel Core Microarchitecture (Contd.) Register Address Hex Register Name Shared/ Unique Bit Description Dec 9 Hardware Prefetcher Disable. (R/W) When set, disables the hardware prefetcher operation on streams of data. When clear (default), enables the prefetch queue. Disabling of the hardware prefetcher may impact processor performance. 10 Shared FERR# Multiplexing Enable.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-3. MSRs in Processors Based on Intel Core Microarchitecture (Contd.) Register Address Hex Register Name Shared/ Unique Bit Description Dec 15:14 Reserved. 16 Shared Enhanced Intel SpeedStep Technology Enable. (R/W) see Table B-2 18 Shared ENABLE MONITOR FSM. (R/W) see Table B-2 19 Shared Adjacent Cache Line Prefetch Disable. (R/W) When set to 1, the processor fetches the cache line that contains data currently required by the processor.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-3. MSRs in Processors Based on Intel Core Microarchitecture (Contd.) Register Address Hex Dec Register Name Shared/Unique Bit Description 37 Unique DCU Prefetcher Disable. (R/W) When set to 1, the DCU L1 data cache prefetcher is disabled. The default value after reset is 0. BIOS may write ‘1’ to disable this feature. The DCU prefetcher is an L1 data cache prefetcher.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-3. MSRs in Processors Based on Intel Core Microarchitecture (Contd.) Register Address Hex Dec Register Name Shared/Unique Bit Description 1C9H 457 MSR_LASTBRANCH_TOS Unique Last Branch Record Stack TOS. (R) Contains an index (bits 0-3) that points to the MSR containing the most recent branch record. See MSR_LASTBRANCH_0 (at 40H). 1D9H 473 IA32_DEBUGCTL Unique Debug Control.
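The top-of-stack index described above can be turned into the addresses of the most recent branch record pair, as in the following sketch. It assumes the From IP registers start at 40H and the To IP registers at 60H, consistent with the rows listed in this table, and that the stack depth (four pairs here) is supplied by the caller. The function name and parameters are hypothetical.

```python
def lbr_msrs_for_tos(tos_value: int, depth: int = 4,
                     from_base: int = 0x40, to_base: int = 0x60):
    """Return (From IP MSR, To IP MSR) addresses for the newest record."""
    index = tos_value & 0xF            # index held in bits 0-3
    if index >= depth:
        raise ValueError("TOS index exceeds the assumed stack depth")
    return from_base + index, to_base + index
```

A processor with a deeper stack at the same base addresses can reuse the helper by passing a larger `depth`.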
MODEL-SPECIFIC REGISTERS (MSRS) Table B-3. MSRs in Processors Based on Intel Core Microarchitecture (Contd.) Register Address Hex Dec Register Name Shared/Unique Bit Description 26FH 623 IA32_MTRR_FIX4K_F8000 Unique see Table B-2 277H 631 IA32_CR_PAT Unique see Table B-2 2FFH 767 IA32_MTRR_DEF_TYPE Unique Default Memory Types. (R/W) see Table B-2 309H 777 IA32_FIXED_CTR0 Unique Fixed-Function Performance Counter Register 0.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-3. MSRs in Processors Based on Intel Core Microarchitecture (Contd.) Register Address Hex Dec Register Name Shared/Unique Bit Description 38FH 911 IA32_PERF_GLOBAL_CTRL Unique see Table B-2. See Section 18.15.2, “Global Counter Control Facilities.” 38FH 911 MSR_PERF_GLOBAL_CTRL Unique See Section 18.15.2, “Global Counter Control Facilities.” 390H 912 IA32_PERF_GLOBAL_OVF_CTRL Unique see Table B-2. See Section 18.15.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-3. MSRs in Processors Based on Intel Core Microarchitecture (Contd.) Register Address Hex Dec Register Name Shared/Unique Bit Description 408H 1032 IA32_MC2_CTL Unique See Section 14.3.2.1, “IA32_MCi_CTL MSRs.” 409H 1033 IA32_MC2_STATUS Unique See Section 14.3.2.2, “IA32_MCi_STATUS MSRS.” 40AH 1034 IA32_MC2_ADDR Unique See Section 14.3.2.3, “IA32_MCi_ADDR MSRs.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-3. MSRs in Processors Based on Intel Core Microarchitecture (Contd.) Register Address Hex Dec Register Name Shared/Unique Bit Description 415H 1045 MSR_MC5_STATUS Unique 416H 1046 MSR_MC5_ADDR Unique 417H 1047 MSR_MC5_MISC Unique 419H 1049 MSR_MC6_STATUS Unique Apply to Intel Xeon processor 7400 series (processor signature 06_1D) only. See Section 14.3.2.2, “IA32_MCi_STATUS MSRS.” and Appendix E.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-3. MSRs in Processors Based on Intel Core Microarchitecture (Contd.) Register Address Hex Dec Register Name Shared/Unique Bit Description 488H 1160 IA32_VMX_CR4_FIXED0 Unique Capability Reporting Register of CR4 Bits Fixed to 0. (R/O) see Table B-2. See Appendix G.8, “VMX-Fixed Bits in CR4” 489H 1161 IA32_VMX_CR4_FIXED1 Unique Capability Reporting Register of CR4 Bits Fixed to 1. (R/O) see Table B-2. See Appendix G.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-3. MSRs in Processors Based on Intel Core Microarchitecture (Contd.) Register Address Hex Dec Register Name Shared/Unique Bit Description 107D0H MSR_EMON_L3_CTR_CTL4 Unique 107D1H MSR_EMON_L3_CTR_CTL5 Unique 107D2H MSR_EMON_L3_CTR_CTL6 Unique 107D3H MSR_EMON_L3_CTR_CTL7 Unique 107D8H MSR_EMON_L3_GL_CTL Unique C000_0080H IA32_EFER Unique Extended Feature Enables.
MODEL-SPECIFIC REGISTERS (MSRS) B.3 MSRS IN THE INTEL® ATOM™ PROCESSOR FAMILY Table B-4 lists model-specific registers (MSRs) for the Intel Atom processor family; architectural MSR addresses are also included in Table B-4. These processors have a CPUID signature with DisplayFamily_DisplayModel of 06_1CH; see Table B-1. The column “Shared/Unique” applies to logical processors sharing the same core in processors based on the Intel Atom microarchitecture.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-4. MSRs in Intel Atom Processor Family Register Address Hex Register Name Shared/ Unique Bit Description Dec 1 Data Error Checking Enable. (R/W) 1 = Enabled; 0 = Disabled Always 0. 2 Response Error Checking Enable. (R/W) 1 = Enabled; 0 = Disabled Always 0. 3 AERR# Drive Enable. (R/W) 1 = Enabled; 0 = Disabled Always 0. 4 BERR# Enable for initiator bus requests. (R/W) 1 = Enabled; 0 = Disabled Always 0. 5 Reserved 6 Reserved 7 BINIT# Driver Enable.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-4. MSRs in Intel Atom Processor Family Register Address Hex Dec Register Name Shared/Unique Bit Description 17:16 APIC Cluster ID. (R/O) Always 00B. 19:18 Reserved. 21:20 Symmetric Arbitration ID. (R/O) Always 00B. 26:22 Integer Bus Frequency Ratio. (R/O) 3AH 58 IA32_FEATURE_CONTROL Unique Control Features in Intel 64 Processor. (R/W) see Table B-2 40H 64 MSR_LASTBRANCH_0_FROM_IP Unique Last Branch Record 0 From IP.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-4. MSRs in Intel Atom Processor Family Register Address Hex Dec Register Name Shared/Unique Bit Description 46H 70 MSR_LASTBRANCH_6_FROM_IP Unique Last Branch Record 6 From IP. (R/W) 47H 71 MSR_LASTBRANCH_7_FROM_IP Unique Last Branch Record 7 From IP. (R/W) 60H 96 MSR_LASTBRANCH_0_TO_LIP Unique Last Branch Record 0 To IP. 61H 97 62H 98 63H 99 64H 100 65H 101 66H 102 67H 103
MODEL-SPECIFIC REGISTERS (MSRS) Table B-4. MSRs in Intel Atom Processor Family Register Address Hex Dec Register Name Shared/Unique Bit Description 79H 121 IA32_BIOS_UPDT_TRIG Unique BIOS Update Trigger Register. (R/W) see Table B-2 8BH 139 IA32_BIOS_SIGN_ID Unique BIOS Update Signature ID. (RO) see Table B-2 C1H 193 IA32_PMC0 Unique Performance counter register. see Table B-2 C2H 194 IA32_PMC1 Unique Performance counter register.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-4. MSRs in Intel Atom Processor Family Register Address Hex Dec Register Name Shared/Unique Bit Description 8 L2 Enabled. (R/W) 1 = L2 cache has been initialized 0 = Disabled (default) Until this bit is set the processor will not respond to the WBINVD instruction or the assertion of the FLUSH# input. 22:9 Reserved. 23 L2 Not Present. (RO) 0 = L2 Present 1 = L2 Not Present 63:24 Reserved.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-4. MSRs in Intel Atom Processor Family Register Address Hex Register Name Shared/ Unique Bit Description Dec 2 MCIP. When set, bit indicates that a machine check has been generated. If a second machine check is detected while this bit is still set, the processor enters a shutdown state. Software should write this bit to 0 after processing a machine check exception. 63:3 Reserved.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-4. MSRs in Intel Atom Processor Family Register Address Hex Register Name Shared/ Unique Bit Description Dec 15:0 Reserved. 16 TM_SELECT. (R/W) Mode of automatic thermal monitor: 0 = Thermal Monitor 1 (thermally-initiated on-die modulation of the stop-clock duty cycle) 1 = Thermal Monitor 2 (thermally-initiated frequency transitions) If bit 3 of the IA32_MISC_ENABLE register is cleared, TM_SELECT has no effect. Neither TM1 nor TM2 are enabled.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-4. MSRs in Intel Atom Processor Family Register Address Hex Register Name Shared/ Unique Bit Description Dec 12 Shared Precise Event Based Sampling Unavailable. (RO) see Table B-2 13 Shared TM2 Enable. (R/W) When this bit is set (1) and the thermal sensor indicates that the die temperature is at the pre-determined threshold, the Thermal Monitor 2 mechanism is engaged.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-4. MSRs in Intel Atom Processor Family Register Address Hex Dec Register Name Shared/Unique Bit Description 21 Reserved. 22 Unique Limit CPUID Maxval. (R/W) see Table B-2 23 Shared xTPR Message Disable. (R/W) see Table B-2 33:24 Reserved. 34 Unique XD Bit Disable. (R/W) see Table B-2 63:35 Reserved. 1C9H 457 MSR_LASTBRANCH_TOS Unique Last Branch Record Stack TOS.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-4. MSRs in Intel Atom Processor Family Register Address Hex Dec Register Name Shared/Unique Bit Description 38EH 910 IA32_PERF_GLOBAL_STATUS Unique see Table B-2. See Section 18.15.2, “Global Counter Control Facilities.” 38FH 911 IA32_PERF_GLOBAL_CTRL Unique see Table B-2. See Section 18.15.2, “Global Counter Control Facilities.” 390H 912 IA32_PERF_GLOBAL_OVF_CTRL Unique see Table B-2. See Section 18.15.2, “Global Counter Control Facilities.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-4. MSRs in Intel Atom Processor Family Register Address Hex Dec Register Name Shared/Unique Bit Description 410H 1040 MSR_MC3_CTL Shared See Section 14.3.2.1, “IA32_MCi_CTL MSRs.” 411H 1041 MSR_MC3_STATUS Shared See Section 14.3.2.2, “IA32_MCi_STATUS MSRS.” 412H 1042 MSR_MC3_ADDR Shared See Section 14.3.2.3, “IA32_MCi_ADDR MSRs.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-4. MSRs in Intel Atom Processor Family Register Address Hex Dec Register Name Shared/Unique Bit Description 487H 1159 IA32_VMX_CR0_FIXED1 Unique Capability Reporting Register of CR0 Bits Fixed to 1. (R/O) see Table B-2. See Appendix G.7, “VMX-Fixed Bits in CR0” 488H 1160 IA32_VMX_CR4_FIXED0 Unique Capability Reporting Register of CR4 Bits Fixed to 0. (R/O) see Table B-2. See Appendix G.
MODEL-SPECIFIC REGISTERS (MSRS) B.4 MSRS IN THE INTEL® MICROARCHITECTURE (NEHALEM) Table B-5 lists model-specific registers (MSRs) for Intel microarchitecture (Nehalem). These include the Intel Core i7 processor family. Architectural MSR addresses are also included in Table B-5. These processors have a CPUID signature with DisplayFamily_DisplayModel of 06_1AH; see Table B-1. The column “Scope” represents the package/core/thread scope of an individual bit field of an MSR.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-5. MSRs in Processors Based on Intel Microarchitecture (Nehalem) (Contd.) Register Address Hex Dec Register Name Scope Bit Description 79H 121 IA32_BIOS_UPDT_TRIG Core BIOS Update Trigger Register. (R/W) see Table B-2 8BH 139 IA32_BIOS_SIGN_ID Thread BIOS Update Signature ID. (RO) see Table B-2 C1H 193 IA32_PMC0 Thread Performance counter register. see Table B-2 C2H 194 IA32_PMC1 Thread Performance counter register.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-5. MSRs in Processors Based on Intel Microarchitecture (Nehalem) (Contd.) Register Address Hex Dec Register Name Scope Bit Description 0 RIPV. When set, bit indicates that the instruction addressed by the instruction pointer pushed on the stack (when the machine check was generated) can be used to restart the program. If cleared, the program cannot be reliably restarted. 1 EIPV.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-5. MSRs in Processors Based on Intel Microarchitecture (Nehalem) (Contd.) Register Address Hex Dec Register Name Scope Bit Description 19AH 410 IA32_CLOCK_MODULATION Thread Clock Modulation. (R/W) see Table B-2 IA32_CLOCK_MODULATION MSR was originally named IA32_THERM_CONTROL MSR. 19BH 411 IA32_THERM_INTERRUPT Core 19CH 412 IA32_THERM_STATUS Core see Table B-2 1A0H 416 Enable Misc. Processor Features.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-5. MSRs in Processors Based on Intel Microarchitecture (Nehalem) (Contd.) Register Address Hex Dec Register Name Scope Bit Description 34 Thread XD Bit Disable. (R/W) see Table B-2 37:35 Reserved. 38 Package Turbo Mode Disable. (R/W) When set to 1 on processors that support Intel Turbo Boost Technology, the turbo mode feature is disabled and the IDA_Enable feature flag will be clear (CPUID.06H: EAX[1]=0). When set to a 0 on processors that support IDA, CPUID.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-5. MSRs in Processors Based on Intel Microarchitecture (Nehalem) (Contd.) Register Address Hex Dec Register Name Scope Bit Description 63:32 Reserved. 1C8H 456 MSR_LBR_SELECT Thread Last Branch Record Filtering Select Register (R/W) see Section 18.6.2, “Filtering of Last Branch Records.” 1C9H 457 MSR_LASTBRANCH_TOS Thread Last Branch Record Stack TOS. (R) Contains an index (bits 0-3) that points to the MSR containing the most recent branch record.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-5. MSRs in Processors Based on Intel Microarchitecture (Nehalem) (Contd.) Register Address Hex Dec Register Name Scope Bit Description 26DH 621 IA32_MTRR_FIX4K_E8000 Thread see Table B-2 26EH 622 IA32_MTRR_FIX4K_F0000 Thread see Table B-2 26FH 623 IA32_MTRR_FIX4K_F8000 Thread see Table B-2 277H 631 IA32_CR_PAT Thread see Table B-2 2FFH 767 IA32_MTRR_DEF_TYPE Thread Default Memory Types.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-5. MSRs in Processors Based on Intel Microarchitecture (Nehalem) (Contd.) Register Address Hex Dec Register Name Scope Bit Description 391H 913 MSR_UNCORE_PERF_GLOBAL_CTRL Package See Section 18.17.2.1, “Uncore Performance Monitoring Management Facility.” 392H 914 MSR_UNCORE_PERF_GLOBAL_STATUS Package See Section 18.17.2.1, “Uncore Performance Monitoring Management Facility.” 393H 915 MSR_UNCORE_PERF_GLOBAL_OVF_CTRL Package See Section 18.17.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-5. MSRs in Processors Based on Intel Microarchitecture (Nehalem) (Contd.) Register Address Hex Dec Register Name Scope Bit Description 3C2H 962 MSR_UNCORE_PMC2 Package See Section 18.17.2.2, “Uncore Performance Event Configuration Facility.” 3C3H 963 MSR_UNCORE_PMC3 Package See Section 18.17.2.2, “Uncore Performance Event Configuration Facility.” 3C4H 964 MSR_UNCORE_PMC4 Package See Section 18.17.2.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-5. MSRs in Processors Based on Intel Microarchitecture (Nehalem) (Contd.) Register Address Hex Dec Register Name Scope Bit Description 63:0 Package C3 Residency Counter. (R/O) Value since last reset that this package is in processor-specific C3 states. Counts at the same frequency as the TSC. 3F9H 1017 MSR_PKG_C6_RESIDENCY Package 63:0 Package C6 Residency Counter. (R/O) Value since last reset that this package is in processor-specific C6 states.
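Because a residency counter advances at the same frequency as the TSC, the share of time spent in a package C-state over an interval can be estimated from two samples of each, as sketched below. The function name and the sampling scheme are hypothetical; the arithmetic simply divides the counter delta by the TSC delta, with both deltas taken modulo 2^64 so that a single wraparound is tolerated.

```python
MASK64 = (1 << 64) - 1


def residency_fraction(counter_start: int, counter_end: int,
                       tsc_start: int, tsc_end: int) -> float:
    """Fraction of the sampled interval spent in the counted C-state."""
    tsc_delta = (tsc_end - tsc_start) & MASK64
    if tsc_delta == 0:
        raise ValueError("TSC did not advance between the two samples")
    counter_delta = (counter_end - counter_start) & MASK64
    return counter_delta / tsc_delta
```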
402H 1026 IA32_MC0_ADDR Unique See Section 14.3.2.3, “IA32_MCi_ADDR MSRs.” The IA32_MC0_ADDR register is either not implemented or contains no address if the ADDRV flag in the IA32_MC0_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
40EH 1038 MSR_MC4_ADDR Unique See Section 14.3.2.3, “IA32_MCi_ADDR MSRs.” The MSR_MC4_ADDR register is either not implemented or contains no address if the ADDRV flag in the MSR_MC4_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
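A machine-check handler would typically test the ADDRV flag before trusting the corresponding address register. The sketch below assumes ADDRV is bit 58 of the status register, following the architectural IA32_MCi_STATUS layout described in Section 14.3.2.2; that bit position is not stated in this table, and the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: ADDRV is bit 58 of an IA32_MCi_STATUS-format register. */
#define MCI_STATUS_ADDRV (1ULL << 58)

/* Returns 1 when the status value says the matching ADDR MSR holds an address. */
static inline int mc_addr_is_valid(uint64_t mci_status)
{
    return (mci_status & MCI_STATUS_ADDRV) != 0;
}
```

Checking the flag first also avoids touching an address register that may not be implemented, in which case the access would raise a general-protection exception.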
481H 1153 IA32_VMX_PINBASED_CTLS
482H 1154 IA32_VMX_PROCBASED_CTLS Thread Capability Reporting Register of Primary Processor-based VM-execution Controls. (R/O)
483H 1155 IA32_VMX_EXIT_CTLS Capability Reporting Register of VM-exit Controls. (R/O) see Table B-2.
48BH 1163 IA32_VMX_PROCBASED_CTLS2 Thread Capability Reporting Register of Secondary Processor-based VM-execution Controls. (R/O) See Appendix G.3, “VM-Execution Controls”
600H 1536 IA32_DS_AREA Thread DS Save Area. (R/W). see Table B-2 See Section 18.18.4, “Debug Store (DS) Mechanism.
687H 1671 MSR_LASTBRANCH_7_FROM_IP Thread Last Branch Record 7 From IP. (R/W)
688H 1672 MSR_LASTBRANCH_8_FROM_IP Thread Last Branch Record 8 From IP. (R/W)
689H 1673 MSR_LASTBRANCH_9_FROM_IP Thread Last Branch Record 9 From IP.
68AH 1674 68BH 1675 68CH 1676 68DH 1677 68EH 1678 68FH 1679 6C0H 1728
6C1H 1729 6C2H 6C3H 6C4H 6C5H 6C6H 6C7H 6C8H 6C9H 6CAH 6CBH
6CCH 1740 MSR_LASTBRANCH_12_TO_LIP Thread Last Branch Record 12 To IP. (R/W)
6CDH 1741 MSR_LASTBRANCH_13_TO_LIP Thread Last Branch Record 13 To IP. (R/W)
6CEH 1742 MSR_LASTBRANCH_14_TO_LIP Thread Last Branch Record 14 To IP. (R/W)
6CFH 1743 MSR_LASTBRANCH_15_TO_LIP Thread Last Branch Record 15 To IP.
C000_0081H IA32_STAR Thread System Call Target Address. (R/W). see Table B-2
C000_0082H IA32_LSTAR Thread IA-32e Mode System Call Target Address. (R/W). see Table B-2
C000_0084H IA32_FMASK Thread System Call Flag Mask. (R/W). see Table B-2
C000_0100H IA32_FS_BASE Thread Map of BASE Address of FS. (R/W).
B.5 MSRS IN THE PENTIUM® 4 AND INTEL® XEON® PROCESSORS
Table B-6 lists MSRs (architectural and model-specific) that are defined across processor generations based on Intel NetBurst microarchitecture. These processors can be identified by a CPUID signature with a DisplayFamily encoding of 0FH; see Table B-1.
• MSRs with an “IA32_” prefix are designated as “architectural.
MODEL-SPECIFIC REGISTERS (MSRS)
Table B-6. MSRs in the Pentium 4 and Intel Xeon Processors (Contd.)
Columns: Register Address (Hex, Dec); Register Name, Fields and Flags; Model Availability; Shared/Unique (see note 1); Bit Description
17H 23 IA32_PLATFORM_ID 0, 1, 2, 3, 4, 6 Shared Platform ID. (R).
1BH 27 IA32_APIC_BASE 0, 1, 2, 3, 4, 6 Unique
2AH 42 MSR_EBC_HARD_POWERON 0, 1, 2, 3, 4, 6 Shared
2 In Order Queue Depth. (R) Indicates whether the in-order queue depth for the system bus is 1 (1) or up to 12 (0) as set by the strapping of A7#. The value in this bit is written on the deassertion of RESET#; the bit is set to 1 when the address bus signal is asserted.
7 Bus Park Disable. (R) Indicates whether bus park is enabled (0) or disabled (1) as set by the strapping of A15#. The value in this bit is written on the deassertion of RESET#; the bit is set to 1 when the address bus signal is asserted.
11:8 Reserved.
13:12 Agent ID.
3 Address/Request Error Checking Disable. (R/W) Set to disable (default); clear to enable.
4 Initiator MCERR# Disable. (R/W) Set to disable MCERR# driving for initiator bus requests (default); clear to enable.
5 Internal MCERR# Disable.
18:16 Scalable Bus Speed. (R/W) Indicates the intended scalable bus speed:
• 000B: 100 MHz (Model 2)
• 000B: 266 MHz (Model 3 or 4)
• 001B: 133 MHz
• 010B: 200 MHz
• 011B: 166 MHz
• 100B: 333 MHz (Model 6)
133.
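The encoding above depends on the processor model, because 000B means 100 MHz on Model 2 but 266 MHz on Models 3 and 4. A hedged sketch of a decoder follows; the function name is hypothetical, the returned values are the nominal speeds listed in the table, and -1 marks combinations the table does not list.

```c
#include <stdint.h>

/* Decode the Scalable Bus Speed field (bits 18:16) of a raw MSR value.
 * 'model' is the CPUID model number of a family 0FH processor.
 * Returns the nominal bus speed in MHz, or -1 if the table has no entry. */
static int scalable_bus_mhz(uint64_t msr_value, unsigned model)
{
    unsigned enc = (unsigned)((msr_value >> 16) & 0x7u);

    switch (enc) {
    case 0: /* 000B is model dependent */
        if (model == 2) return 100;
        if (model == 3 || model == 4) return 266;
        return -1;
    case 1: return 133;
    case 2: return 200;
    case 3: return 166;
    case 4: return (model == 6) ? 333 : -1;
    default: return -1;
    }
}
```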
2CH 44 MSR_EBC_FREQUENCY_ID 0, 1 Shared Processor Frequency Configuration. (R) The bit field layout of this MSR varies according to the MODEL value of the CPUID version information. This bit field layout applies to Pentium 4 and Xeon Processors with MODEL encoding less than 2.
174H 372 IA32_SYSENTER_CS 0, 1, 2, 3, 4, 6 Unique CS register target for CPL 0 code. (R/W). see Table B-2 See Section 4.8.7, “Performing Fast Calls to System Procedures with the SYSENTER and SYSEXIT Instructions.
181H 385 MSR_MCG_RBX 0, 1, 2, 3, 4, 6 Unique Machine Check EBX/RBX Save State. See Section 14.3.2.6, “IA32_MCG Extended Machine Check State MSRs.”
63:0 Contains register state at time of machine check error.
182H 386 MSR_MCG_RCX
63:0 Contains register state at time of machine check error. When in non-64-bit modes at the time of the error, bits 63-32 do not contain valid data.
185H 389 MSR_MCG_RDI 0, 1, 2, 3, 4, 6 Unique Machine Check EDI/RDI Save State. See Section 14.3.2.
188H 392 MSR_MCG_RFLAGS 0, 1, 2, 3, 4, 6 Unique Machine Check EFLAGS/RFLAGS Save State. See Section 14.3.2.6, “IA32_MCG Extended Machine Check State MSRs.”
63:0 Contains register state at time of machine check error.
189H 393 MSR_MCG_RIP
MODEL-SPECIFIC REGISTERS (MSRS) Table B-6. MSRs in the Pentium 4 and Intel Xeon Processors (Contd.) Register Address Register Name Fields and Flags Hex Dec 18BH 18FH 395 MSR_MCG_ RESERVED1 MSR_MCG_ RESERVED5 190H 400 MSR_MCG_R8 Model Availability Shared/ Unique1 Reserved. 0, 1, 2, 3, 4, 6 Unique 401 MSR_MCG_R9 Registers R8-15 (and the associated state-save MSRs) exist only in Intel 64 processors.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-6. MSRs in the Pentium 4 and Intel Xeon Processors (Contd.) Register Address Hex Dec 193H 403 Register Name Fields and Flags MSR_MCG_R11 Model Availability Shared/ Unique1 0, 1, 2, 3, 4, 6 Unique 404 MSR_MCG_R12 Registers R8-15 (and the associated state-save MSRs) exist only in Intel 64 processors. These registers contain valid information only when the processor is operating in 64-bit mode at the time of the error.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-6. MSRs in the Pentium 4 and Intel Xeon Processors (Contd.) Register Address Hex Register Name Fields and Flags Dec Model Availability Shared/ Unique1 63-0 197H 407 MSR_MCG_R15 Bit Description Registers R8-15 (and the associated state-save MSRs) exist only in Intel 64 processors. These registers contain valid information only when the processor is operating in 64-bit mode at the time of the error. 0, 1, 2, 3, 4, 6 Unique Machine Check R15. See Section 14.
MODEL-SPECIFIC REGISTERS (MSRS) Table B-6. MSRs in the Pentium 4 and Intel Xeon Processors (Contd.) Register Address Hex 1A0H Register Name Fields and Flags Dec 416 IA32_MISC_ENABLE Model Availability Shared/ Unique1 3, Shared For Family F, Model 3 processors: When read, specifies the value of the target TM2 transition last written. When set, it sets the next target value for TM2 transition.
6 Third-Level Cache Disable. (R/W) When set, the third-level cache is disabled; when clear (default) the third-level cache is enabled. This flag is reserved for processors that do not have a third-level cache.
When this bit is clear (0, default), the processor does not change the VID signals or the bus to core ratio when the processor enters a thermal managed state.
Setting this can cause unexpected behavior in software that depends on the availability of CPUID leaves greater than 3.
23 Shared xTPR Message Disable. (R/W) see Table B-2.
24 L1 Data Cache Context Mode.
18 PLATFORM Requirements. When set to 1, indicates the processor has specific platform requirements. The details of the platform requirements are listed in the respective data sheets of the processor.
63:19 Reserved.
1D7H 471 MSR_LER_FROM_LIP
MODEL-SPECIFIC REGISTERS (MSRS) Table B-6. MSRs in the Pentium 4 and Intel Xeon Processors (Contd.) Register Address Hex Register Name Fields and Flags Dec Model Availability Shared/ Unique1 31:0 Bit Description From Linear IP. Linear address of the target of the last branch instruction. 63:32 1D8H 472 Reserved. 63:0 Unique From Linear IP. Linear address of the target of the last branch instruction (If IA-32e mode is active). 1D9H 473 MSR_DEBUGCTLA 0, 1, 2, 3, 4, 6 Unique Debug Control.
1DBH 475 MSR_LASTBRANCH_0 0, 1, 2 Unique Last Branch Record 0. (R/W) One of four last branch record registers on the last branch record stack. It contains pointers to the source and destination instruction for one of the last four branches, exceptions, or interrupts that the processor took.
20EH 526 20FH 250H 258H 259H 268H 269H 26AH 26BH 26CH 26DH 527 592 600 601 616 617 618 619 620 621
26EH 622 IA32_MTRR_FIX4K_F0000 0, 1, 2, 3, 4, 6 Shared
26FH 623 IA32_MTRR_FIX4K_F8000 0, 1, 2, 3, 4, 6 Shared
277H 631 IA32_CR_PAT 0, 1, 2, 3, 4, 6 Unique
2FFH 767 IA32_MTRR_DEF_TYPE 0, 1, 2, 3, 4, 6 Shared
Fixed Range MTRR. See Section 10.11.2.2, “Fixed Range MTRRs.
30AH 778 MSR_FLAME_COUNTER2 0, 1, 2, 3, 4, 6 Shared See Section 18.18.2, “Performance Counters.”
30BH 779 MSR_FLAME_COUNTER3 0, 1, 2, 3, 4, 6 Shared See Section 18.18.2, “Performance Counters.”
30CH 780 MSR_IQ_COUNTER0 0, 1, 2, 3, 4, 6 Shared See Section 18.18.
369H 873 MSR_FLAME_CCCR1 0, 1, 2, 3, 4, 6 Shared See Section 18.18.3, “CCCR MSRs.”
36AH 874 MSR_FLAME_CCCR2 0, 1, 2, 3, 4, 6 Shared See Section 18.18.3, “CCCR MSRs.”
36BH 875 MSR_FLAME_CCCR3 0, 1, 2, 3, 4, 6 Shared See Section 18.18.3, “CCCR MSRs.
3A8H 936 MSR_DAC_ESCR0 0, 1, 2, 3, 4, 6 Shared See Section 18.18.1, “ESCR MSRs.”
3A9H 937 MSR_DAC_ESCR1 0, 1, 2, 3, 4, 6 Shared See Section 18.18.1, “ESCR MSRs.”
3AAH 938 MSR_MOB_ESCR0 0, 1, 2, 3, 4, 6 Shared See Section 18.18.1, “ESCR MSRs.
3B9H 953 MSR_CRU_ESCR1 0, 1, 2, 3, 4, 6 Shared See Section 18.18.1, “ESCR MSRs.”
3BAH 954 MSR_IQ_ESCR0 0, 1, 2 Shared See Section 18.18.1, “ESCR MSRs.” This MSR is not available on later processors. It is only available on processor family 0FH, models 01H-02H.
3CAH 970 MSR_ALF_ESCR0 0, 1, 2, 3, 4, 6 Shared See Section 18.18.1, “ESCR MSRs.”
3CBH 971 MSR_ALF_ESCR1 0, 1, 2, 3, 4, 6 Shared See Section 18.18.1, “ESCR MSRs.”
3CCH 972 MSR_CRU_ESCR2 0, 1, 2, 3, 4, 6 Shared See Section 18.18.1, “ESCR MSRs.
26 ENABLE_PEBS_OTH_THR. (R/W) Enables PEBS for the target logical processor when set; disables PEBS when clear (default). See Section 18.19.3, “IA32_PEBS_ENABLE MSR,” for an explanation of the target logical processor.
When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
404H 1028 IA32_MC1_CTL 0, 1, 2, 3, 4, 6 Shared See Section 14.3.2.1, “IA32_MCi_CTL MSRs.”
405H 1029 IA32_MC1_STATUS 0, 1, 2, 3, 4, 6 Shared See Section 14.3.2.
40AH 1034 IA32_MC2_ADDR See Section 14.3.2.3, “IA32_MCi_ADDR MSRs.” The IA32_MC2_ADDR register is either not implemented or contains no address if the ADDRV flag in the IA32_MC2_STATUS register is clear.
40FH 1039 IA32_MC3_MISC 0, 1, 2, 3, 4, 6 Shared See Section 14.3.2.4, “IA32_MCi_MISC MSRs.” The IA32_MC3_MISC MSR is either not implemented or does not contain additional information if the MISCV flag in the IA32_MC3_STATUS register is clear.
When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
480H 1152 IA32_VMX_BASIC 3, 4, 6 Unique Reporting Register of Basic VMX Capabilities. (R/O). see Table B-2. See Appendix G.
487H 1159 IA32_VMX_CR0_FIXED1 3, 4, 6 Unique Capability Reporting Register of CR0 Bits Fixed to 1. (R/O) See Appendix G.7, “VMX-Fixed Bits in CR0” and see Table B-2.
488H 1160 IA32_VMX_CR4_FIXED0 3, 4, 6 Unique Capability Reporting Register of CR4 Bits Fixed to 0. (R/O) See Appendix G.
The MSRs at 680H-68FH and 6C0H-6CFH are not available in processor releases before family 0FH, model 03H. These MSRs replace the MSRs previously located at 1DBH-1DEH, which performed the same function for early releases. See Section 18.
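Software that walks the last branch record stack on family 0FH processors therefore has to pick the MSR range by model. The following is an illustrative sketch only: the structure and function names are hypothetical, the depth of 16 is inferred from the size of the 680H-68FH range, and the depth of 4 comes from the description of MSR_LASTBRANCH_0 as one of four registers that each hold both source and destination pointers.

```c
#include <stdint.h>

/* Location of the last-branch-record stack on a family 0FH processor. */
struct lbr_layout {
    uint32_t first_msr; /* first MSR of the stack (from-IP on later models) */
    uint32_t to_msr;    /* first to-IP MSR, or 0 when from/to share one MSR */
    unsigned depth;     /* number of branch records in the stack */
};

/* Select the LBR layout from the CPUID model number (family 0FH assumed). */
static struct lbr_layout lbr_layout_for_model(unsigned model)
{
    struct lbr_layout l;

    if (model < 3) {
        /* Early releases: four combined records at 1DBH-1DEH. */
        l.first_msr = 0x1DB;
        l.to_msr = 0;
        l.depth = 4;
    } else {
        /* Model 03H and later: separate from/to ranges. */
        l.first_msr = 0x680;
        l.to_msr = 0x6C0;
        l.depth = 16;
    }
    return l;
}
```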
MSR_LASTBRANCH_12_TO_LIP 3, 4, 6 Unique
MSR_LASTBRANCH_13_TO_LIP 3, 4, 6
MSR_LASTBRANCH_14_TO_LIP 3, 4, 6
MSR_LASTBRANCH_15_TO_LIP 3, 4, 6
C000_0080H IA32_EFER 3, 4, 6 Unique Extended Feature Enables. see Table B-2
C000_0081H IA32_STAR 3, 4, 6 Unique System Call Target Address.
NOTES
1. For HT-enabled processors, there may be more than one logical processor per physical unit. If an MSR is Shared, one MSR is shared between the logical processors. If an MSR is Unique, each logical processor has its own MSR.
B.5.
MODEL-SPECIFIC REGISTERS (MSRS)
Table B-7. MSRs Unique to 64-bit Intel Xeon Processor MP with Up to an 8 MB L3 Cache (Contd.)
Columns: Register Address; Register Name, Fields and Flags; Model Availability; Shared/Unique; Bit Description
107CFH MSR_IFSB_SNPQ1 3, 4 Shared IFSB SNPQ Event Control and Counter Register. (R/W)
107D0H MSR_EFSB_DRDY0 3, 4 Shared EFSB DRDY Event Control and Counter Register. (R/W) See Section 18.
MODEL-SPECIFIC REGISTERS (MSRS)
Table B-8. MSRs Unique to Intel Xeon Processor 7100 Series
Columns: Register Address; Register Name, Fields and Flags; Model Availability; Shared/Unique; Bit Description
107CCH MSR_EMON_L3_CTR_CTL0 6 Shared GBUSQ Event Control and Counter Register. (R/W) See Section 18.24, “Performance Monitoring on L3 and Caching Bus Controller sub-systems.”
107CDH MSR_EMON_L3_CTR_CTL1 6 Shared GBUSQ Event Control and Counter Register.
B.6 MSRS IN INTEL® CORE™ SOLO AND INTEL® CORE™ DUO PROCESSORS
Model-specific registers (MSRs) for Intel Core Solo, Intel Core Duo processors, and Dual-core Intel Xeon processor LV are listed in Table B-9. The “Shared/Unique” column applies to the Intel Core Duo processor. “Unique” means that each processor core has a separate MSR, or that a bit field in an MSR governs only one core independently of the other.
MODEL-SPECIFIC REGISTERS (MSRS)
Table B-9. MSRs in Intel Core Solo, Intel Core Duo Processors, and Dual-Core Intel Xeon Processor LV (Contd.)
Columns: Register Address (Hex, Dec); Register Name; Shared/Unique; Bit Description
3 MCERR# Drive Enable. (R/W) 1 = Enabled; 0 = Disabled. Note: Not all processors implement R/W.
4 Address Parity Enable. (R/W) 1 = Enabled; 0 = Disabled. Note: Not all processors implement R/W.
6:5 Reserved
7 BINIT# Driver Enable.
3AH 58 IA32_FEATURE_CONTROL Unique Control Features in IA-32 Processor. (R/W) see Table B-2
40H 64 MSR_LASTBRANCH_0 Unique Last Branch Record 0.
CDH 205 MSR_FSB_FREQ Shared Scalable Bus Speed. (RO)
2:0 This field indicates the scalable bus clock speed:
• 101B: 100 MHz (FSB 400)
• 001B: 133 MHz (FSB 533)
• 011B: 167 MHz (FSB 667)
133.33 MHz should be utilized if performing calculation with System Bus Speed when encoding is 001B.
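A small decoder for this field might look like the sketch below. It treats 133.33 MHz as the calculation value for the 133 MHz encoding (001B), as the note recommends, and returns the nominal table values for the other two encodings; the function name is hypothetical and 0.0 is an arbitrary marker for encodings the table does not list.

```c
#include <stdint.h>

/* Decode bits 2:0 of a raw MSR_FSB_FREQ value into a bus clock in MHz.
 * Returns 0.0 for encodings that do not appear in the table. */
static double fsb_freq_mhz(uint64_t msr_fsb_freq)
{
    switch (msr_fsb_freq & 0x7u) {
    case 0x5: return 100.0;  /* 101B: FSB 400 */
    case 0x1: return 133.33; /* 001B: FSB 533, value recommended for calculation */
    case 0x3: return 167.0;  /* 011B: FSB 667, nominal value from the table */
    default:  return 0.0;    /* not listed */
    }
}
```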
23 L2 Not Present. (RO) 0 = L2 Present; 1 = L2 Not Present
63:24 Reserved.
63:3 Reserved.
186H 390 IA32_PERFEVTSEL0 Unique see Table B-2
187H 391 IA32_PERFEVTSEL1 Unique see Table B-2
198H 408 IA32_PERF_STATUS Shared see Table B-2
199H 409 IA32_PERF_CTL Unique see Table B-2
19AH 410 IA32_CLOCK_MODULATION Unique Clock Modulation.
2:0 Reserved.
3 Unique Automatic Thermal Control Circuit Enable. (R/W) see Table B-2
6:4 Reserved
7 Shared Performance Monitoring Available. (R). see Table B-2
9:8 Reserved
10 Shared FERR# Multiplexing Enable.
When this bit is clear (0, default), the processor does not change the VID signals or the bus to core ratio when the processor enters a thermal managed state.
1D9H 473 IA32_DEBUGCTL Unique Debug Control. (R/W) Controls how several debug features are used. Bit definitions are discussed in the referenced section.
1DDH 477 MSR_LER_FROM_LIP Unique Last Exception Record From Linear IP.
1DEH 478 MSR_LER_TO_LIP Unique
401H 1025 IA32_MC0_STATUS Unique See Section 14.3.2.2, “IA32_MCi_STATUS MSRs.”
402H 1026 IA32_MC0_ADDR Unique See Section 14.3.2.3, “IA32_MCi_ADDR MSRs.
40EH 1038 MSR_MC4_ADDR Unique See Section 14.3.2.3, “IA32_MCi_ADDR MSRs.” The MSR_MC4_ADDR register is either not implemented or contains no address if the ADDRV flag in the MSR_MC4_STATUS register is clear.
482H 1154 IA32_VMX_PROCBASED_CTLS Unique Capability Reporting Register of Primary Processor-based VM-execution Controls. (R/O) See Appendix G.3, “VM-Execution Controls” (If CPUID.01H:ECX.[bit 9])
483H 1155 IA32_VMX_EXIT_CTLS Unique Capability Reporting Register of VM-exit Controls.
48AH 1162 IA32_VMX_VMCS_ENUM Unique Capability Reporting Register of VMCS Field Enumeration. (R/O). See Appendix G.9, “VMCS Enumeration” (If CPUID.01H:ECX.[bit 9])
48BH 1163 IA32_VMX_PROCBASED_CTLS2 Unique Capability Reporting Register of Secondary Processor-based VM-execution Controls.
MODEL-SPECIFIC REGISTERS (MSRS)
Table B-10. MSRs in Pentium M Processors
Columns: Register Address (Hex, Dec); Register Name; Bit Description
0H 0 P5_MC_ADDR See Appendix B.9, “MSRs in Pentium Processors.”
1H 1 P5_MC_TYPE See Appendix B.9, “MSRs in Pentium Processors.”
10H 16 IA32_TIME_STAMP_COUNTER See Section 18.11, “Time-Stamp Counter,” and see Table B-2
17H 23 IA32_PLATFORM_ID Platform ID. (R).
10 MCERR# Observation Enabled. (R/O) 1 = Enabled; 0 = Disabled. Always 0 on the Pentium M processor.
11 Reserved.
12 BINIT# Observation Enabled. (R/O) 1 = Enabled; 0 = Disabled. Always 0 on the Pentium M processor.
13 Reserved
14 1 MByte Power on Reset Vector. (R/O) 1 = 1 MByte; 0 = 4 GBytes. Always 0 on the Pentium M processor.
15 Reserved.
42H 66 MSR_LASTBRANCH_2 Last Branch Record 2. (R/W) See description of MSR_LASTBRANCH_0.
43H 67 MSR_LASTBRANCH_3 Last Branch Record 3. (R/W) See description of MSR_LASTBRANCH_0.
44H 68 MSR_LASTBRANCH_4 Last Branch Record 4. (R/W) See description of MSR_LASTBRANCH_0.
45H 69 MSR_LASTBRANCH_5 Last Branch Record 5. (R/W) See description of MSR_LASTBRANCH_0.
8 L2 Enabled. (R/W) 1 = L2 cache has been initialized; 0 = Disabled (default). Until this bit is set the processor will not respond to the WBINVD instruction or the assertion of the FLUSH# input.
22:9 Reserved.
23 L2 Not Present. (RO) 0 = L2 Present; 1 = L2 Not Present
63:24 Reserved.
179H 377 IA32_MCG_CAP
7:0 Count.
2 MCIP. When set, this bit indicates that a machine check has been generated. If a second machine check is detected while this bit is still set, the processor enters a shutdown state. Software should write this bit to 0 after processing a machine check exception.
63:3 Reserved.
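The MCIP handling described above amounts to testing and clearing bit 2 of the machine-check global status value. The helpers below are a hedged sketch that operates on a value already read from the register; the macro and function names are hypothetical, and writing the result back requires WRMSR at privilege level 0.

```c
#include <stdint.h>

/* MCIP is bit 2 of the machine-check global status value, per the table. */
#define MCG_STATUS_MCIP (1ULL << 2)

/* Returns 1 if a machine check is marked as in progress. */
static inline int mcg_mcip_is_set(uint64_t mcg_status)
{
    return (mcg_status & MCG_STATUS_MCIP) != 0;
}

/* Returns the value to write back once the machine check has been handled. */
static inline uint64_t mcg_clear_mcip(uint64_t mcg_status)
{
    return mcg_status & ~MCG_STATUS_MCIP;
}
```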
3 Automatic Thermal Control Circuit Enable. (R/W) 1 = Setting this bit enables the thermal control circuit (TCC) portion of the Intel Thermal Monitor feature. This allows processor clocks to be automatically modulated based on the processor's thermal sensor operation. 0 = Disabled (default).
12 Precise Event Based Sampling Unavailable. (RO) 1 = Processor does not support precise event-based sampling (PEBS); 0 = PEBS is supported. The Pentium M processor does not support PEBS.
15:13 Reserved.
16 Enhanced Intel SpeedStep Technology Enable. (R/W) 1 = Enhanced Intel SpeedStep Technology enabled.
1DDH 477 MSR_LER_TO_LIP Last Exception Record To Linear IP. (R) This area contains a pointer to the target of the last branch instruction that the processor executed prior to the last exception that was generated or the last interrupt that was handled. See Section 18.9, “Last Branch, Interrupt, and Exception Recording (Pentium M Processors)” and Section 18.10.
406H 1030 IA32_MC1_ADDR See Section 14.3.2.3, “IA32_MCi_ADDR MSRs.” The IA32_MC1_ADDR register is either not implemented or contains no address if the ADDRV flag in the IA32_MC1_STATUS register is clear. When not implemented in the processor, all reads and writes to this MSR will cause a general-protection exception.
408H 1032 IA32_MC2_CTL See Section 14.3.2.
600H 1536 IA32_DS_AREA DS Save Area. (R/W). see Table B-2 Points to the DS buffer management area, which is used to manage the BTS and PEBS buffers. See Section 18.18.4, “Debug Store (DS) Mechanism.”
31:0 DS Buffer Management Area. Linear address of the first byte of the DS buffer management area.
63:32 Reserved.
B.8
MODEL-SPECIFIC REGISTERS (MSRS)
Table B-11. MSRs in the P6 Family Processors (Contd.)
Columns: Register Address (Hex, Dec); Register Name; Bit Description
52:50 Platform Id. (R) Contains information concerning the intended platform for the processor. Encoding of bits 52, 51, 50:
• 000: Processor Flag 0
• 001: Processor Flag 1
• 010: Processor Flag 2
• 011: Processor Flag 3
• 100: Processor Flag 4
• 101: Processor Flag 5
• 110: Processor Flag 6
• 111: Processor Flag 7
56:53 L2 Cache Latency Read.
59:57 Reserved.
1BH 27
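Since the three Platform Id bits encode the processor flag number directly, extracting it is a shift and a mask. A minimal sketch, with a hypothetical function name, operating on a raw 64-bit value of the MSR that carries this field:

```c
#include <stdint.h>

/* Return the processor flag number (0-7) encoded in bits 52:50. */
static inline unsigned p6_platform_flag(uint64_t msr_value)
{
    return (unsigned)((msr_value >> 50) & 0x7u);
}
```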
Table B-11. MSRs in the P6 Family Processors (Contd.)
Bit  Description
2    Response Error Checking Enable FRCERR Observation Enable. (R/W) 1 = Enabled; 0 = Disabled
3    AERR# Drive Enable. (R/W) 1 = Enabled; 0 = Disabled
4    BERR# Enable for Initiator Bus Requests. (R/W) 1 = Enabled; 0 = Disabled
5    Reserved.
6    BERR# Driver Enable for Initiator Internal Errors. (R/W) 1 = Enabled; 0 = Disabled
7    BINIT# Driver Enable.
Table B-11. MSRs in the P6 Family Processors (Contd.)
Bit    Description
13     In Order Queue Depth. (R) 1 = 1; 0 = 8
14     1-MByte Power on Reset Vector. (R) 1 = 1 MByte; 0 = 4 GBytes
15     FRC Mode Enable. (R) 1 = Enabled; 0 = Disabled
17:16  APIC Cluster ID. (R)
19:18  System Bus Frequency. (R) 00 = 66 MHz; 10 = 100 MHz; 01 = 133 MHz; 11 = Reserved
21:20  Symmetric Arbitration ID. (R)
25:22  Clock Frequency Ratio.
33H  51
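The System Bus Frequency field (bits 19:18) listed above uses a non-monotonic encoding, so a lookup is less error-prone than arithmetic. The following is a minimal sketch in C, assuming the 64-bit MSR value has already been read by privileged code; the function name `bus_freq_mhz` is hypothetical and not part of any Intel-defined interface.

```c
#include <stdint.h>

/* Decode the System Bus Frequency field (bits 19:18) of a raw MSR value.
 * Returns the frequency in MHz, or 0 for the reserved encoding (11b).
 * Hypothetical helper for illustration only. */
static inline unsigned bus_freq_mhz(uint64_t msr_value)
{
    switch ((msr_value >> 18) & 0x3u) {
    case 0x0: return 66;   /* 00b = 66 MHz  */
    case 0x2: return 100;  /* 10b = 100 MHz */
    case 0x1: return 133;  /* 01b = 133 MHz */
    default:  return 0;    /* 11b = Reserved */
    }
}
```

A caller would treat a return value of 0 as "unknown" rather than as a valid frequency.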
Table B-11. MSRs in the P6 Family Processors (Contd.)
Hex  Dec  Register Name              Bit Description
8BH  139  BIOS_SIGN/BBL_CR_D3[63:0]  BIOS Update Signature Register or Chunk 3 data register D[63:0].
Table B-11. MSRs in the P6 Family Processors (Contd.)
Bit  Description
16   USER. Controls the counting of events at privilege levels 1, 2, and 3.
17   OS. Controls the counting of events at privilege level 0.
18   E. Occurrence/Duration Mode Select. 1 = Occurrence; 0 = Duration
19   PC. Enables the signaling of performance counter overflow via the BP0 pin.
20   INT.
Table B-11. MSRs in the P6 Family Processors (Contd.)
Hex   Dec  Register Name
187H  391  PerfEvtSel1 (EVNTSEL1)
Bit   Description
7:0   Event Select. Refer to Performance Counter section for a list of event encodings.
15:8  UMASK (Unit Mask). Unit mask register set to 0 to enable all count options.
16    USER. Controls the counting of events at privilege levels 1, 2, and 3.
17    OS. Controls the counting of events at privilege level 0.
18    E.
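The PerfEvtSel field layout listed above (Event Select in bits 7:0, UMASK in bits 15:8, USER in bit 16, OS in bit 17, E in bit 18) can be composed into a register value as in the sketch below. This is an illustration only: the macro and function names are hypothetical, and the event and unit-mask codes passed in must come from the event tables for the target processor.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions taken from the PerfEvtSel table rows above. */
#define EVTSEL_USER (1u << 16)  /* count at privilege levels 1, 2, 3 */
#define EVTSEL_OS   (1u << 17)  /* count at privilege level 0 */
#define EVTSEL_E    (1u << 18)  /* E flag */

/* Build a PerfEvtSel value from its fields (hypothetical helper). */
static inline uint32_t make_evtsel(uint8_t event, uint8_t umask,
                                   bool user, bool os, bool e_flag)
{
    uint32_t v = (uint32_t)event;   /* bits 7:0  */
    v |= (uint32_t)umask << 8;      /* bits 15:8 */
    if (user)   v |= EVTSEL_USER;
    if (os)     v |= EVTSEL_OS;
    if (e_flag) v |= EVTSEL_E;
    return v;
}
```

Setting both `user` and `os` counts the event at all privilege levels; the remaining control bits (19 and above) are left clear in this sketch.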
Table B-11. MSRs in the P6 Family Processors (Contd.)
Hex   Dec   Register Name  Bit Description
404H  1028  MC1_CTL
405H  1029  MC1_STATUS
406H  1030  MC1_ADDR
407H  1031  MC1_MISC
408H  1032  MC2_CTL
409H  1033  MC2_STATUS
40AH  1034  MC2_ADDR
40BH  1035  MC2_MISC
40CH  1036  MC4_CTL
40DH  1037  MC4_STATUS     Bit definitions same as MC0_STATUS, except bits 0, 4, 57, and 61 are hardcoded to 1.
MODEL-SPECIFIC REGISTERS (MSRS) B.9 MSRS IN PENTIUM PROCESSORS The following MSRs are defined for the Pentium processors. The P5_MC_ADDR, P5_MC_TYPE, and TSC MSRs (named IA32_P5_MC_ADDR, IA32_P5_MC_TYPE, and IA32_TIME_STAMP_COUNTER in the Pentium 4 processor) are architectural; that is, code that accesses these registers will run on Pentium 4 and P6 family processors without generating exceptions (see Section B.1, “Architectural MSRs”).
APPENDIX C MP INITIALIZATION FOR P6 FAMILY PROCESSORS This appendix describes the MP initialization process for systems that use multiple P6 family processors. This process uses the MP initialization protocol that was introduced with the Pentium Pro processor (see Section 7.5, “Multiple-Processor (MP) Initialization”).
• Final Boot IPI (FIPI): Initiates the BIOS initialization procedure for the BSP. This IPI is broadcast to all the processors on the system bus, but only the BSP responds to it. The BSP responds by beginning execution of the BIOS initialization code at the reset vector.
• Startup IPI (SIPI): Initiates the initialization procedure for an AP. The SIPI message contains a vector to the AP initialization code in the BIOS.
MP INITIALIZATION FOR P6 FAMILY PROCESSORS IA32_APIC_BASE MSR. If the vector and APIC ID do not match, the processor selects itself as an AP by entering the “wait for SIPI” state. (Note that in Figure 1, the BIPI from processor 1 is the first BIPI to be handled, so processor 1 becomes the BSP.) 5. The newly established BSP broadcasts an FIPI message to “all including self.” The FIPI is guaranteed to be handled only after the completion of the BIPIs that were issued by the non-BSP processors.
MP INITIALIZATION FOR P6 FAMILY PROCESSORS priate. At the completion of the initialization procedure, the AP executes a CLI instruction (to clear the IF flag in the EFLAGS register) and halts itself. 11.
APPENDIX D PROGRAMMING THE LINT0 AND LINT1 INPUTS The following procedure describes how to program the LINT0 and LINT1 local APIC pins on a processor after multiple processors have been booted and initialized (as described in Appendix C, “MP Initialization For P6 Family Processors,” and Appendix D, “Programming the LINT0 and LINT1 Inputs”). In this example, LINT0 is programmed to be the ExtINT pin and LINT1 is programmed to be the NMI pin. D.
4. Program LVT2 as NMI, which delivers the signal on the NMI signal of all processor cores listed in the destination.
MOV ESI, LVT2
MOV EAX, [ESI]
AND EAX, 0FFFE58FFH ; clear bits 8-10, 13, 15, and 16
OR EAX, 000000400H  ; Bit 16=0 for not masked, Bit 15=0 edge
                    ; triggered, Bit 13=0 for high active input
                    ; polarity, Bits 8-10 are 100b for NMI
MOV [ESI], EAX      ; Write to LVT2
;Unmask 8259 interrupts and allow NMI.
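The AND/OR pair in step 4 is a read-modify-write of the LVT entry. The sketch below restates the same bit arithmetic in C so the effect of the two constants can be checked on plain integers; it does not touch hardware, and the function name `lvt_program_nmi` is hypothetical.

```c
#include <stdint.h>

/* Apply the step 4 masks to a 32-bit LVT entry value.
 * 0xFFFE58FF clears bits 8-10 (delivery mode), 13 (input polarity),
 * 15 (trigger mode), and 16 (mask); 0x400 then sets bits 8-10 to 100b (NMI). */
static inline uint32_t lvt_program_nmi(uint32_t lvt)
{
    lvt &= 0xFFFE58FFu;
    lvt |= 0x00000400u;
    return lvt;
}
```

All other bits of the entry pass through unchanged, which is why the assembly sequence reads the register before writing it back.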
APPENDIX E INTERPRETING MACHINE-CHECK ERROR CODES Encoding of the model-specific and other information fields is different across processor families. The differences are documented in the following sections. E.1 INCREMENTAL DECODING INFORMATION: PROCESSOR FAMILY 06H MACHINE ERROR CODES FOR MACHINE CHECK Section E.1 provides information for interpreting additional model-specific fields for external bus errors relating to processor family 06H.
INTERPRETING MACHINE-CHECK ERROR CODES Table E-2. Incremental Decoding Information: Processor Family 06H Machine Error Codes For Machine Check Type Bit No.
Table E-2. Incremental Decoding Information: Processor Family 06H Machine Error Codes For Machine Check (Contd.) Type Bit No. Bit Function Bit Description The ROB time-out counter is prescaled by the 8-bit PIC timer, which is a divide by 128 of the bus clock (the bus clock is 1:2, 1:3, 1:4 of the core clock). When a carry out of the 8-bit PIC timer occurs, the ROB counter counts up by one. While this bit is asserted, it cannot be overwritten by another error.
INTERPRETING MACHINE-CHECK ERROR CODES Table E-2. Incremental Decoding Information: Processor Family 06H Machine Error Codes For Machine Check (Contd.) Type Status register validity indicators1 Bit No. Bit Function Bit Description 55-56 Reserved Reserved. 57-63 NOTES: 1. These fields are architecturally defined. Refer to Chapter 14, “Machine-Check Architecture,” for more information. E.
INTERPRETING MACHINE-CHECK ERROR CODES Table E-4. Incremental Bus Error Codes of Machine Check for Processors Based on Intel Core Microarchitecture Type Bit No.
Table E-4. Incremental Bus Error Codes of Machine Check for Processors Based on Intel Core Microarchitecture Type Bit No. Bit Function Bit Description The ROB time-out counter is prescaled by the 8-bit PIC timer, which is a divide by 128 of the bus clock (the bus clock is 1:2, 1:3, 1:4 of the core clock). When a carry out of the 8-bit PIC timer occurs, the ROB counter counts up by one. While this bit is asserted, it cannot be overwritten by another error.
E.2.1 Model-Specific Machine Check Error Codes for Intel Xeon Processor 7400 Series
The Intel Xeon processor 7400 series has machine check register banks that generally follow the description in Chapter 14 and Section E.2. Additional error codes specific to the Intel Xeon processor 7400 series are described in this section. MC4_STATUS[63:0] is the main error-logging register for the processor’s L3 and front side bus errors for the Intel Xeon processor 7400 series.
The boldfaced binary encodings are the only encodings used by the processor for MC4_STATUS[15:0].
E.2.2 Intel Xeon Processor 7400 Model Specific Error Code Field
E.2.2.1 Processor Model Specific Error Code Field Type B: Bus and Interconnect Error
Note: The Model Specific Error Code field in MC6_STATUS (bits 31:16) Table E-6.
INTERPRETING MACHINE-CHECK ERROR CODES Table E-7.
INTERPRETING MACHINE-CHECK ERROR CODES E.3.1 QPI Machine Check Errors Table E-8. QPI Machine Check Error codes for IA32_MC0_STATUS and IA32_MC1_STATUS Type Bit No.
Table E-9. QPI Machine Check Error codes for IA32_MC0_MISC and IA32_MC1_MISC
Type                      Bit No.  Bit Function  Bit Description
Model specific errors(1)  7-0      QPI Opcode    Message class and opcode from the packet with the error
                          13-8     RTId          QPI Request Transaction ID
                          15-14    Reserved      Reserved
                          18-16    RHNID         QPI Requestor/Home Node ID
                          23-19    Reserved      Reserved
                          24       IIB           QPI Interleave/Head Indication Bit
NOTES: 1. Which of these fields are valid depends on the error type.
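The Table E-9 layout can be unpacked with shifts and masks. The sketch below is an assumption-laden illustration: the struct and function names are hypothetical, the input is a raw IA32_MC0_MISC or IA32_MC1_MISC value obtained elsewhere, and, per the table note, which fields are meaningful depends on the error type.

```c
#include <stdint.h>

/* Fields of Table E-9; names are hypothetical. */
struct qpi_misc {
    unsigned opcode;  /* bits 7-0:   message class and opcode */
    unsigned rtid;    /* bits 13-8:  request transaction ID */
    unsigned rhnid;   /* bits 18-16: requestor/home node ID */
    unsigned iib;     /* bit 24:     interleave/head indication bit */
};

/* Split a raw MISC register value into the Table E-9 fields.
 * Reserved bits (15-14, 23-19) are ignored. */
static inline struct qpi_misc decode_qpi_misc(uint64_t misc)
{
    struct qpi_misc f;
    f.opcode = (unsigned)(misc & 0xFFu);
    f.rtid   = (unsigned)((misc >> 8) & 0x3Fu);
    f.rhnid  = (unsigned)((misc >> 16) & 0x7u);
    f.iib    = (unsigned)((misc >> 24) & 0x1u);
    return f;
}
```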
INTERPRETING MACHINE-CHECK ERROR CODES NOTES: 1. These fields are architecturally defined. Refer to Chapter 14, “Machine-Check Architecture,” for more information. E.3.3 Memory Controller Errors Table E-11. Incremental Memory Controller Error Codes of Machine Check for IA32_MC8_STATUS Type Bit No.
Table E-12. Incremental Memory Controller Error Codes of Machine Check
Type                      Bit No.  Bit Function  Bit Description
Model specific errors(1)  7-0      RTId          Transaction Tracker ID
                          15-8     Reserved      Reserved
                          17-16    DIMM          DIMM ID which got the error
                          19-18    Channel       Channel ID which got the error
                          31-20    Reserved      Reserved
                          63-32    Syndrome      ECC Syndrome
NOTES: 1. Which of these fields are valid depends on the error type.
E.
INTERPRETING MACHINE-CHECK ERROR CODES Table E-13. Incremental Decoding Information: Processor Family 0FH Machine Error Codes For Machine Check (Contd.) Type Bit No. Bit Function Bit Description 20 Processor Signature = 00000F04H.
INTERPRETING MACHINE-CHECK ERROR CODES logging for the processor’s L3 and front side bus errors. It supports the L3 Errors, Bus and Interconnect Errors Compound Error Codes in the MCA Error Code Field. Table E-14. MCi_STATUS Register Bit Definition Bit Field Name Bits Description MCA_Error_Code 15:0 Specifies the machine check architecture defined error code for the machine check error condition detected.
INTERPRETING MACHINE-CHECK ERROR CODES Table E-14. MCi_STATUS Register Bit Definition (Contd.) Bit Field Name Bits Description UC 61 Error uncorrected flag indicates that the processor did not correct the error condition. When clear, this flag indicates that the processor was able to correct the event condition. OVER 62 Machine check overflow flag indicates that a machine check error occurred while the results of a previous error were still in the register bank (i.e.
INTERPRETING MACHINE-CHECK ERROR CODES Table E-15. Incremental MCA Error Code for Intel Xeon Processor MP 7100 (Contd.
Table E-16. Other Information Field Bit Definition
Bit Field Name                 Bits   Description
8-bit Correctable Event Count  39:32  Holds a count of the number of correctable events since cold reset. This is a saturating counter; the counter begins at 1 (with the first error) and saturates at a count of 255.
MC4_MISC                       41:40  The value in this field specifies the format of information in the MC4_MISC register. Currently, only two values are defined.
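The status-register fields described in Tables E-14 and E-16 (MCA error code in bits 15:0, model-specific error code in bits 31:16, correctable event count in bits 39:32, UC in bit 61, OVER in bit 62) can be pulled out of a raw 64-bit value as sketched below. The helper names are hypothetical, and the sketch assumes the status value was read by privileged code elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical accessors for a raw 64-bit MCi_STATUS value. */
static inline uint16_t mc_mca_error_code(uint64_t s)      { return (uint16_t)(s & 0xFFFFu); }
static inline uint16_t mc_model_specific_code(uint64_t s) { return (uint16_t)((s >> 16) & 0xFFFFu); }

/* Saturating count of correctable events since cold reset (1..255). */
static inline uint8_t mc_correctable_count(uint64_t s)    { return (uint8_t)((s >> 32) & 0xFFu); }

static inline bool mc_uncorrected(uint64_t s)             { return ((s >> 61) & 1u) != 0; }
static inline bool mc_overflow(uint64_t s)                { return ((s >> 62) & 1u) != 0; }
```

Because the event count saturates at 255, a reading of 255 should be interpreted as "255 or more".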
INTERPRETING MACHINE-CHECK ERROR CODES E.4.3 Processor Model Specific Error Code Field E.4.3.1 MCA Error Type A: L3 Error Note: The Model Specific Error Code field in MC4_STATUS (bits 31:16) Table E-17.
INTERPRETING MACHINE-CHECK ERROR CODES Table E-18.
INTERPRETING MACHINE-CHECK ERROR CODES E.4.3.3 Processor Model Specific Error Code Field Type C: Cache Bus Controller Error Table E-19.
All errors in this table, except for the correctable ECC types, are uncorrectable. The correctable ECC events may supply the ECC syndrome in the Other_Info field of the MC4_STATUS MSR.
Table E-20. Decoding Family 0FH Machine Check Codes for Cache Hierarchy Errors Type Bit No.
INTERPRETING MACHINE-CHECK ERROR CODES NOTES: 1. These fields are architecturally defined. Refer to Chapter 14, “Machine-Check Architecture,” for more information. Vol.
APPENDIX F APIC BUS MESSAGE FORMATS This appendix describes the message formats used when transmitting messages on the serial APIC bus. The information described here pertains only to the Pentium and P6 family processors. F.1 BUS MESSAGE FORMATS The local and I/O APICs transmit three types of messages on the serial APIC bus: EOI message, short message, and non-focused lowest priority message. The purpose of each type of message and its format are described below. F.
APIC BUS MESSAGE FORMATS The checksum is computed for cycles 6 through 9. It is a cumulative sum of the 2-bit (Bit1:Bit0) logical data values. The carry out of all but the last addition is added to the sum. If any APIC computes a different checksum than the one appearing on the bus in cycle 10, it signals an error, driving 11 on the APIC bus during cycle 12. In this case, the APICs disregard the message. The sending APIC will receive an appropriate error indication (see Section 9.6.
APIC BUS MESSAGE FORMATS If the physical delivery mode is being used, then cycles 15 and 16 represent the APIC ID and cycles 13 and 14 are considered don't care by the receiver. If the logical delivery mode is being used, then cycles 13 through 16 are the 8-bit logical destination field. For shorthands of “all-incl-self” and “all-excl-self,” the physical delivery mode and an arbitration priority of 15 (D0:D3 = 1111) are used.
APIC BUS MESSAGE FORMATS Table F-3. Non-Focused Lowest Priority Message (34 Cycles) (Contd.
APIC BUS MESSAGE FORMATS priority arbitration, drives cycle 33. An error in cycle 33 will force the sender to resend the message. F.2.3 APIC Bus Status Cycles Certain cycles within an APIC bus message are status cycles. During these cycles the status flags (A:A) and (A1:A1) are examined. Table F-4 shows how these status flags are interpreted, depending on the current delivery mode and existence of a focus processor. Table F-4.
APPENDIX G VMX CAPABILITY REPORTING FACILITY The ability of a processor to support VMX operation and related instructions is indicated by CPUID.1:ECX.VMX[bit 5] = 1. A value 1 in this bit indicates support for VMX features. Support for specific features detailed in Chapter 20 and other VMX chapters is determined by reading values from a set of capability MSRs. These MSRs are indexed starting at MSR address 480H.
The first processors to support VMX operation use the write-back type. The values used are given in Table G-1.
Table G-1. Memory Types Used For VMCS Access
Value(s)  Field
0         Uncacheable (UC)
1–5       Not used
6         Write Back (WB)
7–15      Not used
If software needs to access these data structures (e.g., to modify the contents of the MSR bitmaps), it can configure the paging structures to map them into the linear-address space.
VMX CAPABILITY REPORTING FACILITY Software can discover the default setting of a reserved control by consulting the appropriate VMX capability MSR (see Appendix G.3 through Appendix G.5). Future processors may define new functionality for one or more reserved controls. Such processors would allow each newly defined control to be set either to 0 or to 1. Software that does not desire a control’s new functionality should set the control to its default setting.
the IA32_VMX_PINBASED_CTLS MSR are always read as 1. The treatment of these controls by VM entry is determined by bit 55 in the IA32_VMX_BASIC MSR:
- If bit 55 in the IA32_VMX_BASIC MSR is read as 0, VM entry fails if any pin-based VM-execution control in the default1 class is 0.
- If bit 55 in the IA32_VMX_BASIC MSR is read as 1, the IA32_VMX_TRUE_PINBASED_CTLS MSR (see below) reports which of the pin-based VM-execution controls in the default1 class can be 0 on VM entry.
VMX CAPABILITY REPORTING FACILITY — If bit 55 in the IA32_VMX_BASIC MSR is read as 0, VM entry fails if any of the primary processor-based VM-execution controls in the default1 class is 0. — If bit 55 in the IA32_VMX_BASIC MSR is read as 1, the IA32_VMX_TRUE_PROCBASED_CTLS MSR (see below) reports which of the primary processor-based VM-execution controls in the default1 class can be 0 on VM entry. • Bits 63:32 indicate the allowed 1-settings of these controls.
VMX CAPABILITY REPORTING FACILITY 1, bit X is 1 in the secondary processor-based VM-execution controls, and bit 32+X is 0 in this MSR. The IA32_VMX_PROCBASED_CTLS2 MSR exists only on processors that support the 1-setting of the “activate secondary controls” VM-execution control (only if bit 63 of the IA32_VMX_PROCBASED_CTLS MSR is 1). G.4 VM-EXIT CONTROLS The IA32_VMX_EXIT_CTLS MSR (index 483H) reports on the allowed settings of most of the VM-exit controls (see Section 20.7.
VMX CAPABILITY REPORTING FACILITY G.5 VM-ENTRY CONTROLS The IA32_VMX_ENTRY_CTLS MSR (index 484H) reports on the allowed settings of most of the VM-entry controls (see Section 20.8.1): • Bits 31:0 indicate the allowed 0-settings of these controls. VM entry fails if bit X is 0 in the VM-entry controls and bit X is 1 in this MSR. Exceptions are made for the VM-entry controls in the default1 class (see Appendix G.2).
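The allowed-settings rules above reduce to two mask checks. The sketch below assumes the high half of the capability MSR (bits 63:32) reports the allowed 1-settings in the same way as described for the other control MSRs in this appendix, and it ignores the special treatment of default1-class controls; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* cap is a raw 64-bit VMX capability MSR value:
 *   bits 31:0  - bit X set means control X may not be 0
 *   bits 63:32 - bit 32+X clear means control X may not be 1 (assumed layout) */
static inline bool vmx_ctls_valid(uint32_t ctls, uint64_t cap)
{
    uint32_t must_be_one = (uint32_t)cap;
    uint32_t may_be_one  = (uint32_t)(cap >> 32);
    return (~ctls & must_be_one) == 0 && (ctls & ~may_be_one) == 0;
}

/* Force a desired control value into the range the MSR allows. */
static inline uint32_t vmx_ctls_adjust(uint32_t desired, uint64_t cap)
{
    return (desired | (uint32_t)cap) & (uint32_t)(cap >> 32);
}
```

Adjusting silently drops any requested control the processor does not support, so a caller that depends on a particular control should compare the adjusted value against the request.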
VMX CAPABILITY REPORTING FACILITY VMX-preemption timer (if it is active) counts down by 1 every time bit X in the TSC changes due to a TSC increment. • Bits 8:6 report, as a bitmap, the activity states supported by the implementation: — Bit 6 reports (if set) the support for activity state 1 (HLT). — Bit 7 reports (if set) the support for activity state 2 (shutdown). — Bit 8 reports (if set) the support for activity state 3 (wait-for-SIPI).
VMX CAPABILITY REPORTING FACILITY larly, if bit X is 0 in IA32_VMX_CR4_FIXED1, then that bit of CR4 is fixed to 0 in VMX operation. It is always the case that, if bit X is 1 in IA32_VMX_CR4_FIXED0, then that bit is also 1 in IA32_VMX_CR4_FIXED1; if bit X is 0 in IA32_VMX_CR4_FIXED1, then that bit is also 0 in IA32_VMX_CR4_FIXED0. Thus, each bit in CR4 is either fixed to 0 (with value 0 in both MSRs), fixed to 1 (1 in both MSRs), or flexible (0 in IA32_VMX_CR4_FIXED0 and 1 in IA32_VMX_CR4_FIXED1). G.
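The fixed-bit rule for CR4 can be checked with two masks: every bit set in IA32_VMX_CR4_FIXED0 must be 1, and every bit clear in IA32_VMX_CR4_FIXED1 must be 0. The sketch below operates on plain integers; the function names are hypothetical and the MSR values are assumed to have been read elsewhere.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if cr4 satisfies both fixed-bit constraints. */
static inline bool cr4_valid_in_vmx(uint64_t cr4, uint64_t fixed0, uint64_t fixed1)
{
    return (cr4 & fixed0) == fixed0   /* bits fixed to 1 are set   */
        && (cr4 & ~fixed1) == 0;      /* bits fixed to 0 are clear */
}

/* Force the fixed bits, leaving flexible bits as requested. */
static inline uint64_t cr4_apply_fixed(uint64_t cr4, uint64_t fixed0, uint64_t fixed1)
{
    return (cr4 | fixed0) & fixed1;
}
```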
VMX CAPABILITY REPORTING FACILITY • If bit 16 is read as 1, the logical processor allows software to configure EPT PDEs to map a 2-Mbyte page (by setting bit 7). • Support for the INVEPT instruction (see Chapter 5 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 2B and Section 24.3.3.1). — If bit 20 is read as 1, the INVEPT instruction is supported. — If bit 25 is read as 1, the single-context INVEPT type is supported.
APPENDIX H FIELD ENCODING IN VMCS Every component of the VMCS is encoded by a 32-bit field that can be used by VMREAD and VMWRITE. Section 20.10.2 describes the structure of the encoding space (the meanings of the bits in each 32-bit encoding). This appendix enumerates all fields in the VMCS and their encodings. Fields are grouped by width (16-bit, 32-bit, etc.) and type (guest-state, host-state, etc.) H.1 16-BIT FIELDS A value of 0 in bits 14:13 of an encoding indicates a 16-bit field.
Table H-2. Encodings for 16-Bit Guest-State Fields (0000_10xx_xxxx_xxx0B)
Field Name           Index       Encoding
Guest CS selector    000000001B  00000802H
Guest SS selector    000000010B  00000804H
Guest DS selector    000000011B  00000806H
Guest FS selector    000000100B  00000808H
Guest GS selector    000000101B  0000080AH
Guest LDTR selector  000000110B  0000080CH
Guest TR selector    000000111B  0000080EH
H.1.
FIELD ENCODING IN VMCS H.2.1 64-Bit Control Fields A value of 0 in bits 11:10 of an encoding indicates a control field. These fields are distinguished by their index value in bits 9:1. Table H-4 enumerates the 64-bit control fields. Table H-4.
4. This field exists only on processors that support the 1-setting of the “enable EPT” VM-execution control.
H.2.2 64-Bit Read-Only Data Field
A value of 1 in bits 11:10 of an encoding indicates a read-only data field. These fields are distinguished by their index value in bits 9:1. There is only one such 64-bit field, as given in Table H-5. (As with other 64-bit fields, this one has two encodings.)
Table H-5.
Table H-6. Encodings for 64-Bit Guest-State Fields (0010_10xx_xxxx_xxxAb)
Field Name           Index       Encoding
Guest PDPTE2 (full)  000000111B  0000280EH
Guest PDPTE2 (high)  000000111B  0000280FH
Guest PDPTE3 (full)  000001000B  00002810H
Guest PDPTE3 (high)  000001000B  00002811H
H.2.4 64-Bit Host-State Fields
A value of 3 in bits 11:10 of an encoding indicates a field in the host-state area. These fields are distinguished by their index value in bits 9:1.
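The encodings in these tables follow one pattern: width in bits 14:13, field type in bits 11:10, index in bits 9:1, and bit 0 selecting the "full" or "high" half of a 64-bit field. The sketch below reconstructs an encoding from those parts. The enum and function names are hypothetical, and the width values for 32-bit and natural-width fields (2 and 3) are inferred from the bit patterns in the table captions of this appendix.

```c
#include <stdint.h>

/* Bits 14:13 of an encoding (values 2 and 3 inferred from table captions). */
enum vmcs_width { VMCS_W16 = 0, VMCS_W64 = 1, VMCS_W32 = 2, VMCS_WNAT = 3 };

/* Bits 11:10 of an encoding. */
enum vmcs_type { VMCS_CONTROL = 0, VMCS_RO_DATA = 1, VMCS_GUEST = 2, VMCS_HOST = 3 };

/* Compose a VMCS field encoding; 'high' = 1 selects the high half
 * of a 64-bit field and should be 0 otherwise. */
static inline uint32_t vmcs_encode(enum vmcs_width w, enum vmcs_type t,
                                   unsigned index, unsigned high)
{
    return ((uint32_t)w << 13)
         | ((uint32_t)t << 10)
         | ((uint32_t)(index & 0x1FFu) << 1)
         | (uint32_t)(high & 1u);
}
```

For example, `vmcs_encode(VMCS_W64, VMCS_GUEST, 8, 1)` reproduces the 00002811H encoding listed for Guest PDPTE3 (high).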
FIELD ENCODING IN VMCS Table H-8. Encodings for 32-Bit Control Fields (0100_00xx_xxxx_xxx0B) (Contd.
Table H-9. Encodings for 32-Bit Read-Only Data Fields (0100_01xx_xxxx_xxx0B)
Field Name                        Index       Encoding
IDT-vectoring information field   000000100B  00004408H
IDT-vectoring error code          000000101B  0000440AH
VM-exit instruction length        000000110B  0000440CH
VM-exit instruction information   000000111B  0000440EH
H.3.3 32-Bit Guest-State Fields
A value of 2 in bits 11:10 of an encoding indicates a field in the guest-state area.
Table H-10. Encodings for 32-Bit Guest-State Fields (0100_10xx_xxxx_xxx0B) (Contd.)
Field Name                  Index       Encoding
Guest activity state        000010011B  00004826H
Guest SMBASE                000010100B  00004828H
Guest IA32_SYSENTER_CS      000010101B  0000482AH
VMX-preemption timer value  000010111B  0000482EH
The limit fields for GDTR and IDTR are defined to be 32 bits in width even though these fields are only 16 bits wide in the Intel 64 and IA-32 architectures.
Table H-12. Encodings for Natural-Width Control Fields (0110_00xx_xxxx_xxx0B)
Field Name             Index       Encoding
CR4 read shadow        000000011B  00006006H
CR3-target value 0     000000100B  00006008H
CR3-target value 1     000000101B  0000600AH
CR3-target value 2     000000110B  0000600CH
CR3-target value 3(1)  000000111B  0000600EH
NOTES: 1. If a future implementation supports more than 4 CR3-target values, they will be encoded consecutively following the 4 encodings given here.
H.4.
FIELD ENCODING IN VMCS Table H-14. Encodings for Natural-Width Guest-State Fields (0110_10xx_xxxx_xxx0B) (Contd.
FIELD ENCODING IN VMCS ates the natural-width host-state fields. Table H-15.
APPENDIX I VMX BASIC EXIT REASONS Every VM exit writes a 32-bit exit reason to the VMCS (see Section 20.9.1). Certain VM-entry failures also do this (see Section 22.7). The low 16 bits of the exit-reason field form the basic exit reason which provides basic information about the cause of the VM exit or VM-entry failure. Table I-1 lists values for basic exit reasons and explains their meaning. Entries apply to VM exits, unless otherwise noted. Table I-1.
Table I-1. Basic Exit Reasons (Contd.)
Basic Exit Reason  Description
11  GETSEC. Guest software attempted to execute GETSEC.
12  HLT. Guest software attempted to execute HLT and the “HLT exiting” VM-execution control was 1.
13  INVD. Guest software attempted to execute INVD.
14  INVLPG. Guest software attempted to execute INVLPG and the “INVLPG exiting” VM-execution control was 1.
15  RDPMC.
Table I-1. Basic Exit Reasons (Contd.)
Basic Exit Reason  Description
31  RDMSR. Guest software attempted to execute RDMSR and either:
    1: The “use MSR bitmaps” VM-execution control was 0.
    2: The value of RCX was neither in the range 00000000H – 00001FFFH nor in the range C0000000H – C0001FFFH.
    3: The value of RCX was in the range 00000000H – 00001FFFH and the nth bit in the read bitmap for low MSRs was 1, where n was the value of RCX.
Table I-1. Basic Exit Reasons (Contd.)
Basic Exit Reason  Description
44  APIC access. Guest software attempted to access memory at a physical address on the APIC-access page and the “virtualize APIC accesses” VM-execution control was 1 (see Section 21.2).
46  Access to GDTR or IDTR. Guest software attempted to execute LGDT, LIDT, SGDT, or SIDT and the “descriptor-table exiting” VM-execution control was 1.
47  Access to LDTR or TR.
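Two rules from this appendix lend themselves to a short sketch: the basic exit reason is the low 16 bits of the exit-reason field, and the RDMSR entry (basic exit reason 31) lists conditions based on the MSR index and a read bitmap. The C code below illustrates both under stated assumptions. All names are hypothetical; each bitmap is assumed to be a 1-KByte array with bit n stored in byte n/8 at bit position n%8; and the high-range rule (bit n of the high-MSR read bitmap, with n equal to the low 13 bits of the index) is an assumption that mirrors the low-range rule, since that condition is not shown in the table text above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Low 16 bits of the 32-bit exit-reason field. */
static inline uint16_t basic_exit_reason(uint32_t exit_reason)
{
    return (uint16_t)(exit_reason & 0xFFFFu);
}

/* Test bit n of a byte-array bitmap (assumed layout, see lead-in). */
static inline bool bitmap_bit(const uint8_t *bitmap, uint32_t n)
{
    return ((bitmap[n >> 3] >> (n & 7u)) & 1u) != 0;
}

/* Would RDMSR of msr_index cause a VM exit? */
static inline bool rdmsr_causes_exit(bool use_msr_bitmaps, uint32_t msr_index,
                                     const uint8_t *read_low,
                                     const uint8_t *read_high)
{
    if (!use_msr_bitmaps)
        return true;                                        /* condition 1 */
    if (msr_index <= 0x00001FFFu)
        return bitmap_bit(read_low, msr_index);             /* condition 3 */
    if (msr_index >= 0xC0000000u && msr_index <= 0xC0001FFFu)
        return bitmap_bit(read_high, msr_index & 0x1FFFu);  /* assumed high-range rule */
    return true;                                            /* condition 2 */
}
```

A VM-exit handler would typically switch on `basic_exit_reason()` first and consult the remaining bits of the exit-reason field only afterwards.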
INDEX FOR VOLUMES 3A & 3B Numerics 16-bit code, mixing with 32-bit code, 16-1 32-bit code, mixing with 16-bit code, 16-1 32-bit physical addressing description of, 3-25 overview, 3-7 36-bit physical addressing overview, 3-7 using PSE-36 paging mechanism, 3-40 using the PAE paging mechanism, 3-34 64-bit mode call gates, 4-20 code segment descriptors, 4-5, 8-16 control registers, 2-17 CR8 register, 2-18 D flag, 4-5 debug registers, 2-9 descriptors, 4-5, 4-7 DPL field, 4-5 exception handling, 5-22 external int
INDEX AND instruction, 7-5 APIC, 9-21, 9-22, 9-25 APIC bus arbitration mechanism and protocol, 9-53, 9-64 bus message format, 9-65, F-1 diagram of, 9-3, 9-4 EOI message format, 9-34, F-1 message formats, F-1 nonfocused lowest priority message, F-3 short message format, F-2 SMI message, 25-3 status cycles, F-5 structure of, 9-5 See also local APIC APIC flag, CPUID instruction, 9-10 APIC ID, 9-21, 9-29, 9-50 APIC (see I/O APIC or Local APIC) ARPL instruction, 2-30, 4-38 not supported in 64-bit mode, 2-30 Ato
INDEX description of, 8-1 performing, 8-2 Bus errors detected with MCA, 14-28 hold, 17-42 locking, 7-3, 17-42 Byte order, 1-6 C C (conforming) flag, segment descriptor, 4-16 C1 flag, x87 FPU status word, 17-10, 17-20 C2 flag, x87 FPU status word, 17-11 Cache control, 10-30 adaptive mode, L1 Data Cache, 10-26 cache management instructions, 10-25, 10-26 cache mechanisms in IA-32 processors, 17-34 caching terminology, 10-7 CD flag, CR0 control register, 10-15, 17-26 choosing a memory type, 10-12 CPUID featur
INDEX CC0 and CC1 (counter control) fields, CESR MSR (Pentium processor), 18-149 CD (cache disable) flag, CR0 control register, 2-19, 8-8, 10-15, 10-17, 10-20, 10-24, 10-44, 10-45, 17-25, 17-26, 17-34 CESR (control and event select) MSR (Pentium processor), 18-148 CLFLSH feature flag, CPUID instruction, 8-10 CLFLUSH instruction, 2-21, 7-9, 8-10, 10-25 CLI instruction, 5-10 Clocks counting processor clocks, 18-124 Hyper-Threading Technology, 18-124 nominal CPI, 18-124 non-halted clockticks, 18-124 non-halte
INDEX introduction to, 2-9 invalidation of non-global TLBs, 3-51 loading during initialization, 8-13 memory management, 2-8 page directory base address, 2-8 page table base address, 2-7 CR4 control register description of, 2-17 enabling control functions, 17-2 inclusion in IA-32 architecture, 17-24 introduction to, 2-9 VMX usage of, 19-4 CR8 register, 2-9 64-bit mode, 2-18 compatibility mode, 2-18 description of, 2-18 task priority level bits, 2-26 when available, 2-18 CS register, 17-14 state following in
INDEX D/B (default operation size/default stack pointer size and/or upper bound) flag, segment descriptor, 3-15, 4-6 E E (edge detect) flag PerfEvtSel0 and PerfEvtSel1 MSRs (P6 family), 18-48 E (edge detect) flag, PerfEvtSel0 and PerfEvtSel1 MSRs (P6 family processors), 18-145 E (expansion direction) flag segment descriptor, 4-2, 4-6 E (MTRRs enabled) flag IA32_MTRR_DEF_TYPE MSR, 10-33 EFLAGS register identifying 32-bit processors, 17-8 introduction to, 2-9 new flags, 17-7 saved in TSS, 6-5 system flags,
INDEX priorities among simultaneous exceptions and interrupts, 5-11 priority of, 17-29 priority of, x87 FPU exceptions, 17-14 reference information on all exceptions, 5-27 reference information, 64-bit mode, 5-22 restarting a task or program, 5-7 segment not present, 17-16 simple error codes, 14-24 sources of, 5-5 summary of, 5-3 vectors, 5-2 Executable, 3-15 Execute-disable bit capability conditions for, 4-43 CPUID flag, 4-43 detecting and enabling, 4-43 exception handling, 4-47 page sizes, 4-43 page-faul
INDEX debug registers, 7-42 description of, 7-35, 17-5 detecting, 7-51, 7-56, 7-57, 7-58 executing multiple threads, 7-38 execution-based timing loops, 7-73 external signal compatibility, 7-46 halting logical processors, 7-71 handling interrupts, 7-38 HLT instruction, 7-65 IA32_MISC_ENABLE MSR, 7-43, 7-48 initializing IA-32 processors with, 7-37 introduction of into the IA-32 architecture, 17-5 local a, 7-40 local APIC functionality in logical processor, 7-41 logical processors, identifying, 7-52 machine c
INDEX exceptions during initialization, 8-15 feature-enable register, 2-10 gates, 2-6 global and local descriptor tables, 2-5 IA32_EFER MSR, 2-10, 4-43 initialization process, 8-14 interrupt stack table, 5-26 interrupts and exceptions, 2-7 IRET instruction, 5-25 L flag, 3-16, 4-5 logical address, 3-9 MOV CRn, 8-14 MTRR calculations, 10-38 NXE bit, 4-43 PAE mechanism, 3-24 PAE paging, 3-42 page level protection, 4-43 paging, 2-8, 3-42 PDE tables, 4-44 PDP tables, 4-44 PML4 tables, 3-42, 4-44 PTE tables, 4-4
INDEX IA32_PLATFORM_ID, B-36, B-55, B-68, B-90, B-132, B-146, B-155 IA32_STAR MSR, 4-32 IA32_STAR_CS MSR, 2-10 IA32_STATUS MSR, B-96 IA32_SYSCALL_FLAG_MASK MSR, 2-10 IA32_SYSENTER_CS MSR, 4-31, 4-32, 23-26, B-96 IA32_SYSENTER_EIP MSR, 4-31, 23-33, B-96 IA32_SYSENTER_ESP MSR, 4-31, 23-33, B-96 IA32_TERM_CONTROL MSR, B-43, B-61, B-71 IA32_THERM_INTERRUPT MSR, 13-14, 13-17, 13-20, B-102 FORCPR# interrupt enable bit, 13-20 high-temperature interrupt enable bit, 13-20 low-temperature interrupt enable bit, 13-20
INDEX location of software-initialization code, 8-6 machine-check initialization, 14-22 model and stepping information, 8-5 multiple-processor (MP) bootup sequence for P6 family processors, C-1 multitasking environment, 8-14 overview, 8-1 paging, 8-13 processor state after reset, 8-2 protected mode, 8-11 real-address mode, 8-10 RESET# pin, 8-1 setting up exception- and interrupt-handling facilities, 8-13 x87 FPU, 8-6 INIT# pin, 5-4, 8-2 INIT# signal, 2-31, 19-5 INLVPG instruction, 21-3 INS instruction, 18-
INDEX handling through a task gate in virtual-8086 mode , 15-21 handling through a trap or interrupt gate in virtual-8086 mode, 15-18 IA-32e mode, 2-7, 2-17 IDT, 5-12 IDTR, 2-17 initializing for protected-mode operation, 8-13 interrupt descriptor table register (see IDTR) interrupt descriptor table (see IDT) list of, 5-3, 15-8 local APIC, 9-1 maskable hardware interrupts, 2-13 masking maskable hardware interrupts, 5-9 masking when switching stack segments, 5-11 message signalled interrupts, 9-65 on-die sen
INDEX MESI cache protocol, 10-13 LAR instruction, 2-30, 4-35 Larger page sizes introduction of, 17-36 support for, 17-26 Last branch interrupt & exception recording description of, 18-14, 18-20, 18-22, 18-26, 18-35, 18-37, 18-39 record stack, 18-17, 18-23, 18-24, 18-27, 18-28, 18-30, 18-36, 18-38, B-109, B-110, B-124 record top-of-stack pointer, 18-18, 18-23, 18-24, 18-36, 18-38 LastBranchFromIP MSR, 18-40, 18-41 LastBranchToIP MSR, 18-40, 18-41 LastExceptionFromIP MSR, 18-18, 18-30, 18-36, 18-38, 18-40, 1
INDEX shared resources, 7-49 SMI interrupt, 25-3 spurious interrupt, 9-63 spurious-interrupt vector register, 9-11 state after a software (INIT) reset, 9-15 state after INIT-deassert message, 9-15 state after power-up reset, 9-14 state of, 9-64 SVR (spurious-interrupt vector register), 9-11 timer, 9-36 timer generated interrupts, 9-2 TMR (trigger mode register), 9-60 valid interrupts, 9-33 version register, 9-15 Local descriptor table register (see LDTR) Local descriptor table (see LDT) Local vector table
WB (write back), 10-10 WC (write combining), 10-9 WP (write protected), 10-10 writing values across pages with different memory types, 10-23 WT (write through), 10-9 MemTypeGet() function, 10-41 MemTypeSet() function, 10-42 MESI cache protocol, 10-7, 10-13 Message address register, 9-66 Message data register format, 9-67 Message signalled interrupts message address register, 9-65 message data register format, 9-65 MFENCE instruction, 2-21, 7-9, 7-22, 7-23, 7-25 Microcode update facilities authenticat
MSR_IFSB_IBUSQ1 MSR, 18-130 MSR_IFSB_ISNPQ0 MSR, 18-131 MSR_IFSB_ISNPQ1 MSR, 18-131 MSR_LASTBRANCH_TOS, B-109 MSR_LASTBRANCH_n MSR, 18-24, 18-27, 18-28, 18-30, B-110 MSR_LASTBRANCH_n_FROM_LIP MSR, 18-17, 18-24, 18-27, 18-28, 18-30, B-124 MSR_LASTBRANCH_n_TO_LIP, 18-25 MSR_LASTBRANCH_n_TO_LIP MSR, 18-17, 18-27, 18-28, 18-30, B-126 MSR_LASTBRANCH_TOS MSR, 18-24, 18-27 MSR_LER_FROM_LIP MSR, 18-18, 18-30, 18-36, 18-38, B-108 MSR_LER_TO_LIP MSR, 18-18, 18-30, 18-36, 18-38, B-108 MSR_PEBS_MATRIX_VERT MSR
Nominal CPI method, 18-125 Nonconforming code segments accessing, 4-16 C (conforming) flag, 4-16 description of, 3-18 Non-halted clockticks, 18-125 setting up counters, 18-125 Non-Halted CPI method, 18-125 Nonmaskable interrupt (see NMI) Non-precise event-based sampling defined, 18-91 used for at-retirement counting, 18-114 writing an interrupt service routine for, 18-34 Non-retirement events, 18-90, A-134 Non-sleep clockticks, 18-125 setting up counters, 18-125 NOT instruction, 7-5 Notation bit and
introduction to, 10-46 memory types that can be encoded with, 10-48 MSR, 10-19 precedence of cache controls, 10-20 programming, 10-49 selecting a memory type with, 10-48 Page base address field, page-table entries, 3-29, 3-42 Page directories, 2-8 Page directory base address, 3-28 base address (PDBR), 6-6 description of, 3-24 introduction to, 2-8 overview, 3-2 setting up during initialization, 8-13 Page directory pointers, 2-8 Page frame (see Page) Page tables, 2-8 description of, 3-24 introduction t
time-stamp counter, 18-42 Pentium II processor, 1-2 Pentium III processor, 1-2 Pentium M processor last branch, interrupt, and exception recording, 18-37 MSRs supported by, B-145 time-stamp counter, 18-42 Pentium Pro processor, 1-2 Pentium processor, 1-1, 17-9 compatibility with MCA, 14-1 list of performance-monitoring events, A-204 MSR supported by, B-167 performance-monitoring counters, 18-148 PerfCtr0 and PerfCtr1 MSRs (P6 family processors), 18-144, 18-146 PerfEvtSel0 and PerfEvtSel1 MSRs (P6 fam
microcode update facilities, 8-36 overview of, 7-1 See also: multiple-processor management Processor ordering, description of, 7-8 PROCHOT# log, 13-19 PROCHOT# or FORCEPR# event bit, 13-18 Protected mode IDT initialization, 8-13 initialization for, 8-11 mixing 16-bit and 32-bit code modules, 16-2 mode switching, 8-17 PE flag, CR0 register, 4-1 switching to, 4-1, 8-17 system data structures required during initialization, 8-11, 8-12 Protection combining segment & page-level, 4-41 disabling, 4-1 enabli
page-directory entry, 4-2, 4-3, 4-40 page-table entries, 3-31 page-table entry, 4-2, 4-3, 4-40 R/W0-R/W3 (read/write) fields DR7 register, 17-27, 18-5 S S (descriptor type) flag segment descriptor, 3-14, 3-16, 4-2, 4-7 SBB instruction, 7-5 Segment descriptors access rights, 4-35 access rights, invalid values, 17-26 automatic bus locking while updating, 7-4 base address fields, 3-14 code type, 4-3 data type, 4-3 description of, 2-5, 3-13 DPL (descriptor privilege level) field, 3-14, 4-2 D/B (default
switching to SMM, 25-3 synchronous and asynchronous, 25-15 VMX treatment of, 25-22 SMI# pin, 5-4, 25-3, 25-20 SMM asynchronous SMI, 25-15 auto halt restart, 25-18 executing the HLT instruction in, 25-19 exiting from, 25-4 handling exceptions and interrupts, 25-14 introduction to, 2-10 I/O instruction restart, 25-20 I/O state implementation, 25-15 native 16-bit mode, 16-1 overview of, 25-1 revision identifier, 25-17 revision identifier field, 25-17 switching to, 25-3 switching to from other operating
exceptions/interrupts when switching stacks, 5-11 IA-32e mode, 5-25 inter-privilege level calls, 4-25 Stack-fault exception (#SS), 17-40 Stacks error code pushes, 17-38 faults, 5-48 for privilege levels 0, 1, and 2, 4-26 interlevel RET/IRET from a 16-bit interrupt or call gate, 17-38 interrupt stack table, 64-bit mode, 5-26 management of control transfers for 16- and 32-bit procedure calls, 16-5 operation on pushes and pops, 17-37 pointers to in TSS, 6-6 stack switching, 4-25, 5-25 usage on call to e
TF (trap) flag, EFLAGS register, 2-12, 5-19, 15-6, 15-29, 18-12, 18-15, 18-26, 18-29, 18-35, 18-37, 18-40, 25-14 Thermal monitoring advanced power management, 13-8 automatic, 13-11 automatic thermal monitoring, 13-9 catastrophic shutdown detector, 13-9, 13-10 clock-modulation bits, 13-16 C-state, 13-8 detection of facilities, 13-17 Enhanced Intel SpeedStep Technology, 13-1 IA32_APERF MSR, 13-2 IA32_MPERF MSR, 13-2 IA32_THERM_INTERRUPT MSR, 13-17 IA32_THERM_STATUS MSR, 13-17, 13-18 interrupt enable/di
page-directory base address (PDBR), 3-28 pointed to by task-gate descriptor, 6-11 previous task link field, 6-6, 6-16, 6-18 privilege-level 0, 1, and 2 stacks, 4-26 referenced by task gate, 5-20 segment registers, 6-5 T (debug trap) flag, 6-6 task register, 6-9 using 16-bit TSSs in a 32-bit environment, 17-33 virtual-mode extensions, 17-32 TSS descriptor B (busy) flag, 6-7 busy flag, 6-18 initialization for multitasking, 8-14 structure of, 6-7, 6-8 TSS segment selector field, task-gate descriptor, 6-
basic VM-entry checks, 22-2 checking guest state control registers, 22-10 debug registers, 22-10 descriptor-table registers, 22-13 MSRs, 22-10 non-register state, 22-14 RIP and RFLAGS, 22-14 segment registers, 22-11 checks on controls, host-state area, 22-3 registers and MSRs, 22-8 segment and descriptor-table registers, 22-8 VMX control checks, 22-3 exit-reason numbers, I-1 loading guest state, 22-17 control and debug registers, MSRs, 22-18 RIP, RSP, RFLAGS, 22-20 segment & descriptor-table register
CPUID instruction emulation, 26-17 debug exceptions, 27-2 debugging facilities, 27-1, 27-2 emulating guest execution, 26-2 emulation responsibilities, 26-2 entering VMX root operation, 26-5 error handling, 26-5 exception bitmap, 27-2 external interrupts, 28-1 fast instruction set emulator, 26-1 index data pairs, usage of, 26-16 interrupt handling, 28-1 interrupt vectors, 28-4 leaving VMX operation, 26-6 machine checks, 28-12, 28-13 memory virtualization, 27-3 microcode update facilities, 27-11 multi-p
virtual-machine control structure (VMCS), 19-3 virtual-machine monitor (VMM), 19-1 virtualization of system resources, 27-1 VM entries and exits, 19-1 VM exits, 23-1 VMCS pointer, 19-3 VMM life cycle, 19-2 VMXOFF instruction, 19-4 VMXON instruction, 19-4 VMXON pointer, 19-4 VMXON region, 19-4 See also: VMM, VMCS, VM entries, VM exits VMXOFF instruction, 19-4 VMXON instruction, 19-4 W WAIT/FWAIT instructions, 5-36, 17-10, 17-21 WB (write back) memory type, 7-23, 10-10, 10-12 WB (write-back) pin (Pentiu