Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3B, System Programming Guide Part 2

DEBUGGING AND PERFORMANCE MONITORING
18.17.1 Enhancements of Performance Monitoring in the Processor Core
The notable enhancements in the monitoring of performance events in the processor
core include:
Four general-purpose performance counters, IA32_PMCx, with associated counter
configuration MSRs, IA32_PERFEVTSELx, and a global counter control MSR that
simplifies control of all four counters. Each of the four performance counters
can support precise event based sampling (PEBS) and thread qualification of
architectural and non-architectural performance events. The width of IA32_PMCx
supported by hardware has been increased: the counter width reported by
CPUID.0AH:EAX[23:16] is 48 bits. The PEBS facility in Intel microarchitecture
(Nehalem) has been enhanced with a new data format that captures additional
information, such as load latency.
Load latency sampling facility. The average latency of memory load operations can
be sampled using the load-latency facility in processors based on Intel
microarchitecture (Nehalem). The facility measures the average latency of load
micro-operations from dispatch to when the data is globally observable (GO). This
facility is used in conjunction with the PEBS facility.
Off-core response counting facility. This facility in the processor core allows
software to count certain transaction responses between the processor core and
sub-systems outside the processor core (uncore). Counting off-core responses
requires an additional event qualification configuration facility in conjunction
with IA32_PERFEVTSELx. Two off-core response MSRs are provided for use in
conjunction with specific event codes that must be specified with
IA32_PERFEVTSELx.
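The counter resources listed above can be sketched in Python: decoding the number and width of the general-purpose counters from CPUID.0AH:EAX, and composing an IA32_PERFEVTSELx value of the kind that is paired with an off-core response MSR. The bit positions follow the architectural definitions of CPUID leaf 0AH and IA32_PERFEVTSELx. The EAX value 0x07300403 and the event encoding B7H/01H are illustrative assumptions, not values read from hardware; take real encodings from the processor's event tables, and note that the off-core response MSR itself must also be programmed (not modeled here).

```python
# Sketch of the core PMU resources listed above (Intel microarchitecture Nehalem).
# Bit layouts follow the architectural definitions of CPUID leaf 0AH and
# IA32_PERFEVTSELx. The EAX value and event encoding in the demo are
# illustrative assumptions, not values read from hardware.


def decode_cpuid_0ah_eax(eax: int) -> dict:
    """Split CPUID.0AH:EAX into its architectural performance-monitoring fields."""
    return {
        "version_id": eax & 0xFF,                 # EAX[7:0]
        "num_gp_counters": (eax >> 8) & 0xFF,     # EAX[15:8], IA32_PMCx per logical processor
        "counter_width": (eax >> 16) & 0xFF,      # EAX[23:16], bit width of IA32_PMCx
        "ebx_vector_length": (eax >> 24) & 0xFF,  # EAX[31:24]
    }


def max_counter_value(width: int) -> int:
    """Largest value an IA32_PMCx of `width` bits holds before wrapping to zero."""
    return (1 << width) - 1


# IA32_PERFEVTSELx control bits (architectural layout).
EVTSEL_USR = 1 << 16  # count while CPL > 0
EVTSEL_OS = 1 << 17   # count while CPL = 0
EVTSEL_INT = 1 << 20  # APIC interrupt on counter overflow
EVTSEL_EN = 1 << 22   # enable the counter


def perfevtsel(event: int, umask: int, *, user: bool = True, kernel: bool = True,
               interrupt: bool = False) -> int:
    """Compose an IA32_PERFEVTSELx value from an event select and a unit mask."""
    if not (0 <= event <= 0xFF and 0 <= umask <= 0xFF):
        raise ValueError("event select and unit mask are 8-bit fields")
    value = event | (umask << 8) | EVTSEL_EN  # bits 7:0 and 15:8, plus EN
    if user:
        value |= EVTSEL_USR
    if kernel:
        value |= EVTSEL_OS
    if interrupt:
        value |= EVTSEL_INT
    return value


if __name__ == "__main__":
    info = decode_cpuid_0ah_eax(0x07300403)  # illustrative: version 3, 4 counters, 48 bits
    print(info, hex(max_counter_value(info["counter_width"])))
    # Assumed off-core response encoding (event B7H, umask 01H); verify it
    # against the event tables for the target processor.
    print(hex(perfevtsel(0xB7, 0x01)))
```

Keeping the CPUID decode separate from the MSR composition means the same helpers work whether the raw EAX value comes from the CPUID instruction, a kernel interface, or a saved log.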
18.17.1.1 Precise Event Based Sampling (PEBS)
All four general-purpose performance counters, IA32_PMCx, can be used for PEBS if
the performance event supports PEBS. Software uses IA32_MISC_ENABLE[7] and
IA32_MISC_ENABLE[12] to detect whether the performance monitoring facility and
PEBS are supported in the processor. The MSR IA32_PEBS_ENABLE provides 4 bits
that software must use to select which IA32_PMCx overflow conditions cause a
PEBS record to be captured.
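The detection and enable steps above can be sketched as follows. One detail is easy to get wrong: IA32_MISC_ENABLE bit 7 reads 1 when performance monitoring is available, but bit 12 is defined as "PEBS unavailable", so a set bit 12 means PEBS is not supported. The sketch assumes `misc_enable` is a value already read from the MSR (for example with RDMSR in a kernel driver), and takes PEBS_EN_PMCx to be bit x of IA32_PEBS_ENABLE, as in Figure 18-25; the helper names are hypothetical.

```python
# Sketch: detect PEBS support and build the PEBS_EN_PMCx bits of IA32_PEBS_ENABLE.
# Assumptions: `misc_enable` was already read from IA32_MISC_ENABLE by other
# means, and PEBS_EN_PMCx is bit x of IA32_PEBS_ENABLE (Figure 18-25).
from typing import Iterable

MISC_ENABLE_PERFMON_AVAILABLE = 1 << 7  # bit 7: 1 = performance monitoring available
MISC_ENABLE_PEBS_UNAVAILABLE = 1 << 12  # bit 12: 1 = PEBS NOT supported (inverted sense)

NUM_PEBS_COUNTERS = 4  # IA32_PMC0 through IA32_PMC3


def pebs_supported(misc_enable: int) -> bool:
    """True only if performance monitoring is available and PEBS is not flagged unavailable."""
    perfmon = bool(misc_enable & MISC_ENABLE_PERFMON_AVAILABLE)
    pebs_unavailable = bool(misc_enable & MISC_ENABLE_PEBS_UNAVAILABLE)
    return perfmon and not pebs_unavailable


def pebs_enable_mask(counters: Iterable[int]) -> int:
    """Set PEBS_EN_PMCx for each counter index x in `counters`."""
    mask = 0
    for idx in counters:
        if not 0 <= idx < NUM_PEBS_COUNTERS:
            raise ValueError(f"IA32_PMC{idx} is not one of the four PEBS counters")
        mask |= 1 << idx
    return mask
```

For example, `pebs_enable_mask([0, 2])` yields 0x5, enabling PEBS records on overflow of IA32_PMC0 and IA32_PMC2 only.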
Additionally, the PEBS record is expanded to allow latency information to be
captured. The MSR IA32_PEBS_ENABLE provides 4 additional bits that software must
use to enable latency data recording in the PEBS record upon the respective
IA32_PMCx overflow condition. The layout of IA32_PEBS_ENABLE is shown in Figure 18-25.
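A short sketch of how the two groups of enable bits combine in one IA32_PEBS_ENABLE value. It assumes, per Figure 18-25, that PEBS_EN_PMCx sits at bit x and LL_EN_PMCx at bit 32 + x; confirm the positions against the figure for the target processor. Because the latency data is recorded in the PEBS record, this sketch (as its own design choice) rejects a latency bit whose matching PEBS bit is clear. The function names are hypothetical.

```python
# Sketch: encode and decode IA32_PEBS_ENABLE, including the latency-enable bits.
# Assumption (Figure 18-25): PEBS_EN_PMCx is bit x, LL_EN_PMCx is bit 32 + x.
from typing import Dict, Set, Tuple

PEBS_EN_SHIFT = 0
LL_EN_SHIFT = 32
NUM_COUNTERS = 4


def encode_pebs_enable(pebs: Set[int], latency: Set[int]) -> int:
    """Build IA32_PEBS_ENABLE from the counters that take PEBS and latency records."""
    if not pebs <= set(range(NUM_COUNTERS)):
        raise ValueError("PEBS counters must be IA32_PMC0..IA32_PMC3")
    if not latency <= pebs:
        # Sketch's own check: latency data goes into the PEBS record, so each
        # LL_EN_PMCx is paired with the matching PEBS_EN_PMCx.
        raise ValueError("latency recording requested on a counter without PEBS")
    value = 0
    for pmc in pebs:
        value |= 1 << (PEBS_EN_SHIFT + pmc)
    for pmc in latency:
        value |= 1 << (LL_EN_SHIFT + pmc)
    return value


def decode_pebs_enable(value: int) -> Dict[int, Tuple[bool, bool]]:
    """Map each IA32_PMCx index to a (pebs_enabled, latency_enabled) pair."""
    return {
        pmc: (bool((value >> (PEBS_EN_SHIFT + pmc)) & 1),
              bool((value >> (LL_EN_SHIFT + pmc)) & 1))
        for pmc in range(NUM_COUNTERS)
    }
```

For instance, enabling PEBS on IA32_PMC0 and IA32_PMC1 with latency recording only on IA32_PMC0 gives `encode_pebs_enable({0, 1}, {0})`, which sets bits 0, 1 and 32.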
When a counter is enabled to capture machine state (PEBS_EN_PMCx = 1), the
processor writes machine state information to a memory buffer specified by
software, as detailed below. When the counter IA32_PMCx overflows from its maximum
count to zero, the PEBS hardware is armed.
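Since arming happens on the overflow from the maximum count to zero, a common approach (not spelled out in this section) is to preset the counter to 2^width minus the desired sampling period, so that it overflows after exactly that many events. The sketch below models a 48-bit IA32_PMCx in software and stops at the point where PEBS becomes armed; `preset_for_period` and `count_events` are hypothetical helpers, and one increment per occurrence of the selected event is assumed.

```python
# Sketch: software model of an IA32_PMCx counting toward overflow.
# Assumptions: a 48-bit counter (CPUID.0AH:EAX[23:16] = 48) and one
# increment per occurrence of the selected event.
from typing import Tuple

COUNTER_WIDTH = 48


def preset_for_period(period: int, width: int = COUNTER_WIDTH) -> int:
    """Initial counter value that overflows to zero after `period` events."""
    modulus = 1 << width
    if not 0 < period < modulus:
        raise ValueError("period must be between 1 and 2**width - 1")
    return modulus - period


def count_events(start: int, events: int, width: int = COUNTER_WIDTH) -> Tuple[int, bool]:
    """Advance the counter by `events`; return (final count, whether PEBS got armed).

    Steps one event at a time to mirror the hardware, so keep `events` small.
    """
    max_count = (1 << width) - 1
    count, armed = start, False
    for _ in range(events):
        if count == max_count:
            count = 0      # overflow from maximum count to zero ...
            armed = True   # ... arms the PEBS hardware
        else:
            count += 1
    return count, armed
```

With `start = preset_for_period(3)`, two events leave the counter at its maximum value and unarmed, while the third event wraps it to zero and arms PEBS.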