Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3B, System Programming Guide Part 2

DEBUGGING AND PERFORMANCE MONITORING
At-retirement events (see Table A-10) are events that are counted at the
retirement stage of instruction execution, which allows finer granularity in
counting events and capturing machine state.
The at-retirement counting mechanism includes facilities for tagging μops that
have encountered a particular performance event during instruction execution.
Tagging allows events to be sorted into two groups: those that occurred on an
execution path that resulted in architectural state being committed at retirement,
and those that occurred on an execution path whose results were eventually
cancelled and never committed to architectural state (for example, the execution
of a mispredicted branch).
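
The sorting that tagging makes possible can be illustrated with a toy software model. The sketch below is not the processor's mechanism: the `Uop` record, its `tagged` and `retired` fields, and the `sort_tagged_events` function are hypothetical names chosen for illustration. It assumes each μop carries a tag if it encountered the event of interest, plus a flag indicating whether its results were committed.

```python
from dataclasses import dataclass


@dataclass
class Uop:
    """Hypothetical stand-in for a μop; not a hardware structure."""
    tagged: bool   # encountered the event of interest during execution
    retired: bool  # results were committed to architectural state


def sort_tagged_events(uops):
    """Split tagged μops into (committed, cancelled) event counts."""
    committed = sum(1 for u in uops if u.tagged and u.retired)
    cancelled = sum(1 for u in uops if u.tagged and not u.retired)
    return committed, cancelled


# Example stream: two tagged μops retire, one tagged μop sits on a
# mispredicted path and is cancelled, and one untagged μop retires.
stream = [
    Uop(tagged=True, retired=True),
    Uop(tagged=True, retired=False),
    Uop(tagged=False, retired=True),
    Uop(tagged=True, retired=True),
]
committed, cancelled = sort_tagged_events(stream)
```

In this model an at-retirement count corresponds to `committed`, while `cancelled` collects the events that a non-retirement count would also have included.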
The Pentium 4 and Intel Xeon processor performance monitoring facilities support
the three usage models described below. The first two models can be used to count
both non-retirement and at-retirement events; the third model is used to count a
subset of at-retirement events:
Event counting — A performance counter is configured to count one or more
types of events. While the counter is counting, software reads the counter at
selected intervals to determine the number of events that have been counted
between the intervals.
Non-precise event-based sampling — A performance counter is configured to
count one or more types of events and to generate an interrupt when it
overflows. To trigger an overflow, the counter is preset to a modulus value that
will cause the counter to overflow after a specific number of events have been
counted.
When the counter overflows, the processor generates a performance monitoring
interrupt (PMI). The interrupt service routine for the PMI then records the return
instruction pointer (RIP), resets the modulus, and restarts the counter. Code
performance can be analyzed by examining the distribution of RIPs with a tool
like the VTune™ Performance Analyzer.
Precise event-based sampling (PEBS) — This type of performance
monitoring is similar to non-precise event-based sampling, except that a
memory buffer is used to save a record of the architectural state of the processor
whenever the counter overflows. The records of architectural state provide
additional information for use in performance tuning. Precise event-based
sampling can be used to count only a subset of at-retirement events.
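
The arithmetic behind the first two usage models can be sketched in software. This is a model rather than driver code: the function names are hypothetical, the counter width is a parameter (a 40-bit default is assumed here), and both the counter reads and the interrupt delivery are simulated instead of being performed on hardware.

```python
from collections import Counter

COUNTER_WIDTH_BITS = 40  # assumed counter width; adjust to the target processor


def events_between_reads(previous, current, width=COUNTER_WIDTH_BITS):
    """Event counting: events counted between two reads, allowing one wraparound."""
    return (current - previous) % (1 << width)


def preset_for_period(period, width=COUNTER_WIDTH_BITS):
    """Sampling: preset value that makes the counter overflow after `period` events."""
    if not 0 < period <= (1 << width):
        raise ValueError("period must fit in the counter")
    return (1 << width) - period


def simulate_sampling(event_rips, period, width=COUNTER_WIDTH_BITS):
    """Model the PMI loop: record the RIP at each overflow, then reload the preset."""
    modulus = 1 << width
    preset = preset_for_period(period, width)
    counter = preset
    samples = Counter()
    for rip in event_rips:
        counter += 1
        if counter == modulus:   # overflow: hardware would raise a PMI here
            samples[rip] += 1    # handler records the return instruction pointer
            counter = preset     # handler resets the modulus and restarts counting
    return samples
```

Calling `simulate_sampling(rips, period=2, width=8)` on a list of instruction pointers records every second event, and the resulting `Counter` plays the role of the RIP distribution that an analysis tool would examine.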
The following sections describe the MSRs and data structures used for performance
monitoring in the Pentium 4 and Intel Xeon processors.
18.18.1 ESCR MSRs
The 45 ESCR MSRs (see Table 18-26) allow software to select specific events to be
counted. Each ESCR is usually associated with a pair of performance counters (see
Table 18-26) and each performance counter has several ESCRs associated with it
(allowing the events counted to be selected from a variety of events).
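
The many-to-many relationship between ESCRs and counters can be pictured as a lookup table. In the sketch below, the ESCR and counter names are placeholders, not the register names listed in Table 18-26, and `counters_to_escrs` is a hypothetical helper. It only shows how a per-ESCR counter pair implies a per-counter list of ESCRs.

```python
from collections import defaultdict

# Placeholder names only; consult Table 18-26 for the real assignments.
ESCR_TO_COUNTERS = {
    "ESCR_A0": ("COUNTER_0", "COUNTER_1"),
    "ESCR_B0": ("COUNTER_0", "COUNTER_1"),
    "ESCR_A1": ("COUNTER_2", "COUNTER_3"),
    "ESCR_B1": ("COUNTER_2", "COUNTER_3"),
}


def counters_to_escrs(escr_to_counters):
    """Invert the table: list the ESCRs that can feed each counter."""
    result = defaultdict(list)
    for escr, counters in escr_to_counters.items():
        for counter in counters:
            result[counter].append(escr)
    return dict(result)


COUNTER_TO_ESCRS = counters_to_escrs(ESCR_TO_COUNTERS)
```

Looking up a counter in `COUNTER_TO_ESCRS` then yields the set of ESCRs, and therefore the set of event selections, available to that counter.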