Intel 64 and IA-32 Architectures Software Developer's Manual Volume 3B, System Programming Guide Part 2

DEBUGGING AND PERFORMANCE MONITORING
18.18.7 At-Retirement Counting
At-retirement counting provides a means of counting only events that represent work
committed to architectural state, ignoring work that was performed speculatively
and later discarded.
The Intel NetBurst microarchitecture used in the Pentium 4 and Intel Xeon processors
performs many speculative activities in an attempt to increase effective
processing speeds. One example of this speculative activity is branch prediction. The
Pentium 4 and Intel Xeon processors typically predict the direction of branches and
then decode and execute instructions down the predicted path in anticipation of the
actual branch decision. When a branch misprediction occurs, the results of instructions
that were decoded and executed down the mispredicted path are canceled. If a
performance counter were set up to count all executed instructions, the count would
include instructions whose results were canceled as well as those whose results
committed to architectural state.
To provide finer granularity in event counting in these situations, the performance
monitoring facilities of the Pentium 4 and Intel Xeon processors provide a
mechanism for tagging events and then counting only those tagged events that
represent committed results. This mechanism is called “at-retirement counting.”
Tables A-10 through A-14 list predefined at-retirement events and event metrics that
can be used for tagging events when using at-retirement counting. The following
terminology is used in describing at-retirement counting:
Bogus, non-bogus, retire — In at-retirement event descriptions, the term
“bogus” refers to instructions or μops that must be canceled because they are on
a path taken from a mispredicted branch. The terms “retired” and “non-bogus”
refer to instructions or μops along the path that results in committed architectural
state changes as required by the program being executed. Thus instructions
and μops are either bogus or non-bogus, but not both. Several of the Pentium 4
and Intel Xeon processors’ performance monitoring events (such as
Instruction_Retired and Uops_Retired in Table A-10) can count instructions or
μops that are retired based on the characterization of “bogus” versus “non-bogus.”
Tagging — Tagging is a means of marking μops that have encountered a
particular performance event so they can be counted at retirement. During the
course of execution, the same event can happen more than once per μop, so a
direct count of the event would not indicate how many μops
encountered that event.
The tagging mechanisms allow a μop to be tagged once during its lifetime and
thus counted once at retirement. The “retired” suffix is used for performance
metrics that increment a count once per μop, rather than once per event. For
example, a μop may encounter a cache miss more than once during its lifetime,
but a “Miss Retired” metric (that counts the number of retired μops that
encountered a cache miss) will increment only once for that μop. A “Miss Retired”
metric would be useful for characterizing the performance of the cache hierarchy
for a particular instruction sequence. Details of various performance metrics and
how these can be constructed using the Pentium 4 and Intel Xeon processors