Intel 64 and IA-32 Architectures Software Developers Manual Volume 3B, System Programming Guide Part 2

DEBUGGING AND PERFORMANCE MONITORING
performance events are provided in the Intel Pentium 4 Processor Optimization
Reference Manual (see Section 1.4, “Related Literature”).
Replay — To maximize performance for the common case, the Intel NetBurst
microarchitecture aggressively schedules μops for execution before all the
conditions for correct execution are guaranteed to be satisfied. If any of these
conditions turns out not to be satisfied, μops must be reissued. The mechanism
that the Pentium 4 and Intel Xeon processors use for this reissuing of μops is
called replay. Some examples of replay causes are cache misses, dependence
violations, and unforeseen resource constraints. In normal operation, some
number of replays is common and unavoidable. An excessive number of replays
is an indication of a performance problem.
Assist — When the hardware needs the assistance of microcode to deal with
some event, the machine takes an assist. One example of this is an underflow
condition in the input operands of a floating-point operation. The hardware must
internally modify the format of the operands in order to perform the computation.
Assists clear the entire machine of μops before they begin and are costly.
18.18.7.1 Using At-Retirement Counting
The Pentium 4 and Intel Xeon processors allow counting both events and μops that
encountered a specified event. For a subset of the at-retirement events listed in Table
A-10, a μop may be tagged when it encounters that event. The tagging mechanisms
can be used in non-precise event-based sampling, and a subset of these mechanisms
can be used in PEBS. There are four independent tagging mechanisms, and each
mechanism uses a different event to count μops tagged with that mechanism:
Front-end tagging — This mechanism pertains to the tagging of μops that
encountered front-end events (for example, trace cache and instruction counts)
and are counted with the Front_end_event event.
Execution tagging — This mechanism pertains to the tagging of μops that
encountered execution events (for example, instruction types) and are counted
with the Execution_Event event.
Replay tagging — This mechanism pertains to tagging of μops whose
retirement is replayed (for example, a cache miss) and are counted with the
Replay_event event. Branch mispredictions are also tagged with this mechanism.
No tags — This mechanism does not use tags. It uses the Instr_retired and the
Uops_retired events.
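The pairing of tagging mechanisms with their counting events can be summarized as a small lookup table. The Python sketch below is illustrative only: the enum, the dictionary, and the function name are hypothetical and are not part of any Intel interface; only the event names are taken from the list above.

```python
from enum import Enum


class TaggingMechanism(Enum):
    """The four at-retirement mechanisms described above (names are illustrative)."""
    FRONT_END = "front-end tagging"
    EXECUTION = "execution tagging"
    REPLAY = "replay tagging"
    NO_TAGS = "no tags"


# Event(s) used to count uops for each mechanism, as listed in the text.
COUNTING_EVENTS = {
    TaggingMechanism.FRONT_END: ("Front_end_event",),
    TaggingMechanism.EXECUTION: ("Execution_Event",),
    TaggingMechanism.REPLAY: ("Replay_event",),
    TaggingMechanism.NO_TAGS: ("Instr_retired", "Uops_retired"),
}


def counting_events(mechanism):
    """Return the event names that count uops for the given mechanism."""
    return COUNTING_EVENTS[mechanism]
```

Under these assumptions, `counting_events(TaggingMechanism.NO_TAGS)` returns both untagged events, while each tagged mechanism maps to exactly one event.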
Each tagging mechanism is independent of all others; that is, a μop that has been
tagged using one mechanism will not be detected by another mechanism’s tagged-μop
detector. For example, if μops are tagged using the front-end tagging mechanism,
the Replay_event will not count those as tagged μops unless they are also
tagged using the replay tagging mechanism. Within the execution tagging mechanism,
however, execution tags allow up to four different types of μops to be counted at
retirement.
The independence of tagging mechanisms does not hold when using PEBS. When
using PEBS, only one tagging mechanism should be used at a time.
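The independence rule and the PEBS restriction can be modeled in a few lines. The Python sketch below is a toy model, not a description of the hardware interface: the `DETECTORS` table, the tag strings, and the functions `is_counted` and `valid_pebs_setup` are hypothetical names introduced for illustration. Only the event names and the two rules come from the text.

```python
# Which tag each tagged-uop detector looks at (event name -> tag it detects).
# The tag strings are illustrative labels, not hardware encodings.
DETECTORS = {
    "Front_end_event": "front_end",
    "Execution_Event": "execution",
    "Replay_event": "replay",
}


def is_counted(event, uop_tags):
    """Return True if `event` counts a uop carrying the given set of tags.

    A detector sees only its own mechanism's tag, so tags applied by other
    mechanisms never cause a uop to be counted.
    """
    return DETECTORS[event] in uop_tags


def valid_pebs_setup(active_mechanisms):
    """Return True if at most one tagging mechanism is active, as advised for PEBS."""
    return len(set(active_mechanisms)) <= 1
```

In this model, `is_counted("Replay_event", {"front_end"})` is false, and it becomes true only when the μop also carries the replay tag. A PEBS configuration that names two mechanisms, such as `valid_pebs_setup(["replay", "front_end"])`, is rejected.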