HP Caliper 5.3 User Guide (5900-1558, February 2011)

Exclude the interruption state: --measure-on-interrupts off
Only measure the interruption state: --measure-on-interrupts only
Metrics Available from this Measurement
The following metrics are available from this event set. These descriptions do not take into account
any command-line options you might use.
The metrics are:
Raw CPI
Raw CPI is computed using all retired instructions, including nops and predicated-off
instructions. The relationship between effective and raw CPI values can be obtained from the
cpi measurement.
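The two CPI figures differ only in their denominators. A minimal sketch of the arithmetic follows. It assumes that raw CPI is total CPU cycles divided by all retired instructions, and that effective CPI removes nops and predicated-off instructions from that count. The function names and counter values are hypothetical and are not Caliper output.

```python
def raw_cpi(cycles: int, retired: int) -> float:
    """CPI over every retired instruction, nops and predicated-off ones included."""
    if retired <= 0:
        raise ValueError("retired instruction count must be positive")
    return cycles / retired


def effective_cpi(cycles: int, retired: int, nops: int, predicated_off: int) -> float:
    """CPI after excluding nops and predicated-off instructions (assumed definition)."""
    useful = retired - nops - predicated_off
    if useful <= 0:
        raise ValueError("no instructions remain after the exclusions")
    return cycles / useful


# Hypothetical counter values, for illustration only.
cycles, retired, nops, pred_off = 1_000_000, 800_000, 150_000, 50_000
print(f"raw CPI       = {raw_cpi(cycles, retired):.3f}")                        # 1.250
print(f"effective CPI = {effective_cpi(cycles, retired, nops, pred_off):.3f}")  # 1.667
```

Under these assumptions, raw CPI can never exceed effective CPI for the same cycle count, because the raw denominator is at least as large.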
Itlb
This counts the number of cycles where there are no back-end stalls or flushes, the decoupling
buffer is empty, and the front end is stalled due to an L1 TLB miss that is serviced either by the
L2 TLB or, if the L2 TLB also misses, by the HPW when the TLB entry is found somewhere in the
cache hierarchy. This does not count cycles attributable to software TLB miss handling when the
HPW fails to find the requisite translation.
Icache
This counts the number of cycles where there are no back-end stalls or flushes, the decoupling
buffer is empty, and the front end is stalled due to an instruction cache miss at any level of
the cache hierarchy (L1, L2, L3).
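The per-cycle counting conditions for Itlb and Icache can be modeled as a simple classifier. This is a hypothetical software model of the stated conditions, not how the hardware counters are implemented. The Cycle fields, the FrontEndCause values, and the sample trace are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class FrontEndCause(Enum):
    """Why the front end is stalled in a given cycle (hypothetical model)."""
    NONE = auto()
    ITLB_MISS_HW = auto()  # L1 TLB miss serviced by the L2 TLB or by the HPW
    ITLB_MISS_SW = auto()  # HPW failed; software TLB miss handler runs
    ICACHE_MISS = auto()   # instruction cache miss at L1, L2, or L3


@dataclass
class Cycle:
    back_end_stall_or_flush: bool
    decoupling_buffer_empty: bool
    front_end_cause: FrontEndCause


def front_end_metric(c: Cycle) -> Optional[str]:
    """Return 'Itlb', 'Icache', or None for one cycle."""
    # Both metrics require a quiet back end and an empty decoupling buffer.
    if c.back_end_stall_or_flush or not c.decoupling_buffer_empty:
        return None
    if c.front_end_cause is FrontEndCause.ITLB_MISS_HW:
        return "Itlb"
    if c.front_end_cause is FrontEndCause.ICACHE_MISS:
        return "Icache"
    # Software TLB miss handling (and any other cause) is not counted here.
    return None


# Hypothetical five-cycle trace.
trace = [
    Cycle(False, True, FrontEndCause.ITLB_MISS_HW),
    Cycle(False, True, FrontEndCause.ICACHE_MISS),
    Cycle(True, True, FrontEndCause.ICACHE_MISS),     # back end stalled: not counted
    Cycle(False, False, FrontEndCause.ITLB_MISS_HW),  # buffer not empty: not counted
    Cycle(False, True, FrontEndCause.ITLB_MISS_SW),   # software handler: not counted
]
counts = Counter(m for m in map(front_end_metric, trace) if m is not None)
print(dict(counts))  # {'Itlb': 1, 'Icache': 1}
```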
Branch
This counts the number of stall cycles associated with branch execution. There are two
components to this category. The first consists of stalls due to execution bubbles caused by a
front-end resteer, that is, a taken branch. The second consists of stalls due to the recirculation
of branches while they wait for the branch history information used to predict branch
direction.
Unstall Execute
This is the percentage of cycles when the back end is executing instructions without stalling.
Depending on code characteristics and resource limitations, the number of instructions executed
per cycle varies from 1 to 6, which is the maximum dispatch for the Itanium 2 processor. Taken
branches, branch targets that are not aligned on a double-bundle boundary, and explicit stop bits
are the primary determinants of code-based execution limitations. The dispersal event set gives
some indication of these limitations.
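As a rough cross-check of the 1-to-6 range, the sketch below expresses a cycle count as a percentage of total cycles and computes the average number of instructions per unstalled cycle. It assumes you already have the total cycle count, the unstalled-cycle count, and the number of instructions executed in those unstalled cycles. The helper names and numbers are hypothetical.

```python
MAX_DISPATCH = 6  # maximum instructions dispatched per cycle on Itanium 2, per the text


def percent_of_cycles(category_cycles: int, total_cycles: int) -> float:
    """Express a cycle count as a percentage of all cycles."""
    if total_cycles <= 0:
        raise ValueError("total_cycles must be positive")
    return 100.0 * category_cycles / total_cycles


def avg_per_unstalled_cycle(instructions: int, unstalled_cycles: int) -> float:
    """Average instructions executed per unstalled cycle."""
    if unstalled_cycles <= 0:
        raise ValueError("unstalled_cycles must be positive")
    avg = instructions / unstalled_cycles
    # Every unstalled cycle executes 1 to 6 instructions, so the average must too.
    if not 1 <= avg <= MAX_DISPATCH:
        raise ValueError(f"average {avg:.2f} outside 1..{MAX_DISPATCH}; check the counts")
    return avg


# Hypothetical counts, for illustration only.
print(f"{percent_of_cycles(400_000, 1_000_000):.1f}%")  # 40.0%
print(avg_per_unstalled_cycle(1_000_000, 400_000))      # 2.5
```

An average near 1 suggests that taken branches, misaligned branch targets, or stop bits are limiting each unstalled cycle, which is the kind of pattern the dispersal event set helps diagnose.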
BE Flush
This counts the number of stall cycles resulting from a pipeline flush caused by a branch
misprediction, an exception, an ALAT flush, or a serialization flush.
Scoreboard
This counts stall cycles due to dependencies on integer or floating-point operations, floating-point
flushes, and control or application register reads or writes.
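The causes listed for BE Flush and Scoreboard can be summarized as a lookup table that attributes stall cycles to one metric or the other. This is a hypothetical tally over invented (cause, cycles) records, meant only to make the grouping explicit. It does not reflect how Caliper or the processor collects these counts.

```python
from enum import Enum
from typing import Dict, Iterable, Tuple


class StallCause(Enum):
    BRANCH_MISPREDICT = "branch misprediction"
    EXCEPTION = "exception"
    ALAT_FLUSH = "ALAT flush"
    SERIALIZATION_FLUSH = "serialization flush"
    INT_DEPENDENCY = "integer operation dependency"
    FP_DEPENDENCY = "floating-point operation dependency"
    FP_FLUSH = "floating-point flush"
    CR_AR_ACCESS = "control or application register read or write"


# Grouping taken from the BE Flush and Scoreboard descriptions.
METRIC_FOR_CAUSE = {
    StallCause.BRANCH_MISPREDICT: "BE Flush",
    StallCause.EXCEPTION: "BE Flush",
    StallCause.ALAT_FLUSH: "BE Flush",
    StallCause.SERIALIZATION_FLUSH: "BE Flush",
    StallCause.INT_DEPENDENCY: "Scoreboard",
    StallCause.FP_DEPENDENCY: "Scoreboard",
    StallCause.FP_FLUSH: "Scoreboard",
    StallCause.CR_AR_ACCESS: "Scoreboard",
}


def tally(records: Iterable[Tuple[StallCause, int]]) -> Dict[str, int]:
    """Sum stall cycles per metric from (cause, cycles) records."""
    totals: Dict[str, int] = {}
    for cause, cycles in records:
        metric = METRIC_FOR_CAUSE[cause]
        totals[metric] = totals.get(metric, 0) + cycles
    return totals


# Hypothetical records, for illustration only.
records = [
    (StallCause.BRANCH_MISPREDICT, 120),
    (StallCause.FP_DEPENDENCY, 300),
    (StallCause.ALAT_FLUSH, 30),
    (StallCause.CR_AR_ACCESS, 50),
]
print(tally(records))  # {'BE Flush': 150, 'Scoreboard': 350}
```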
L1Dtlb
This counts the number of cycles stalled due to a level 1 data TLB miss that hits in the level 2
data TLB. This is sometimes called an L1DTLB transfer stall. If the level 2 TLB misses, the hardware
page walker (HPW) is invoked to insert the required page translation into the level 2 TLB, which is
then forwarded to the level 1 data TLB.
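The data TLB path described above can be sketched as a toy two-level lookup. The class DataTlbModel, its dictionaries, and the page_table that stands in for translations the HPW can find are all hypothetical. Sizes, replacement, and timing are not modeled, and charging the page-walk case to L2Dtlb follows the description of that metric that comes next.

```python
from typing import Dict, Optional, Tuple


class DataTlbModel:
    """Hypothetical two-level data TLB backed by a hardware page walker (HPW).

    Keys are virtual page numbers, values are physical frame numbers.
    page_table stands in for translations the HPW can find in memory.
    """

    def __init__(self, page_table: Dict[int, int]) -> None:
        self.l1: Dict[int, int] = {}
        self.l2: Dict[int, int] = {}
        self.page_table = page_table

    def lookup(self, vpn: int) -> Tuple[int, Optional[str]]:
        """Return (frame, metric charged), where metric is None for an L1 hit."""
        if vpn in self.l1:
            return self.l1[vpn], None
        if vpn in self.l2:
            # L1 miss, L2 hit: the L1DTLB transfer stall.
            self.l1[vpn] = self.l2[vpn]
            return self.l1[vpn], "L1Dtlb"
        if vpn in self.page_table:
            # L2 miss: the HPW inserts the translation into L2, then it is forwarded to L1.
            self.l2[vpn] = self.page_table[vpn]
            self.l1[vpn] = self.l2[vpn]
            return self.l1[vpn], "L2Dtlb"
        # The HPW cannot find the translation; software miss handling is not modeled.
        raise LookupError(f"no translation for page {vpn:#x}")


tlb = DataTlbModel({0x10: 0x200})
print(tlb.lookup(0x10))  # (512, 'L2Dtlb')  first access walks the page table
print(tlb.lookup(0x10))  # (512, None)      now an L1 hit
tlb.l1.clear()           # simulate eviction from the L1 data TLB only
print(tlb.lookup(0x10))  # (512, 'L1Dtlb')  L1 miss that hits in L2
```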
L2Dtlb
This counts the number of cycles stalled due to a level 2 data TLB miss during the time the
HPW is actively attempting to resolve the requested TLB entry. If the entry is not in the cache,
246 Event Set Descriptions for CPU Metrics