HP Caliper 5.3 User Guide (5900-1558, February 2011)

the HPW will terminate and initiate a trap to software to provide the required TLB entry. This
component counts only the stall cycles incurred while the HPW provides the required TLB entry.
Time spent in the software trap handler is not counted in this component.
Dcache
This counts the number of cycles stalled due to data cache misses at any level of the cache
hierarchy (L1, L2, L3). Due to event limitations, it is not possible to distinguish between freg-freg
and freg-load dependencies. Any allocation therefore either counts some scoreboard cycles as
data cache cycles or counts some data access cycles as scoreboard cycles. This implementation
allocates all floating-point stalls to the data cache category. As a result, floating-point register
dependency (freg-freg) stalls that belong in the scoreboard category are counted in the data
cache category instead.
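The allocation rule above can be illustrated with a small sketch. It assumes hypothetical per-category cycle counts (the input names are illustrative and are not Caliper fields) and shows that charging every floating-point stall to Dcache preserves the total while shifting freg-freg cycles out of the scoreboard category. Because the freg-freg count cannot be measured with these events, the second function only shows how the split would change for an assumed value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StallSplit:
    """Cycles reported under the Dcache and scoreboard categories."""
    dcache: int
    scoreboard: int


def allocate_stalls(non_fp_dcache: int, non_fp_scoreboard: int, fp_stalls: int) -> StallSplit:
    # The events cannot separate freg-freg from freg-load stalls, so every
    # floating-point stall cycle is charged to the Dcache category.
    return StallSplit(dcache=non_fp_dcache + fp_stalls, scoreboard=non_fp_scoreboard)


def true_split(reported: StallSplit, fp_freg_freg: int) -> StallSplit:
    # Move an assumed number of freg-freg (register dependency) cycles back
    # to the scoreboard category. This value is NOT observable from the
    # events; it must come from some outside estimate.
    if not 0 <= fp_freg_freg <= reported.dcache:
        raise ValueError("fp_freg_freg must be between 0 and reported.dcache")
    return StallSplit(dcache=reported.dcache - fp_freg_freg,
                      scoreboard=reported.scoreboard + fp_freg_freg)


# Illustrative counts only.
reported = allocate_stalls(non_fp_dcache=1000, non_fp_scoreboard=300, fp_stalls=200)
print(reported)                               # StallSplit(dcache=1200, scoreboard=300)
print(true_split(reported, fp_freg_freg=50))  # StallSplit(dcache=1150, scoreboard=350)
```

In this example, if 50 of the 200 floating-point stall cycles were really freg-freg dependencies, Dcache would be overstated and scoreboard understated by exactly those 50 cycles.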
RSE Active
This counts the number of cycles that the pipeline is stalled due to the Register Save Engine
spilling/filling registers to/from memory.
sysbus Event Set
Available only on Itanium 2 and dual-core Itanium 2 systems.
The sysbus event set provides data on system bus utilization and its breakdown into:
Transaction originator (all, local CPU, I/O)
Transaction type (brl, bril, bil, bwl, partial)
If you use this event set, you must also specify the --bus-speed option.
By default, this event set measures regardless of CPU operating state (that is, in user, system,
and interrupt states), but it excludes the idle state. You can use command-line options to change
the scope of the measurement. Specifically, you can:
Limit measurement to a specific privilege level: -m event_set[:all|user|kernel]
Include idle: --exclude-idle False
Exclude the interruption state: --measure-on-interrupts off
Only measure the interruption state: --measure-on-interrupts only
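The scoping options above can be combined programmatically. The following is a minimal sketch, assuming space-separated option values as written in this section; the helper name and parameters are hypothetical, the --bus-speed value is passed through unchecked because its format is not described here, and the Caliper command, measurement type, and target program are left for the caller to add.

```python
from typing import List, Optional

VALID_PRIVILEGE = ("all", "user", "kernel")
VALID_INTERRUPTS = ("off", "only")


def sysbus_options(bus_speed: str,
                   privilege: Optional[str] = None,
                   include_idle: bool = False,
                   interrupts: Optional[str] = None) -> List[str]:
    """Return the sysbus-related options as an argument list (hypothetical helper)."""
    event_set = "sysbus"
    if privilege is not None:
        if privilege not in VALID_PRIVILEGE:
            raise ValueError(f"unknown privilege level: {privilege!r}")
        event_set += ":" + privilege
    # --bus-speed is required whenever the sysbus event set is used.
    args = ["-m", event_set, "--bus-speed", bus_speed]
    if include_idle:
        # Idle is excluded by default; this turns the exclusion off.
        args += ["--exclude-idle", "False"]
    if interrupts is not None:
        if interrupts not in VALID_INTERRUPTS:
            raise ValueError(f"unknown interrupt mode: {interrupts!r}")
        args += ["--measure-on-interrupts", interrupts]
    return args


# "200" is a placeholder bus speed value.
print(sysbus_options("200", privilege="user", interrupts="off"))
```

Validating the privilege level and interrupt mode up front catches typos before a measurement run starts, rather than relying on the tool to reject them.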
Metrics Available from this Measurement
The following metrics are available from this event set. These descriptions do not take into account
any command-line options you might use.
The metrics are:
Avg Lat
Average memory read latency provides a measure of the number of CPU cycles required to
service a memory cache line read from the perspective of the bus request queue (BRQ). The
time measured includes the arbitration cycles, address cycles, memory controller/memory
cycles, and data return cycles.
Load-to-use latency can be estimated by adding to this value the processor overhead cycles
required to issue a miss to the BRQ and to forward the data from the bus interface to the
processor pipeline. For the Itanium 2 processor, add 28 to 30 internal cycles, depending on
which bypasses fail, to the reported value to estimate the true load-to-use latency.
For the Itanium 2 6M, Itanium 2 9M, and Itanium 2 Low Voltage processors, add 22 to 25 cycles
instead.
The reported average latency will be incorrect on Itanium 2 steppings earlier than B2.
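The load-to-use adjustment is simple arithmetic on the reported Avg Lat value. This sketch applies the overhead ranges given above; the processor keys are hypothetical labels, and the 200-cycle input is an illustrative value, not a measurement. The estimate is meaningless on Itanium 2 steppings earlier than B2, because the reported Avg Lat itself is incorrect there.

```python
from typing import Dict, Tuple

# Extra processor cycles to add to the reported Avg Lat value, taken from
# the ranges in this section. The dictionary keys are hypothetical labels.
OVERHEAD_CYCLES: Dict[str, Tuple[int, int]] = {
    "itanium2": (28, 30),
    "itanium2_6m": (22, 25),
    "itanium2_9m": (22, 25),
    "itanium2_low_voltage": (22, 25),
}


def estimate_load_to_use(avg_lat_cycles: float, processor: str) -> Tuple[float, float]:
    """Return (low, high) estimates of load-to-use latency in CPU cycles."""
    try:
        low, high = OVERHEAD_CYCLES[processor]
    except KeyError:
        raise ValueError(f"no overhead range known for {processor!r}") from None
    return avg_lat_cycles + low, avg_lat_cycles + high


# Hypothetical reported Avg Lat of 200 cycles.
print(estimate_load_to_use(200, "itanium2"))     # (228, 230)
print(estimate_load_to_use(200, "itanium2_9m"))  # (222, 225)
```

Returning a range rather than a single number keeps the uncertainty from the bypass behavior visible in the result.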