HP Caliper User Guide Release 5.5 (5900-2351, August 2012)

measurement. You can use command-line options to limit the scope of the measurement. Specifically,
you can:
Limit measurement to a specific privilege level: -m event_set[:all|user|kernel]
Include idle: --exclude-idle False
Exclude the interruption state: --measure-on-interrupts off
Only measure the interruption state: --measure-on-interrupts only
Metrics Available from this Measurement
The following metrics are available from this event set. These descriptions do not take into account
any command-line options you might use.
The metrics are:
Avg Lat
Average memory read latency provides a measure of the number of CPU cycles required to
service a memory cache line read from the perspective of the bus request queue (BRQ). The
time measured includes the arbitration cycles, address cycles, memory controller/memory
cycles, and data return cycles.
Load-to-use latency can be estimated by adding to the reported value the processor overhead cycles
required to issue a miss to the BRQ and to forward the data from the bus interface to the processor
pipeline.
For the Itanium 2 processor, add an additional 28 to 30 internal cycles, depending on which
bypasses fail, to the reported value to estimate true load-to-use cycles.
For the Itanium 2 6M, Itanium 2 9M, and Itanium 2 Low Voltage processors, add 22 to 25 cycles
instead to estimate load-to-use cycles.
The reported average latency will be incorrect on Itanium 2 steppings earlier than B2.
The average memory read latency on the dual-core Itanium 2 processor will appear greater
than on previous Itanium 2 processors. This is because the reported latency also includes the
latency that the arbiter adds to both the outbound request and inbound data transfer.
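The load-to-use adjustment described under Avg Lat is simple arithmetic on the reported value. The following is a minimal sketch, assuming you already have the Avg Lat figure in CPU cycles. The function name and the lookup table are hypothetical helpers, not part of Caliper; the cycle ranges are the ones given above. No overhead figure is given for the dual-core processor, so it is deliberately left out.

```python
# Hypothetical helper: processor names and overhead ranges come from the
# Avg Lat description; the function itself is illustrative, not part of Caliper.
LOAD_USE_OVERHEAD = {
    "Itanium 2": (28, 30),
    "Itanium 2 6M": (22, 25),
    "Itanium 2 9M": (22, 25),
    "Itanium 2 Low Voltage": (22, 25),
}


def estimate_load_to_use(avg_lat_cycles: float, processor: str) -> tuple[float, float]:
    """Return the (low, high) load-to-use estimate in CPU cycles.

    avg_lat_cycles is the Avg Lat value reported from the bus request queue.
    """
    try:
        low, high = LOAD_USE_OVERHEAD[processor]
    except KeyError:
        raise ValueError(f"no overhead figure given for {processor!r}") from None
    return avg_lat_cycles + low, avg_lat_cycles + high


# Example with a hypothetical reported Avg Lat of 150 cycles.
print(estimate_load_to_use(150.0, "Itanium 2"))     # (178.0, 180.0)
print(estimate_load_to_use(150.0, "Itanium 2 9M"))  # (172.0, 175.0)
```

The result is a range rather than a single number because the overhead depends on which bypasses fail for a given miss.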
Avg Outstand
Average number of outstanding reads per cycle gives an indication of memory request density,
that is, the probability of one or more memory requests per cycle. For control-dominated code
or for workloads that seldom miss the internal caches, this value will be very small. For
data-flow-type workloads that employ extensive prefetching, this value can be quite high, up to
a maximum of 16, which is the Itanium 2 bus limit.
The reported value will be incorrect on Itanium 2 steppings earlier than B2.
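Avg Outstand and Avg Lat can be related through a single accumulated count. The sketch below assumes, as a simplification, that every cycle adds the number of reads currently in flight in the BRQ to one running total. Under that assumption, Avg Outstand is the total divided by elapsed cycles and Avg Lat is the total divided by completed reads, so Avg Outstand equals reads per cycle times Avg Lat (Little's law). The `ReadCounts` fields are hypothetical names, not Caliper output, and this is not a statement of the exact event formulas Caliper uses.

```python
from dataclasses import dataclass

BUS_LIMIT = 16  # maximum outstanding reads on the Itanium 2 bus (from the text)


@dataclass
class ReadCounts:
    # Hypothetical raw counts; the names do not correspond to Caliper output.
    cycles: int                   # elapsed CPU cycles in the measurement
    reads: int                    # memory reads completed through the BRQ
    outstanding_read_cycles: int  # sum over all cycles of reads in flight


def avg_outstanding(c: ReadCounts) -> float:
    """Average number of outstanding reads per cycle."""
    value = c.outstanding_read_cycles / c.cycles
    if value > BUS_LIMIT:
        raise ValueError(f"{value:.2f} exceeds the bus limit of {BUS_LIMIT}")
    return value


def avg_latency(c: ReadCounts) -> float:
    """Average number of cycles each read spent outstanding in the BRQ."""
    return c.outstanding_read_cycles / c.reads


# Hypothetical sample: 1,000,000 cycles, 20,000 reads, each in flight 150 cycles.
sample = ReadCounts(cycles=1_000_000, reads=20_000, outstanding_read_cycles=3_000_000)
print(avg_outstanding(sample))  # 3.0
print(avg_latency(sample))      # 150.0
```

Raising an error above 16 treats a value beyond the bus limit as a sign of inconsistent input counts rather than silently reporting it.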
CPU
CPU transaction component is a measure of the percentage of all bus transactions generated
by all CPUs on a shared front side bus (FSB).
I/O
I/O transaction component is a measure of the percentage of all bus transactions initiated by
any I/O agent on a shared FSB.
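The CPU and I/O transaction components are both shares of the same total on one shared FSB. A minimal sketch of the percentage calculation follows; the parameter names are hypothetical and stand for whatever transaction counts you collected for a single bus. The total is passed in explicitly so the sketch does not assume that CPUs and I/O agents are the only initiators.

```python
def transaction_components(cpu_transactions: int, io_transactions: int,
                           total_transactions: int) -> dict[str, float]:
    """Return CPU and I/O transaction components as percentages of all
    bus transactions observed on one shared front side bus."""
    if total_transactions == 0:
        return {"CPU": 0.0, "I/O": 0.0}
    return {
        "CPU": 100.0 * cpu_transactions / total_transactions,
        "I/O": 100.0 * io_transactions / total_transactions,
    }


# Hypothetical counts for one bus.
print(transaction_components(900_000, 100_000, 1_000_000))  # {'CPU': 90.0, 'I/O': 10.0}
```

If CPUs and I/O agents are in fact the only initiators on the bus, the two percentages sum to 100.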
Util Adrs
Average address bus utilization gives an estimate of total address bus utilization resulting
from all bus transactions, including cache misses, I/O port reads/writes, interprocessor
interrupts, writebacks, cache line invalidates (FC instruction, store hit on shared line), and
clean castouts (if enabled). The utilization is computed as follows:
ADRS UTIL = 100.0 * (total transactions/sec * 3.0) / bus cycles/sec
The constant value (3.0) is the number of address cycles needed for each bus transaction.
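The ADRS UTIL formula translates directly into code. This sketch uses only the formula and the 3.0 constant given above; the function name and the example rates are hypothetical.

```python
ADDRESS_CYCLES_PER_TRANSACTION = 3.0  # address cycles per bus transaction (from the text)


def address_bus_utilization(transactions_per_sec: float, bus_cycles_per_sec: float) -> float:
    """ADRS UTIL = 100.0 * (total transactions/sec * 3.0) / bus cycles/sec."""
    return (100.0 * (transactions_per_sec * ADDRESS_CYCLES_PER_TRANSACTION)
            / bus_cycles_per_sec)


# Hypothetical inputs: 50 million transactions/sec on a bus running 400 million cycles/sec.
print(address_bus_utilization(50e6, 400e6))  # 37.5
```

Because each transaction occupies three address cycles, the formula reaches 100 percent when the transaction rate equals one third of the bus cycle rate.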
260 Event Set Descriptions for CPU Metrics