HP Caliper User Guide Release 5.5 (5900-2351, August 2012)

measurement. You can use command-line options to limit the scope of the measurement. Specifically,

you can:

• Limit measurement to a specific privilege level: -m event_set[:all|user|kernel]

• Include idle: --exclude-idle False

• Exclude the interruption state: --measure-on-interrupts off

• Only measure the interruption state: --measure-on-interrupts only
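As a minimal sketch, the scope options listed above can be assembled into an argument list programmatically. The helper name `scope_options` and the placeholder `EVENT_SET` are assumptions made for illustration and are not part of HP Caliper. Only the option spellings are taken from the list above.

```python
def scope_options(event_set, level="all", include_idle=False, interrupts=None):
    """Build the scope-limiting options described above as an argument list.

    event_set   -- name of the event set (placeholder; supply the real name)
    level       -- privilege level: "all", "user", or "kernel"
    include_idle -- when True, emit "--exclude-idle False" to include idle
    interrupts  -- None (no option), "off" (exclude the interruption state),
                   or "only" (measure only the interruption state)
    """
    if level not in ("all", "user", "kernel"):
        raise ValueError("level must be 'all', 'user', or 'kernel'")
    if interrupts not in (None, "off", "only"):
        raise ValueError("interrupts must be None, 'off', or 'only'")

    args = ["-m", f"{event_set}:{level}"]
    if include_idle:
        args += ["--exclude-idle", "False"]
    if interrupts is not None:
        args += ["--measure-on-interrupts", interrupts]
    return args


# Hypothetical usage: user-level measurement, interruption state excluded.
print(" ".join(scope_options("EVENT_SET", level="user", interrupts="off")))
```

The resulting list is meant to be appended to whatever caliper command line is already in use. The sketch does not attempt to build the full command.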

Metrics Available from this Measurement

The following metrics are available from this event set. These descriptions do not take into account

any command-line options you might use.

The metrics are:

• Avg Lat

Average memory read latency provides a measure of the number of CPU cycles required to

service a memory cache line read from the perspective of the bus request queue (BRQ). The

time measured includes the arbitration cycles, address cycles, memory controller/memory

cycles, and data return cycles.

Load-to-use latency can be computed by adding the processor overhead cycles required to

issue a miss to the BRQ and forward the data from the bus interface to the processor pipeline.

For the Itanium 2 processor, there are an additional 28 to 30 internal cycles, depending on

which bypasses fail, that must be added to the reported value to estimate true load-use cycles.

For the Itanium 2 6M, Itanium 2 9M, and Itanium 2 Low Voltage processors, 22 to 25 cycles

must be added to estimate load-use cycles.

The reported average latency will be incorrect on Itanium 2 steppings earlier than B2.

The average memory read latency on the dual-core Itanium 2 processor will appear greater

than on previous Itanium 2 processors. This is because the reported latency also includes the

latency that the arbiter adds to both the outbound request and inbound data transfer.

• Avg Outstand

Average number of outstanding reads per cycle gives some idea of the memory request density,

that is, the probability of one or more memory requests per cycle. For control-dominated code

or for workloads that seldom miss the internal caches, this value will be very small. For

data-flow-type workloads, this number can, if extensive prefetching is employed, be quite

high, up to a maximum of 16, which is the Itanium 2 bus limit.

The reported value will be incorrect on Itanium 2 steppings earlier than B2.

• CPU

CPU transaction component is a measure of the percentage of all bus transactions generated

by all CPUs on a shared front side bus (FSB).

• I/O

I/O transaction component is a measure of the percentage of all bus transactions initiated by

any I/O agent on a shared FSB.

• Util Adrs

Average address bus utilization gives an estimate of total address bus utilization resulting

from all bus transactions, including cache misses, I/O port reads/writes, interprocessor

interrupts, writebacks, cache line invalidates (FC instruction, store hit on shared line), and

clean castouts (if enabled). The utilization is computed as follows:

ADRS UTIL = 100.0 * (total transactions/sec * 3.0) / bus cycles/sec

The constant value (3.0) is the number of address cycles needed for each bus transaction.
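The arithmetic behind two of the metrics above (the load-use estimate derived from Avg Lat, and the Util Adrs formula) can be sketched as follows. This is an illustration only: the function names, the dictionary keys, and the sample input values are hypothetical. The cycle ranges and the constant 3.0 are taken from the descriptions above.

```python
# Extra internal cycles (low, high) to add to the reported Avg Lat value,
# as quoted in the Avg Lat description. The key names are illustrative.
LOAD_USE_OVERHEAD = {
    "itanium2": (28, 30),           # Itanium 2
    "itanium2_6m_9m_lv": (22, 25),  # Itanium 2 6M, 9M, and Low Voltage
}

# Number of address cycles needed for each bus transaction.
ADDRESS_CYCLES_PER_TRANSACTION = 3.0


def load_use_cycles(avg_lat, family):
    """Return the (low, high) estimate of load-use cycles for a reported Avg Lat."""
    low, high = LOAD_USE_OVERHEAD[family]
    return avg_lat + low, avg_lat + high


def address_bus_utilization(transactions_per_sec, bus_cycles_per_sec):
    """ADRS UTIL = 100.0 * (total transactions/sec * 3.0) / bus cycles/sec"""
    if bus_cycles_per_sec <= 0:
        raise ValueError("bus_cycles_per_sec must be positive")
    return (100.0 * (transactions_per_sec * ADDRESS_CYCLES_PER_TRANSACTION)
            / bus_cycles_per_sec)


# Made-up inputs, chosen only to show the arithmetic.
print(load_use_cycles(200.0, "itanium2"))                 # (228.0, 230.0)
print(address_bus_utilization(20_000_000, 200_000_000))   # 30.0
```

Because the internal overhead depends on which bypasses fail, the load-use estimate is returned as a range rather than a single number.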

260 Event Set Descriptions for CPU Metrics