HP Caliper 5.3 User Guide (5900-1558, February 2011)

the HPW will terminate and initiate a trap to software to provide the required TLB entry. This

component counts only the stall cycles due to the HPW providing the required TLB entry.

Time spent in the software trap handler is not counted in this component.

• Dcache

This counts the number of cycles stalled due to data cache misses at any level of the cache

hierarchy (L1, L2, L3). Due to event limitations, it is not possible to distinguish between freg-freg

and freg-load dependencies. This has the unfortunate effect of counting either scoreboard

cycles as data cache cycles or data access cycles as scoreboard cycles. This implementation

allocates all floating-point stalls to the data cache category. As a result, some

floating-point register dependency stalls that should be allocated to the scoreboard category

will be incorrectly allocated to the data cache category.

• RSE Active

This counts the number of cycles that the pipeline is stalled due to the Register Save Engine

spilling/filling registers to/from memory.

sysbus Event Set

Available only on Itanium 2 and dual-core Itanium 2 systems.

The sysbus event set provides data on system bus utilization and its breakdown into:

• Transaction originator (all, local cpu, io)

• Transaction type (brl, bril, bil, bwl, partial)

If you use this event set, you must also specify the --bus-speed option.

If you use this event set, the default is to make the measurements irrespective of CPU operating

state (that is, user, system, or interrupt states). By default, the idle state is not included in the

measurement. You can use command-line options to limit the scope of the measurement. Specifically,

you can:

• Limit measurement to a specific privilege level: -m event_set[:all|user|kernel]

• Include idle: --exclude-idle False

• Exclude the interruption state: --measure-on-interrupts off

• Only measure the interruption state: --measure-on-interrupts only
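The options listed above can be combined on one command line. The following is a minimal sketch, not part of HP Caliper, that assembles only the option strings named in this section. The helper name sysbus_options and its parameters are hypothetical, the --bus-speed value is passed through unchanged because this section does not state its format, and the rest of the caliper command line is deliberately left out.

```python
def sysbus_options(bus_speed, level=None, include_idle=False, interrupts=None):
    """Build the option list for the sysbus event set (hypothetical helper).

    bus_speed  -- value for --bus-speed, passed through as given
    level      -- None, "all", "user", or "kernel"
    include_idle -- True adds "--exclude-idle False"
    interrupts -- None, "off" (exclude interruption state), or "only"
    """
    if level not in (None, "all", "user", "kernel"):
        raise ValueError("level must be all, user, or kernel")
    if interrupts not in (None, "off", "only"):
        raise ValueError("interrupts must be off or only")

    # -m event_set[:all|user|kernel]
    event_set = "sysbus" if level is None else "sysbus:" + level
    # --bus-speed is required whenever the sysbus event set is used.
    args = ["-m", event_set, "--bus-speed", str(bus_speed)]
    if include_idle:
        args += ["--exclude-idle", "False"]
    if interrupts is not None:
        args += ["--measure-on-interrupts", interrupts]
    return args


# BUS_SPEED is a placeholder, not a real value.
print(" ".join(sysbus_options("BUS_SPEED", level="user", interrupts="off")))
```

With the placeholder left in place, the sketch prints the options for a user-level measurement that excludes the interruption state.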

Metrics Available from this Measurement

The following metrics are available from this event set. These descriptions do not take into account

any command-line options you might use.

The metrics are:

• Avg Lat

Average memory read latency provides a measure of the number of CPU cycles required to

service a memory cache line read from the perspective of the bus request queue (BRQ). The

time measured includes the arbitration cycles, address cycles, memory controller/memory

cycles, and data return cycles.

Load-to-use latency can be computed by adding the processor overhead cycles required to

issue a miss to the BRQ and forward the data from the bus interface to the processor pipeline.

For the Itanium 2 processor, there are an additional 28 to 30 internal cycles, depending on

which bypasses fail, that must be added to the reported value to estimate true load-use cycles.

For the Itanium 2 6M, Itanium 2 9M, and Itanium 2 Low Voltage processors, 22 to 25 cycles

must be added to estimate load-use cycles.

The reported average latency will be incorrect on Itanium 2 steppings earlier than B2.
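The load-use adjustment described above is a simple addition, sketched here in Python. The processor keys and the function name are hypothetical labels chosen for illustration, and the reported value of 200 cycles is an invented input, not a measured result. Only the added cycle ranges come from the text above.

```python
# Internal cycles to add to the reported Avg Lat value, as (low, high).
OVERHEAD_CYCLES = {
    "itanium2": (28, 30),
    "itanium2_6m": (22, 25),
    "itanium2_9m": (22, 25),
    "itanium2_low_voltage": (22, 25),
}


def estimate_load_use(avg_lat_cycles, processor):
    """Return the (low, high) estimate of load-use cycles."""
    low, high = OVERHEAD_CYCLES[processor]
    return (avg_lat_cycles + low, avg_lat_cycles + high)


# Illustrative input only: a reported Avg Lat of 200 cycles.
print(estimate_load_use(200, "itanium2"))     # (228, 230)
print(estimate_load_use(200, "itanium2_9m"))  # (222, 225)
```

The result is a range rather than a single number because the exact overhead depends on which bypasses fail.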
