HP Caliper User Guide Release 5.5 (5900-2351, August 2012)

Util Data
Util Data gives a lower-bound estimate of total data bus utilization resulting from
bus transactions that transfer data, that is, BRL, BRIL, BWL, and nonzero-byte
BRP/BWP transactions. The lower-bound data bus utilization is computed as follows:
DATA BUS CYCLES/SEC = ((BRL + BRIL + BWL + IMPLICIT WB)/sec * 4.0)
                      + ((nonzero byte BRPs/BWPs)/sec * 1.0)
DATA UTIL = 100 * (DATA BUS CYCLES/SEC) / (BUS CYCLES/SEC)
The constants (4.0 and 1.0) represent the number of cycles that the data bus is occupied to
perform the requisite data transfer. All cache line transfers (BRL, BRIL, BWL) require four cycles.
The nonzero BRPs/BWPs require one or two cycles (16, 32, 64 bytes). Because most of the
nonzero BRPs/BWPs are to I/O ports and semaphores, the calculation assumes a
single-cycle transfer. Thus, there is a small possibility of undercounting cycles.
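The calculation above can be sketched in a few lines of Python. This is an illustrative sketch only, not part of HP Caliper: the function name, the parameter names, and the sample rates are hypothetical, and the per-second rates are assumed to have been derived already from the measured counts.

```python
# Illustrative sketch of the DATA UTIL formula; not an HP Caliper API.
# All names and sample values below are hypothetical.

CACHE_LINE_CYCLES = 4.0  # data bus cycles per cache line transfer (BRL, BRIL, BWL, implicit WB)
PARTIAL_CYCLES = 1.0     # assumed single-cycle transfer for nonzero-byte BRP/BWP


def data_bus_utilization(brl, bril, bwl, implicit_wb,
                         nonzero_partials, bus_cycles_per_sec):
    """Return the lower-bound data bus utilization as a percentage.

    All transaction arguments are rates (transactions per second).
    """
    if bus_cycles_per_sec <= 0:
        raise ValueError("bus_cycles_per_sec must be positive")
    data_bus_cycles_per_sec = (
        (brl + bril + bwl + implicit_wb) * CACHE_LINE_CYCLES
        + nonzero_partials * PARTIAL_CYCLES
    )
    return 100.0 * data_bus_cycles_per_sec / bus_cycles_per_sec


if __name__ == "__main__":
    # Made-up rates, with a made-up bus rate of 200 million cycles per second.
    util = data_bus_utilization(
        brl=1_000_000, bril=500_000, bwl=250_000, implicit_wb=250_000,
        nonzero_partials=1_000_000, bus_cycles_per_sec=200_000_000,
    )
    print(f"DATA UTIL = {util:.2f}%")
```

Because every partial transaction is charged a single cycle, the result is a lower bound: any two-cycle partial transfers would raise the true utilization slightly.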
BRL
Bus Read Line is the transaction used to read cache lines, due either to an instruction cache
miss or to a load data miss.
BRIL
Bus Read Invalidate Line is the transaction used when a store miss occurs, that is, a read for
ownership. In Itanium 2, this transaction is also used when a store hit occurs on a shared line.
In this case, the BRIL is used to invalidate all remote copies of this cache line and have the
memory controller return to the cache the line it already holds. Itanium 2 does not implement
the BIL optimization, which would have allowed remote copies to be invalidated without
performing a superfluous memory request.
BWL
Bus Writeback Line is used when a dirty cache line is replaced as a consequence of servicing
a BRL or BRIL bus transaction.
BRC
This is the number of current memory read transactions on the bus.
BIL
Bus Invalidate Line is used to cause lines to be flushed from the cache. Since Itanium 2 does
not implement the BIL optimization, this can only be generated by the fc (flush cache)
instruction. This is a zero-byte memory read transaction, although an implicit writeback will
occur if the BIL hits a modified line.
Ccast Out
These zero-byte write transactions would normally only occur in systems that use directory-based
cache coherence. The purpose of this transaction is to inform the coherency directory that a
clean cache line was evicted from the CPU's cache (that is, the CPU is no longer an owner of the
cache line). Snoopy-based cache coherency systems do not require this notification, because all
caches are automatically interrogated on all memory cache line reads/writes.
PRTL
This is the number of partial (less than 128 bytes) reads (BRP) or writes (BWP) per second.
Partial transactions are normally due to reading/writing memory-mapped I/O control registers,
semaphore operations, clean castouts (if monitoring a system with directory-based cache
coherency), and sending interprocessor interrupts.
threadswitch Event Set
Available only on dual-core Itanium 2 and newer systems.
The threadswitch event set provides data about the impact of HyperThreading on the measured
process. It provides a full statistical breakdown of thread switch activity.