HP Caliper 5.3 User Guide (5900-1558, February 2011)

The average memory read latency on the dual-core Itanium 2 processor will appear greater

than on previous Itanium 2 processors. This is because the reported latency also includes the

latency that the arbiter adds to both the outbound request and inbound data transfer.

• Avg Outstand

Average number of outstanding reads per cycle gives some idea of the memory request density,

that is, the probability of one or more memory requests per cycle. For control-dominated code

or for workloads that seldom miss the internal caches, this value will be very small. For

data-flow-type workloads, this number can, if extensive prefetching is employed, be quite

high, up to a maximum of 16, which is the Itanium 2 bus limit.

The reported average latency value will be incorrect on Itanium 2 steppings earlier than B2.

• CPU

CPU transaction component is a measure of the percentage of all bus transactions generated

by all CPUs on a shared front side bus (FSB).

• I/O

I/O transaction component is a measure of the percentage of all bus transactions initiated by

any I/O agent on a shared FSB.
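
The CPU and I/O components described above can be illustrated with a minimal sketch. It assumes, for illustration only, that every bus transaction is attributed to either a CPU or an I/O agent, so that the total is the sum of the two counts. The function name and the sample counts are hypothetical and are not part of HP Caliper.

```python
def transaction_shares(cpu_transactions, io_transactions):
    """Return (cpu_percent, io_percent) of all bus transactions.

    Assumption (not from the guide): the total transaction count is
    simply cpu_transactions + io_transactions.
    """
    total = cpu_transactions + io_transactions
    if total <= 0:
        raise ValueError("at least one bus transaction is required")
    cpu_percent = 100.0 * cpu_transactions / total
    io_percent = 100.0 * io_transactions / total
    return cpu_percent, io_percent


# Hypothetical sample counts, chosen only to show the arithmetic.
cpu_pct, io_pct = transaction_shares(900, 100)
print(f"CPU: {cpu_pct:.1f}%  I/O: {io_pct:.1f}%")
```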

• Util Adrs

Average address bus utilization gives an estimate of total address bus utilization resulting

from all bus transactions, including cache misses, I/O port reads/writes, interprocessor

interrupts, writebacks, cache line invalidates (FC instruction, store hit on shared line), and

clean castouts (if enabled). The utilization is computed as follows:

ADRS UTIL = 100.0 * (total transactions/sec * 3.0) / bus cycles/sec

The constant value (3.0) is the number of address cycles needed for each bus transaction.
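
As a worked illustration of the ADRS UTIL formula above, the following minimal sketch evaluates it for one set of inputs. The function name, the 200 MHz bus clock, and the transaction rate are hypothetical values chosen for the example; only the formula and the constant 3.0 come from the text.

```python
# Address cycles needed for each bus transaction (constant from the formula).
ADDRESS_CYCLES_PER_TRANSACTION = 3.0


def address_bus_utilization(transactions_per_sec, bus_cycles_per_sec):
    """Return address bus utilization as a percentage."""
    if bus_cycles_per_sec <= 0:
        raise ValueError("bus_cycles_per_sec must be positive")
    address_cycles_per_sec = transactions_per_sec * ADDRESS_CYCLES_PER_TRANSACTION
    return 100.0 * address_cycles_per_sec / bus_cycles_per_sec


# Hypothetical inputs: 10 million transactions/sec on a 200 MHz bus.
util = address_bus_utilization(10_000_000, 200_000_000)
print(f"ADRS UTIL = {util:.1f}%")  # prints "ADRS UTIL = 15.0%"
```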

• Util Data

Data bus utilization gives a lower bound estimate of total data bus utilization resulting from

bus transactions that result in a data transfer, that is, BRL, BRIL, BWL, and nonzero byte

BRP/BWP transactions. A lower bound data bus utilization is computed as follows:

DATA BUS CYCLES/SEC = ((BRL + BRIL + BWL + IMPLICIT WB)/sec * 4.0) +

((nonzero byte BRP's/BWP's)/sec * 1.0)

DATA UTIL = 100 * (DATA BUS CYCLES/SEC) / BUS CYCLES/SEC

The constants (4.0 and 1.0) represent the number of cycles that the data bus is occupied to

perform the requisite data transfer. All cache line transfers (BRL, BRIL, BWL) require four cycles.

The nonzero BRP's/BWP's require one or two cycles (16, 32, 64 bytes). Since most of the

nonzero BRP's/BWP's are to I/O ports and semaphores, the computation assumes a

single-cycle transfer. Thus, cycles may be slightly undercounted, which is why the result is a lower bound.
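
The lower bound data bus utilization described above can be sketched as follows. The sketch treats the two terms of DATA BUS CYCLES/SEC as a sum (full-line transfers plus nonzero byte partial transfers). The function name, parameter names, and sample rates are hypothetical; only the formula and the constants 4.0 and 1.0 come from the text.

```python
# Data bus cycles occupied per transfer (constants from the formula).
CACHE_LINE_DATA_CYCLES = 4.0   # BRL, BRIL, BWL, implicit writeback
PARTIAL_DATA_CYCLES = 1.0      # nonzero byte BRP/BWP, assumed single-cycle


def data_bus_utilization(brl, bril, bwl, implicit_wb, nonzero_partial,
                         bus_cycles_per_sec):
    """Return a lower bound on data bus utilization as a percentage.

    All transaction arguments are rates in transactions per second.
    """
    if bus_cycles_per_sec <= 0:
        raise ValueError("bus_cycles_per_sec must be positive")
    line_cycles = (brl + bril + bwl + implicit_wb) * CACHE_LINE_DATA_CYCLES
    partial_cycles = nonzero_partial * PARTIAL_DATA_CYCLES
    return 100.0 * (line_cycles + partial_cycles) / bus_cycles_per_sec


# Hypothetical rates on a 200 MHz bus, chosen only to show the arithmetic.
util = data_bus_utilization(
    brl=6_000_000,
    bril=2_000_000,
    bwl=1_500_000,
    implicit_wb=500_000,
    nonzero_partial=2_000_000,
    bus_cycles_per_sec=200_000_000,
)
print(f"DATA UTIL >= {util:.1f}%")  # prints "DATA UTIL >= 21.0%"
```

Because partial transfers that actually take two cycles are counted as one, the true utilization can only be equal to or higher than the value returned.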

• BRL

Bus Read Line is the transaction used to read cache lines, due either to an instruction cache

miss or to a load data miss.

• BRIL

Bus Read Invalidate Line is the transaction used when a store miss occurs, that is, a read for

ownership. In Itanium 2, this transaction is also used when a store hit occurs on a shared line.

In this case, the BRIL is used to invalidate all remote copies of this cache line and have the

memory controller return the line that the cache already holds. Itanium 2 does not implement

the BIL optimization, which would have allowed remote copies to be invalidated without

performing a superfluous memory request.

248 Event Set Descriptions for CPU Metrics