HP Caliper 5.3 User Guide (5900-1558, February 2011)

that caused the PMU overflow will have occurred some number of cycles, typically in the low tens,
before the address being sampled. Thus, the address recorded might or might not point to the
instruction causing the event, depending on pipeline stalls.
The latency between the event triggering the sample and the actual sample is not a problem if you
are using fprof to find hot spots in your application. It is only an issue if you try to use fprof
to find particular instructions that cause the events recorded by the PMU, in which case you must
take the latency into account.
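One way to see why the latency is harmless for hot-spot analysis is to aggregate sampled addresses by function instead of by instruction: a sample that lands a few instructions late usually still falls in the same function. The following Python sketch illustrates the idea only. It is not part of HP Caliper, and the symbol table, addresses, and helper names (FUNCTIONS, function_for, hot_functions) are hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical symbol table: (start address, function name), sorted by address.
FUNCTIONS = [(0x4000, "main"), (0x4200, "matmul"), (0x4800, "init")]
STARTS = [start for start, _ in FUNCTIONS]


def function_for(address):
    """Return the function whose start address is the closest one at or below address."""
    index = bisect_right(STARTS, address) - 1
    if index < 0:
        return "<unknown>"
    return FUNCTIONS[index][1]


def hot_functions(sampled_addresses):
    """Count samples per function; small address skew rarely changes the bucket."""
    return Counter(function_for(address) for address in sampled_addresses)


# Hypothetical sampled addresses, each possibly a few instructions past the real event.
samples = [0x4210, 0x4230, 0x4250, 0x4010, 0x4810]
for name, count in hot_functions(samples).most_common():
    print(name, count)
```

Attributing an event to a single instruction would instead require examining the instructions that precede each sampled address, which this sketch does not attempt.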
icache Measurement Report Description
With the icache measurement, produced by the icache measurement configuration file, HP Caliper
measures and reports on instruction cache metrics. This measurement is similar to the dcache
measurement.
The report shows two levels of information:
• Exact counts of instruction cache metrics summed across the entire run of an application
• Sampled instruction cache metrics that are associated with particular locations in the application
The report shows measured data by thread, load module, function, statement, cache line, and
instruction.
Command-line options allow you to control the amount of data reported, how the data are sorted,
and the number of statements and instructions reported for each sampled program location.
Example Command Line for Text Report
$ caliper icache -o reports/icachem.txt ./matmul
Example Command Line for CSV Report
$ caliper icache --csv csvout ./matmul
icache Metrics Summed for Entire Run
This section describes the metrics summed over the entire run of your application under HP Caliper.
Metrics for Integrity Servers Itanium 2 Systems
L1I_READS Provides information about the number of demand fetch
reads, that is, all accesses regardless of hit or miss, to the
L1 instruction cache (32-byte chunks).
If demand fetches have an L1 instruction TLB miss, have an
L1 instruction cache miss, and collide with a fill-recirculate
to the instruction cache, they are not counted in this
measurement even though they are counted in
L2_INST_DEMAND_READS.
L2_INST_DEMAND_READS Number of instruction requests to L2 due to L1 instruction
demand fetch misses. This event counts the number of
demand fetches that miss both the L1 instruction cache and
the ISB regardless of whether they hit or miss in the RAB.
If a demand fetch does not have an L1 instruction TLB miss,
L2_INST_DEMAND_READS and L1I_READS line up in time.
If a demand fetch does not have an L2 instruction TLB miss,
L2_INST_DEMAND_READS follows L1I_READS by 3-4 clocks
(unless a flushed iwalk is pending ahead of it, which will
increase the delay until the pending iwalk is finished).
If a demand fetch has an L2 instruction TLB miss, the skew
between L2_INST_DEMAND_READS and L1I_READS is not
deterministic.
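As a rough illustration of how the two counts relate, the ratio of L2_INST_DEMAND_READS to L1I_READS approximates the share of L1 instruction cache demand fetches that go on to L2. The Python sketch below assumes that derived metric; it is not taken from the HP Caliper report, and the function name and counter values are hypothetical. Because some demand fetches are counted in L2_INST_DEMAND_READS but not in L1I_READS, treat the result as an estimate.

```python
def l1i_miss_percent(l1i_reads, l2_inst_demand_reads):
    """Approximate the L1 instruction cache demand miss percentage.

    Assumption: L2_INST_DEMAND_READS / L1I_READS is used as the miss ratio.
    """
    if l1i_reads <= 0:
        raise ValueError("l1i_reads must be positive")
    return 100.0 * l2_inst_demand_reads / l1i_reads


# Hypothetical whole-run counts, not real measurements.
reads = 2_000_000
misses = 50_000
print(f"{l1i_miss_percent(reads, misses):.2f}%")  # prints 2.50%
```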