HP Caliper User Guide Release 5.5 (5900-2351, August 2012)

You can potentially get a rough estimate of the total number of data cache misses incurred by a
particular instruction, for example, by doing the following:
1. Determine a scaling factor based on the total misses and the number of misses accounted for
   by sampling:
       scale = total L1 misses / (total sampled misses * sampling rate)
2. Multiply the number of sampled misses associated with an instruction by the scaling factor:
       total misses for instruction = scale * sampled misses for instruction
However, depending on the density of floating-point load misses incurred by your application,
such estimates could be very misleading.
Floating-point loads are serviced directly from the L2 cache. The PMU treats both L1 data cache
misses and L2 floating-point load misses as data cache miss events for sampling purposes. Therefore,
if your application makes frequent floating-point loads, then multiplying total samples by sampling
rate might yield a data cache miss count that exceeds the total number of L1 data cache misses.
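The two-step estimate and the floating-point caveat can be sketched in a few lines of Python. This
is only an illustration, not HP Caliper code: the function and variable names are hypothetical and
the numbers are made up. The sketch assumes that the sampling rate means the number of miss events
per sample, which is one reading of the statement that samples multiplied by sampling rate yields a
miss count. Under that assumption, each instruction's sample count is expanded by the sampling rate
before the scaling factor is applied, so the per-instruction estimates add up to the exact L1 total.

```python
def estimate_misses(total_l1_misses, samples_by_instruction, events_per_sample):
    """Apportion an exact L1 miss total across instructions using sampled counts.

    All names are hypothetical; events_per_sample is an assumed reading of
    the "sampling rate" (one sample taken per this many miss events).
    """
    total_sampled = sum(samples_by_instruction.values())
    if total_sampled == 0:
        raise ValueError("no samples collected")
    # Step 1: scaling factor relating exact misses to misses implied by sampling.
    scale = total_l1_misses / (total_sampled * events_per_sample)
    # Step 2: scale each instruction's share (expanded by the assumed rate).
    estimates = {
        addr: scale * count * events_per_sample
        for addr, count in samples_by_instruction.items()
    }
    return scale, estimates


# Made-up example: 3 instructions, one sample per 100 miss events.
samples = {"0x4000a0": 30, "0x4000b0": 50, "0x4000c0": 20}
scale, estimates = estimate_misses(8000, samples, 100)
# 100 samples * 100 events = 10000 implied misses, more than the 8000 exact
# L1 misses, so scale is below 1. Sampled events that are not L1 data cache
# misses (for example floating-point loads serviced by L2) could cause this.
print(f"scale = {scale:.2f}")
for addr, est in estimates.items():
    print(f"{addr}: ~{est:.0f} misses")
```

A scale well below 1 is a hint that the samples include many events other than L1 data cache
misses, in which case the per-instruction estimates should be treated with caution.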
More frequent sampling increases HP Caliper's perturbation of your application. In the extreme
case of taking one sample for each cache miss event, the kernel will trap on every event, making
the resulting data of limited, if any, value.
How Latency Bucket Metrics Are Obtained
The PMU's data event address register (D-EAR) provides the number of cycles of latency for each
sampled miss. HP Caliper places a data cache miss into one of the latency buckets based on the
latency of the miss. HP Caliper uses its built-in table of expected latencies to determine where a
miss is serviced: the L2 cache, L3 cache, cell local memory, C2C, 1-hop memory, 2-hop
memory, and so forth. HP Caliper uses different expected latencies depending on the CPU type,
CPU frequency, and system model.
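The bucketing step can be pictured as a table lookup on the D-EAR latency. The sketch below is
illustrative only: the bucket names follow the text, but the cycle thresholds, the table layout,
and the function name are invented for the example. They are not HP Caliper's actual values, which
vary by CPU type, CPU frequency, and system model.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical table: (upper latency bound in cycles, servicing level).
# The thresholds are invented; C2C is omitted to keep the example short.
EXPECTED_LATENCIES = [
    (8, "L2 cache"),
    (32, "L3 cache"),
    (256, "cell local memory"),
    (512, "1-hop memory"),
]
OVERFLOW_BUCKET = "2-hop memory or beyond"


def classify_miss(latency_cycles, table=EXPECTED_LATENCIES):
    """Return the bucket whose upper bound is the first one >= the latency."""
    bounds = [bound for bound, _ in table]
    index = bisect_left(bounds, latency_cycles)
    return table[index][1] if index < len(table) else OVERFLOW_BUCKET


# Made-up latencies, as if read from six sampled misses.
sampled_latencies = [6, 14, 14, 200, 700, 30]
buckets = Counter(classify_miss(lat) for lat in sampled_latencies)
for name, count in buckets.items():
    print(f"{name}: {count}")
```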
How the Data Summary Information Is Obtained
The PMU's data event address register (D-EAR) provides the data address along with the number
of cycles of latency for each sampled data cache miss. HP Caliper creates a histogram of samples
by data address, aggregating all samples that fall on the same data address. After creating
the histogram, HP Caliper maps the data addresses to global variables and aggregates all samples
whose data addresses belong to the same global variable. If a data address does not belong
to any global variable, it is assigned to a region in the process. To do this, HP Caliper creates
a map of the different regions within the process and uses it to assign each such data address to
a process region.
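The aggregation described above can be sketched as two passes over the sampled addresses. This is
a simplified illustration, not HP Caliper's implementation: the symbol table, the region map, the
address ranges, and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical symbol and region tables: (start, end_exclusive, name).
GLOBALS = [(0x6000, 0x6040, "g_table"), (0x6040, 0x6048, "g_count")]
REGIONS = [(0x6000, 0x7000, "data"), (0x20000, 0x30000, "heap"),
           (0x7FFF0000, 0x80000000, "stack")]


def lookup(address, ranges):
    """Return the name of the first range containing address, or None."""
    for start, end, name in ranges:
        if start <= address < end:
            return name
    return None


def summarize(sample_addresses):
    # First pass: histogram of samples keyed by exact data address.
    by_address = Counter(sample_addresses)
    # Second pass: fold addresses into global variables, else into regions.
    summary = Counter()
    for address, count in by_address.items():
        owner = lookup(address, GLOBALS)
        if owner is None:
            owner = "[" + (lookup(address, REGIONS) or "unknown") + "]"
        summary[owner] += count
    return summary


# Made-up sampled data addresses.
samples = [0x6000, 0x6008, 0x6008, 0x6040, 0x20010, 0x7FFF1000, 0x20010]
for owner, count in summarize(samples).most_common():
    print(f"{owner}: {count}")
```

Region names are bracketed here only to distinguish them from variable names in the output.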
dtlb Measurement Report Description
With the dtlb measurement, produced by the dtlb measurement configuration file, HP Caliper
measures and reports two levels of information:
• Exact counts of data translation lookaside buffer (TLB) metrics summed across the entire run
  of an application.
• Sampled data TLB metrics that are associated with particular locations in the measured
  application. Data TLB misses can hit the L2 TLB, can be handled by the hardware page walker
  (HPW), or can be handled by software.
The report shows measured data by thread, load module, function, statement, and instruction.
Command-line options allow you to control the amount of data reported, how the data is sorted,
and the number of statements and instructions reported for each sampled program location.
Example Command Line for Text Report
$ caliper dtlb -o reports/dtlbm.txt ./wordplay thequickbrownfox
Example Command Line for CSV Report
$ caliper dtlb --csv csvout ./wordplay thequickbrownfox