HP Caliper User Guide Release 5.5 (5900-2351, August 2012)

You can get a rough estimate of, for example, the total number of data cache misses incurred by a particular instruction by doing the following:

1. Determine a scaling factor based on the total number of misses and the number of misses accounted for by sampling:

   scale = total L1 misses / (total sampled misses * sampling rate)

2. Multiply the number of sampled misses associated with the instruction by the scaling factor:

   total misses for instruction = scale * sampled misses for instruction

However, depending on the density of floating-point load misses incurred by your application, such estimates could be very misleading.

Floating-point loads are serviced directly from the L2 cache. The PMU treats both L1 data cache misses and L2 floating-point load misses as data cache miss events for sampling purposes. Therefore, if your application makes frequent floating-point loads, then multiplying total samples by sampling rate might yield a data cache miss count that exceeds the total number of L1 data cache misses.
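As a minimal sketch, the two-step estimate described earlier in this section can be written as a small Python function. The function and parameter names are hypothetical and are not part of HP Caliper. The arithmetic applies the two formulas exactly as stated, with the sampling rate in whatever units the report uses, and the result is subject to the floating-point load caveat described above.

```python
def estimate_instruction_misses(total_l1_misses, total_sampled_misses,
                                sampling_rate, sampled_misses_for_instruction):
    """Apply the two-step rough estimate (hypothetical helper, not a Caliper API)."""
    denominator = total_sampled_misses * sampling_rate
    if denominator == 0:
        raise ValueError("no sampled misses or zero sampling rate")
    # Step 1: scaling factor from the exact total and the sampled total.
    scale = total_l1_misses / denominator
    # Step 2: scale the sampled misses attributed to one instruction.
    return scale * sampled_misses_for_instruction
```

If sampled floating-point load misses inflate the denominator, the scaling factor shrinks accordingly, which is one way the estimate can mislead.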

More frequent sampling increases HP Caliper's perturbation of your application. In the extreme case of taking one sample for each cache miss event, the kernel will trap on every event, making the resulting data of limited, if any, value.

How Latency Bucket Metrics Are Obtained

The PMU's data event address register (D-EAR) provides the number of cycles of latency for each sampled miss. HP Caliper places a data cache miss into one of the latency buckets based on the latency of the miss. HP Caliper uses its built-in table of expected latencies to determine what serviced a miss: the L2 cache, L3 cache, cell local memory, C2C, 1-hop memory, 2-hop memory, and so forth. The expected latencies that HP Caliper uses differ depending on the CPU type, CPU frequency, and system model.
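The bucketing step can be illustrated with a short Python sketch. The bucket names and cycle limits below are invented for illustration only; the actual table is internal to HP Caliper and varies by CPU type, CPU frequency, and system model.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical upper latency bounds (in cycles), one per bucket except the last.
# These numbers are assumptions, not HP Caliper's real expected latencies.
BUCKET_LIMITS = [10, 30, 200, 400]
BUCKET_NAMES = ["L2 cache", "L3 cache", "local memory",
                "1-hop memory", "2-hop memory"]

def latency_bucket(latency_cycles):
    """Return the first bucket whose upper bound is >= latency_cycles."""
    return BUCKET_NAMES[bisect_left(BUCKET_LIMITS, latency_cycles)]

def bucket_counts(latencies):
    """Count sampled misses per latency bucket."""
    return Counter(latency_bucket(cycles) for cycles in latencies)
```

A sorted list of limits with a binary search keeps the lookup simple when a different table has to be swapped in for another system model.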

How the Data Summary Information Is Obtained

The PMU's data event address register (D-EAR) provides the data address along with the number of cycles of latency for each sampled data cache miss. HP Caliper creates a histogram of samples by data address, aggregating all samples that fall on the same data address. HP Caliper then maps the data addresses to global variables and aggregates all samples whose data addresses belong to the same global variable. If a data address does not belong to any global variable, it is assigned to a region in the process. To make this assignment, HP Caliper creates a map of the different regions within a process and uses it to assign sample data addresses to a process region.
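The aggregation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not HP Caliper's implementation: the function name, the `(name, start, end)` range tuples, and the `"unknown"` fallback for addresses outside every range are all hypothetical.

```python
from collections import Counter

def summarize_samples(sample_addresses, global_vars, regions):
    """Aggregate sampled miss addresses by global variable, else by region.

    global_vars and regions are lists of (name, start, end), end exclusive.
    """
    # Step 1: histogram of samples by data address.
    by_address = Counter(sample_addresses)

    def lookup(ranges, addr):
        for name, start, end in ranges:
            if start <= addr < end:
                return name
        return None

    # Step 2: fold each address into its global variable or, failing that,
    # into the process region that contains it.
    summary = Counter()
    for addr, count in by_address.items():
        owner = lookup(global_vars, addr) or lookup(regions, addr) or "unknown"
        summary[owner] += count
    return summary
```

For example, with one global variable `table` at 0x1000-0x10ff and a `heap` region at 0x2000-0x2fff, samples at 0x1000 and 0x1010 are both counted under `table`, while a sample at 0x2004 is counted under `heap`.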

dtlb Measurement Report Description

With the dtlb measurement, produced by the dtlb measurement configuration file, HP Caliper measures and reports two levels of information:

• Exact counts of data translation lookaside buffer (TLB) metrics summed across the entire run of an application.

• Sampled data TLB metrics that are associated with particular locations in the measured application. Data TLB misses can hit the L2 TLB, can be handled by the hardware page walker (HPW), or can be handled by software.

The report shows measured data by thread, load module, function, statement, and instruction.

Command-line options allow you to control the amount of data reported, how the data is sorted, and the number of statements and instructions reported for each sampled program location.

Example Command Line for Text Report

$ caliper dtlb -o reports/dtlbm.txt ./wordplay thequickbrownfox

Example Command Line for CSV Report

$ caliper dtlb --csv csvout ./wordplay thequickbrownfox
