HP Caliper User Guide Release 5.5 (5900-2351, August 2012)

The list of processor metrics you can use for the sampling event is available in the file

itanium2_cpu_counters.txt, located in the HP Caliper home directory in the doc/text

subdirectory.

The ETB collected at each sampling point can contain up to 16 IPs. By default, cycles will pick

the youngest IP sample from the ETB. However, all 16 IP entries are processed to collect the

elapsed cycles (Cycles Per Bundle) information. If two consecutive IP entries have the same bundle

address, this is treated as a split issue (that is, the bundle required multiple cycles to issue).
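
The split-issue rule can be illustrated with a minimal Python sketch. This is not HP Caliper's implementation. The function name summarize_etb_sample, the oldest-to-youngest ordering of the list, and the sample addresses are all assumptions made for illustration. The sketch picks the youngest entry and counts consecutive entries that share a bundle address.

```python
from typing import List, Tuple

MAX_ETB_ENTRIES = 16  # an ETB sample can contain up to 16 IPs


def summarize_etb_sample(bundle_addrs: List[int]) -> Tuple[int, int]:
    """Return (youngest_ip, split_issue_count) for one ETB sample.

    Assumption (hypothetical, not from the Caliper documentation):
    bundle_addrs is ordered from oldest to youngest entry.
    """
    if not bundle_addrs:
        raise ValueError("ETB sample is empty")
    if len(bundle_addrs) > MAX_ETB_ENTRIES:
        raise ValueError("ETB sample has more than 16 entries")

    # Default behavior described in the text: pick the youngest IP.
    youngest = bundle_addrs[-1]

    # Two consecutive entries with the same bundle address count as one
    # split issue (the bundle needed more than one cycle to issue).
    split_issues = sum(
        1 for prev, cur in zip(bundle_addrs, bundle_addrs[1:]) if prev == cur
    )
    return youngest, split_issues


# Hypothetical sample: 16-byte-aligned bundle addresses.
sample = [0x4000, 0x4010, 0x4010, 0x4020, 0x4020, 0x4020, 0x4030]
picked, splits = summarize_etb_sample(sample)
print(hex(picked), splits)
```

In this made-up sample, the bundle at 0x4010 appears twice in a row and the bundle at 0x4020 three times in a row, so the sketch counts three split issues in total.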

You can use the --etb-freeze-delay and --etb-walkback-cycles options to correlate

the performance monitoring events to IP values. The option --etb-freeze-delay changes the

way in which the ETB collects the IP samples. The option --etb-walkback-cycles changes

the way in which HP Caliper picks the IP sample from the 16 IP entries in an ETB sample.

dcache Measurement Report Description

With the dcache measurement, produced by the dcache measurement configuration file, HP

Caliper measures and reports on data cache metrics. This measurement is similar to the icache

measurement.

The report shows two levels of information:

• Exact counts of data cache metrics summed across the entire run of an application

• Sampled data cache metrics that are associated with particular locations in the application

The sampled metrics also provide detailed latency information by breaking up the misses into eight

different latency buckets based on latency cycles. Each bucket gives the percentage of

misses that fall within a particular latency range.

A latency bucket is a grouping of latency data associated with data accesses serviced by particular

levels of CPU cache and system memory. A latency bucket can be one of the following:

L2 cache access, L3 cache access, and memory access. On cell-based systems, the following

additional buckets are provided: cell local memory access, 1-hop memory access, 2-hop memory

access, and cache-to-cache (C2C) access.
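
The idea of reporting misses as percentages per latency bucket can be sketched in Python as follows. This is an illustration only, not how HP Caliper computes its report. The cycle thresholds, the function name bucket_percentages, and the sample latencies are hypothetical; the real bucket boundaries depend on the CPU type and frequency.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical upper bounds (in cycles) for the three default buckets.
# Real boundaries depend on CPU type and frequency and are NOT these values.
BUCKETS: List[Tuple[str, float]] = [
    ("L2 cache access", 10),
    ("L3 cache access", 30),
    ("memory access", float("inf")),
]


def bucket_percentages(
    latencies: Iterable[int],
    buckets: List[Tuple[str, float]] = BUCKETS,
) -> Dict[str, float]:
    """Return the percentage of sampled misses that falls in each bucket."""
    counts: Counter = Counter()
    total = 0
    for cycles in latencies:
        # Assign the miss to the first bucket whose upper bound covers it.
        for name, upper in buckets:
            if cycles <= upper:
                counts[name] += 1
                break
        total += 1
    if total == 0:
        return {name: 0.0 for name, _ in buckets}
    return {name: 100.0 * counts[name] / total for name, _ in buckets}


# Hypothetical sampled miss latencies, in cycles.
misses = [6, 8, 14, 25, 180, 210, 7, 9]
print(bucket_percentages(misses))
```

On a cell-based system the same approach would extend to the additional buckets by adding entries to the bucket list, again with boundaries that would have to come from the actual hardware.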

The latency bucket information is particularly useful for understanding data cache access behavior

of large-enterprise multithreaded, multiprocess applications and fine-tuning the applications. For

example, if a large percentage of data cache misses are due to 1- or 2-hop C2C accesses, this

could indicate that the processes are sharing data and running on CPUs in two different cells. You

can possibly improve performance significantly by scheduling those processes to run on CPUs

within the same cell.

You can turn off the latency bucket information by using the --latency-buckets False option.

On HP-UX, HP Caliper uses the model command to determine the CPU type and CPU

frequency.

On Linux, you need to use the --system-model option to help HP Caliper determine the CPU

type and CPU frequency. If you do not use this option, HP Caliper will break up the misses into

the following three buckets by default: L2 cache access, L3 cache access, and memory access.

The report shows measured data by thread, load module, function, statement, and instruction.

Command-line options let you control the amount of data reported, how the data is sorted, and

the number of statements and instructions reported for each sampled program location.

You can use the --dcache-data-profile option to get Data Summary output with a report.

See “Using the --dcache-data-profile Option to Produce a Data Summary” (page 197).

You can use the --dcache-stores option to get a Data Cache Store Profile report. See

“Using the --dcache-stores Option to Produce a Data Cache Store Profile” (page 198).

Example Command Line for Text Report

$ caliper dcache -o reports/dcachem.txt ./matmul

Example Command Line for CSV Report

$ caliper dcache --csv csvout ./wordplay thequickbrownfox
