HP Caliper 5.3 User Guide (5900-1558, February 2011)

Data Summary
---------------------------------------------------------------------------------------------------------------
 % Total                               Avg. --Latency buckets as % Misses--
  Dcache  Cumulat  Sampled   Dcache  Dcache  L2  --L3-- ------Memory-------
 Latency     % of   Dcache  Latency  Laten.
  Cycles    Total   Misses   Cycles  Cycles   7  14  64 150 250 350 450   >  Data Entry
---------------------------------------------------------------------------------------------------------------
   66.82    66.82       42      580    13.8  62  29   7   0   0   2   0   0  Heap
    7.72    74.54       10       67     6.7  80  20   0   0   0   0   0   0  Memory mapped shared library
    5.65    80.18        5       49     9.8  40  60   0   0   0   0   0   0  Process Text Region
    4.84    85.02        4       42    10.5  25  50  25   0   0   0   0   0  libc.so.1::_arena_rmutex
    4.72    89.75        5       41     8.2  40  60   0   0   0   0   0   0  Process Data Region
The Data Entry column shows the global variable name, process region name, or unknown data
address.
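The numeric columns of the sample report can be reproduced from the per-entry sampled misses and latency cycles. The following is a minimal sketch, not HP Caliper code. The overall total of 868 latency cycles is an assumption inferred from the report itself (580 cycles shown as 66.82% of the total), and the function name summarize is hypothetical.

```python
# Raw per-entry values copied from the sample report:
# (data entry, sampled dcache misses, dcache latency cycles)
ROWS = [
    ("Heap", 42, 580),
    ("Memory mapped shared library", 10, 67),
    ("Process Text Region", 5, 49),
    ("libc.so.1::_arena_rmutex", 4, 42),
    ("Process Data Region", 5, 41),
]

# ASSUMPTION: total latency cycles over all entries, including rows the
# sample report does not show. Inferred from 580 cycles == 66.82% of total.
TOTAL_LATENCY_CYCLES = 868


def summarize(rows, total_cycles):
    """Return (pct_total, cumulative_pct, avg_latency, entry) per row."""
    out = []
    running_cycles = 0  # integer running sum avoids float accumulation error
    for entry, misses, cycles in rows:
        running_cycles += cycles
        out.append((
            round(100 * cycles / total_cycles, 2),          # % Total Dcache Latency Cycles
            round(100 * running_cycles / total_cycles, 2),  # Cumulat % of Total
            round(cycles / misses, 1),                      # Avg. Dcache Laten. Cycles
            entry,
        ))
    return out


SUMMARY = summarize(ROWS, TOTAL_LATENCY_CYCLES)
for pct, cum, avg, entry in SUMMARY:
    print(f"{pct:8.2f} {cum:8.2f} {avg:7.1f}  {entry}")
```

Under that assumed total, the computed percentages and averages agree with the five rows shown in the report.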
The process regions are:
  Process Text Region - the address space occupied by the process text/instructions
  Process Data Region - the address space occupied by initialized data and uninitialized data (.bss)
  Heap - the address space where dynamically allocated memory resides
  Data and Heap combined - when HP Caliper cannot discover the data and heap regions separately
  Process Stack Region - the user stack area
  Shared mem - all the shared memory areas mapped to the process
  RSE Stack - the RSE stack area
  Memory mapped shared library - the data area of the shared libraries mapped to the process
  Memory mapped region - all other memory mapped regions
If there is more than one region of the same type, they are combined and reported as a single
entry.
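The attribution of sampled data addresses to regions, with same-type regions combined into one entry, can be sketched as follows. This is an illustration only, not how HP Caliper is implemented. The address ranges, sample addresses, and the names region_of and misses_per_region are all hypothetical.

```python
from bisect import bisect_right
from collections import Counter

# HYPOTHETICAL address map: (start, end_exclusive, region type).
# Two separate "Shared mem" ranges show how regions of the same type
# end up reported as a single entry.
REGIONS = sorted([
    (0x04000000, 0x04100000, "Process Text Region"),
    (0x40000000, 0x40010000, "Process Data Region"),
    (0x40010000, 0x40800000, "Heap"),
    (0x60000000, 0x60100000, "Shared mem"),
    (0x70000000, 0x70200000, "Shared mem"),
])
STARTS = [start for start, _end, _kind in REGIONS]


def region_of(addr):
    """Return the region type containing addr, or 'unknown' if none does."""
    i = bisect_right(STARTS, addr) - 1  # last region starting at or before addr
    if i >= 0 and addr < REGIONS[i][1]:
        return REGIONS[i][2]
    return "unknown"


def misses_per_region(sample_addrs):
    """Count sampled misses per region type; same-type regions are combined."""
    return Counter(region_of(a) for a in sample_addrs)


# HYPOTHETICAL sampled data addresses.
SAMPLES = [0x40010020, 0x60000040, 0x70000100, 0x04000010, 0x90000000]
COUNTS = misses_per_region(SAMPLES)
print(COUNTS)
```

Keying the counts by region type rather than by address range is what merges the two shared memory ranges into one "Shared mem" entry.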
The Data Summary report is generated per-process. For a per-thread report, use the --thread
all option. For a per-module report, use the --per-module-data True option.
The Data Summary report can be merged or differenced across two databases that contain the
Data Summary information.
If a process exec()s, HP Caliper does not discover the process regions. In this case, the data
addresses are mapped to global variables, and any unassigned samples are reported as unknown
samples. A diagnostic message is generated with the report.
How Data Cache Metrics Are Obtained
HP Caliper obtains data cache metrics from the processor's performance monitoring unit (PMU).
Exact counts are obtained from the PMU's set of performance monitor configuration
(PMC)/performance monitor data (PMD) register pairs. Sampled data cache metrics are obtained
from the PMU's data event address register (D-EAR). Both sets of metrics focus on the L1 cache,
with notable exceptions.
HP Caliper takes samples every Nth data cache miss, where N is defined in the dcache measurement
configuration file in the HP Caliper home directory config subdirectory. At each sample point,
HP Caliper records both the instruction that resulted in a data cache miss and the latency (number
of clock cycles) incurred by the miss. You can override the value in the measurement configuration
file by using the -s option.
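Sampling every Nth miss can be pictured with a small sketch. It assumes a simple software model in which each miss is an (instruction address, latency in cycles) pair. The real sampling is done by the PMU hardware, and the function name sample_every_nth and the miss stream are hypothetical.

```python
def sample_every_nth(misses, n):
    """Keep every n-th miss from an iterable of (instruction, latency) pairs."""
    if n < 1:
        raise ValueError("sampling interval must be >= 1")
    # Count misses from 1 so that the n-th, 2n-th, ... misses are kept.
    return [miss for i, miss in enumerate(misses, start=1) if i % n == 0]


# HYPOTHETICAL miss stream: 10 misses alternating between two made-up
# instruction addresses, with latencies of 6 through 15 cycles.
MISSES = [(0x4000A0 if i % 2 else 0x4000B0, 5 + i) for i in range(1, 11)]

N = 4  # sampling interval, standing in for the configured value of N
SAMPLED = sample_every_nth(MISSES, N)

for instruction, latency in SAMPLED:
    print(f"instruction {instruction:#x}: {latency} cycles")
```

With an interval of 4, only the 4th and 8th misses of the 10 are recorded, each with its instruction address and latency.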
For data cache miss sampling, the PMU can monitor only one data cache load at a time. Since
there are likely to be multiple loads in progress at any given moment, the PMU can process only
a subset of data cache misses. The PMU randomizes which loads it monitors.
This means that the number of data cache misses observed through sampling (the number of sampled
misses multiplied by the sampling rate) accounts for only a subset of the total number of actual
data cache misses. Therefore, it is best to interpret sampling data not as an indication of how many data cache