Parallel Programming Guide for HP-UX Systems

MPI
Profiling
Chapter 2 72
Specifications you make using mpirun -i override any specifications you make using the
MPI_INSTR environment variable.
MPIHP_Trace_on and MPIHP_Trace_off
By default, the entire application is profiled from MPI_Init to MPI_Finalize. However,
HP MPI provides the nonstandard MPIHP_Trace_on and MPIHP_Trace_off routines to
collect profile information for selected code sections only.
To use this functionality:
1. Insert the MPIHP_Trace_on and MPIHP_Trace_off pair around code that you want to
profile.
2. Build the application and invoke mpirun with the -i off option.
-i off specifies that counter instrumentation is enabled but initially turned off. Data
collection begins after all processes collectively call MPIHP_Trace_on. HP MPI collects
profiling information only for code between MPIHP_Trace_on and MPIHP_Trace_off.
CAUTION
MPIHP_Trace_on and MPIHP_Trace_off are collective routines and must be
called by all ranks in your application. Otherwise, the application deadlocks.
Viewing ASCII instrumentation data
The ASCII instrumentation profile is a text file with the .instr extension. For example, to view
the instrumentation file for the compute_pi.f application, you can print the prefix.instr file.
If you defined prefix for the file as compute_pi, as you did when you created the
instrumentation file in “Creating an instrumentation profile” on page 71, you would print
compute_pi.instr.
The ASCII instrumentation profile provides the version and the date your application ran, and
summarizes information by application, rank, and routine. Figure 2-3 on page 73 is
an example of an ASCII instrumentation profile.
The information available in the prefix.instr file includes:
Overhead time—The time a process or routine spends inside MPI. For example, the time a
process spends doing message packing.
Blocking time—The time a process or routine is blocked waiting for a message to arrive
before resuming execution.
NOTE If spin-yield time is changed, overhead and blocking times become less
accurate.