HP C Programmer's Guide (92434-90009)

Chapter 4
Optimizing HP C Programs
Parallel Execution
routine or to a user-defined routine with the same name as a system routine. If the call is
to a system routine, the code inhibits parallel execution.
NOTE
If your program makes explicit use of threads, do not attempt to parallelize it.
Parallel Execution and Shared Memory
A program compiled with the +Oparallel option and executing on more than one
processor mostly uses shared memory instead of the normal process data and stack
segments. (If it executes on one processor, it uses the normal process data segment instead
of shared memory.) If a parallel-executing program requires large amounts of memory, you
may need to increase shmmax, the HP-UX kernel configuration parameter that sets the
maximum size of a shared-memory segment.
A program compiled with +Oparallel sizes its shared-memory stack as the smaller of
shmmax and the default stack size, which is set by maxssiz, another HP-UX kernel
configuration parameter.
To set these configuration parameters, run the System Administration Manager (SAM)
and navigate to the kernel configuration area.
Profiling Parallelized Programs
You profile a program that has been compiled for parallel execution in much the same
way as a non-parallel program:
1. Compile the program with the -G option.
2. Run the program to produce profiling data.
3. Run gprof against the program.
4. View the output from gprof.
The differences are:
• Running the program in Step 2 produces a gmon.out file for the master process and
  gmon.out.1, gmon.out.2, and so on for each of the slave processes. Thus, if your
  program is to execute on two processors, Step 2 will produce two files, gmon.out and
  gmon.out.1.
• The flat profile that you view in Step 4 indicates loops that were parallelized with the
  following notation:

      routine_name##pr_line_0123

  where routine_name is the name of the routine containing the loop, pr (parallel region)
  indicates that the loop was parallelized, and 0123 is the line number of the beginning of
  the loop or loops that are parallelized.
Conditions Inhibiting Loop Parallelization
The following sections describe conditions that can inhibit parallelization.
Additionally, the +Onoloop_transform and +Onoinline options may help if you
experience problems while using +Oparallel.
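One of the most common inhibiting conditions is a loop-carried data dependence. As a sketch (the function names here are invented for illustration, not taken from the manual), the first loop below has fully independent iterations and is a candidate for parallelization, while the second is not:

```c
/* Sketch of loops the optimizer treats differently under +Oparallel;
   the function names are illustrative. */
#define N 8

/* Each iteration reads and writes only its own elements, so the
   iterations are independent and may run in parallel. */
void scale(double a[N], const double b[N], double k)
{
    int i;
    for (i = 0; i < N; i++)
        a[i] = k * b[i];
}

/* Iteration i reads a[i - 1], the result of the previous iteration:
   a loop-carried dependence that inhibits parallelization. */
void prefix_sum(double a[N])
{
    int i;
    for (i = 1; i < N; i++)
        a[i] += a[i - 1];
}
```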