Parallel Programming Guide for HP-UX Systems

MPI: Tuning
Coding considerations
The following suggestions can help improve performance when coding your MPI applications:
- Use HP MPI collective routines instead of coding your own with point-to-point routines. HP MPI's collective routines are optimized to use shared memory where possible.
- Use commutative MPI reduction operations.
  - Use the MPI predefined reduction operations whenever possible, because they are optimized.
  - When defining your own reduction operations, make them commutative. Commutative operations give MPI more options when ordering operations, allowing it to select an order that leads to the best performance.
- Use MPI derived datatypes when you exchange several small messages that have no dependencies.
- Minimize your use of MPI_Test() polling schemes to reduce polling overhead.
- Code your applications to avoid unnecessary synchronization. In particular, avoid MPI_Barrier calls where possible. Typically an application can achieve the same end result using targeted synchronization instead of collective calls. For example, in many cases a token-passing ring can provide the same coordination as a loop of barrier calls.