Parallel Programming Guide for HP-UX Systems

MPI
Tuning
Message bandwidth is the reciprocal of the time needed to transfer a byte. Bandwidth is
normally expressed in megabytes per second. Bandwidth becomes important when message
sizes are large.
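The relationship between message size and transfer time can be sketched with a simple linear cost model. This model is an assumption introduced for illustration, not a formula taken from HP MPI: it treats latency as a fixed cost per message and adds the message size divided by the bandwidth. The function names are hypothetical.

```c
#include <stddef.h>

/* Assumed linear cost model (illustrative only):
 *   time = latency + bytes / bandwidth
 * latency_s      fixed per-message cost, in seconds
 * bandwidth_bps  sustained transfer rate, in bytes per second
 */
double transfer_time(double latency_s, double bandwidth_bps, size_t bytes)
{
    return latency_s + (double)bytes / bandwidth_bps;
}

/* Estimated time to move total_bytes using n_messages separate messages.
 * Every message pays the latency once; the byte cost does not depend on
 * how the data is split. */
double total_time(double latency_s, double bandwidth_bps,
                  size_t total_bytes, size_t n_messages)
{
    return (double)n_messages * latency_s
         + (double)total_bytes / bandwidth_bps;
}
```

With hypothetical figures of 10 microseconds latency and 100 megabytes per second bandwidth (taking a megabyte as 10^6 bytes), transfer_time(10e-6, 100e6, 8) is about 10.08 microseconds, almost all of it latency, while transfer_time(10e-6, 100e6, 100000000) is about 1.00001 seconds, almost all of it limited by bandwidth. Under the same model, total_time shows that sending a fixed amount of data in fewer messages lowers the total latency cost.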
To improve latency or bandwidth or both:
• Reduce the number of process communications by designing coarse-grained
  applications.
• Use derived, contiguous data types for dense data structures to eliminate
  unnecessary byte-copy operations in certain cases. Use derived data types
  instead of MPI_Pack and MPI_Unpack if possible. HP MPI optimizes
  noncontiguous transfers of derived data types.
• Use collective operations whenever possible. This eliminates the overhead
  of calling MPI_Send and MPI_Recv each time one process communicates with
  others. Also, use the HP MPI collectives rather than customizing your own.
• Specify the source process rank whenever possible when calling MPI
  routines. Using MPI_ANY_SOURCE may increase latency.
• Double-word align data buffers if possible. This improves byte-copy
  performance between sending and receiving processes because of double-word
  loads and stores.
• Use MPI_Recv_init and MPI_Startall instead of a loop of MPI_Irecv calls in
  cases where requests may not complete immediately.
For example, suppose you write an application with the following code section:
    j = 0;
    for (i = 0; i < size; i++) {
        if (i == rank) continue;    /* do not post a receive from self */
        MPI_Irecv(buf[i], count, dtype, i, 0, comm, &requests[j++]);
    }
    MPI_Waitall(size-1, requests, statuses);
Suppose that the receive posted by one MPI_Irecv call does not complete before the next
iteration of the loop. In this case, HP MPI tries to progress both requests. This progression
effort can keep growing if the receives posted in succeeding iterations also do not complete
immediately, resulting in higher latency.
However, you could rewrite the code section as follows:
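    j = 0;
    for (i = 0; i < size; i++) {
        if (i == rank) continue;    /* do not post a receive from self */
        MPI_Recv_init(buf[i], count, dtype, i, 0, comm,
                      &requests[j++]);
    }
    MPI_Startall(size-1, requests);
    MPI_Waitall(size-1, requests, statuses);
    /* Persistent requests remain allocated after MPI_Waitall; release each
       one with MPI_Request_free when it is no longer needed. */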
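The alignment recommendation in the list above can be checked in code. This sketch assumes that a double word is 8 bytes (a 4-byte word); confirm the size for your platform. The helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 4-byte word, so a double word is 8 bytes. */
#define DOUBLE_WORD 8u

/* Returns nonzero if p lies on a double-word boundary. */
int is_doubleword_aligned(const void *p)
{
    return ((uintptr_t)p % DOUBLE_WORD) == 0;
}

/* Rounds p up to the next double-word boundary (p itself if it is
 * already aligned). The caller must have allocated at least
 * DOUBLE_WORD - 1 extra bytes so the adjusted buffer still fits. */
void *align_to_doubleword(void *p)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t mask = (uintptr_t)(DOUBLE_WORD - 1);

    return (void *)((addr + mask) & ~mask);
}
```

Memory returned by malloc is already suitably aligned for any fundamental type, so misalignment typically arises when a message buffer is carved out of a larger byte array at an arbitrary offset. In that case, pass the pointer returned by align_to_doubleword as the buffer argument of the MPI call.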