Parallel Programming Guide for HP-UX Systems

MPI
Tuning
Chapter 2
When a host is oversubscribed, application performance decreases because of increased
context switching.
Context switching can degrade application performance by slowing the computation phase,
increasing message latency, and lowering message bandwidth. Simulations that use
timing-sensitive algorithms can produce unexpected or erroneous results when run on an
oversubscribed system.
When your system is oversubscribed even though your MPI application itself is not, you can
use gang scheduling to improve performance.
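As one way to enable this, HP MPI releases document an MP_GANG environment variable for requesting gang scheduling. The snippet below is a sketch only; the application name and rank count are placeholders, and the exact variable name and launch syntax should be verified against your HP MPI release:

```shell
# Request gang scheduling from HP MPI (verify MP_GANG against your release).
export MP_GANG=ON

# Launch as usual; ./my_app and -np 8 are illustrative placeholders.
mpirun -np 8 ./my_app
```

With gang scheduling enabled, the processes of the MPI job are scheduled to run concurrently, which reduces the context-switching penalty on an oversubscribed host.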
MPI routine selection
To achieve the lowest message latencies and highest message bandwidths for point-to-point
synchronous communications, use the MPI blocking routines MPI_Send and MPI_Recv. For
asynchronous communications, use the MPI nonblocking routines MPI_Isend and MPI_Irecv.
When you do use blocking routines, try to avoid leaving nonblocking requests pending. MPI
must advance nonblocking messages internally, so a blocking receive may also have to advance
pending requests, occasionally resulting in lower application performance.
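The nonblocking pattern can be sketched as follows. The buffer size, message tag, and ring-neighbor exchange are illustrative assumptions, not part of the original text; the key point is that MPI_Isend/MPI_Irecv let independent computation proceed before MPI_Waitall completes both transfers:

```c
/* Sketch: overlapping communication with computation using the
 * nonblocking pair MPI_Isend/MPI_Irecv, completed by MPI_Waitall.
 * Buffer size and ring-neighbor scheme are illustrative. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    double sendbuf[1024], recvbuf[1024];
    MPI_Request reqs[2];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;          /* ring neighbors */
    int left  = (rank + size - 1) % size;

    for (int i = 0; i < 1024; i++)
        sendbuf[i] = (double)rank;

    /* Post both transfers, then compute while they progress. */
    MPI_Irecv(recvbuf, 1024, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(sendbuf, 1024, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... independent computation can run here ... */

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    printf("rank %d received data from rank %d\n", rank, left);

    MPI_Finalize();
    return 0;
}
```

For purely synchronous exchanges with no computation to overlap, the blocking MPI_Send/MPI_Recv pair is the simpler and lower-latency choice, as noted above.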
For tasks that require collective operations, use the appropriate MPI collective routine. HP
MPI takes advantage of shared memory to perform efficient data movement and maximize
your application’s communication performance.
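For example, a global sum that might otherwise be assembled from many point-to-point messages can be expressed as a single collective call. The partial values below are illustrative:

```c
/* Sketch: computing a global sum with one collective call,
 * MPI_Allreduce, instead of hand-written point-to-point exchanges.
 * Each rank's local value is an illustrative placeholder. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double local, global;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    local = (double)(rank + 1);   /* this rank's partial result */

    /* One call moves and combines data for all ranks; on a single
     * host the library can perform this through shared memory. */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    printf("rank %d: global sum = %g\n", rank, global);
    MPI_Finalize();
    return 0;
}
```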
Multilevel parallelism
There are several ways to improve the performance of applications that use multilevel
parallelism:
• Use the MPI library to provide coarse-grained parallelism and a parallelizing compiler to
provide fine-grained (that is, thread-based) parallelism. An appropriate mix of coarse- and
fine-grained parallelism provides better overall performance.
• Assign only one multithreaded process per host when placing application processes. This
ensures that enough processors are available as different process threads become active.
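A minimal sketch of this mix, assuming OpenMP as the compiler's thread-based parallelism (the loop body is illustrative, and MPI calls are kept outside the parallel region so plain MPI_Init suffices):

```c
/* Sketch of multilevel parallelism: MPI provides coarse-grained
 * parallelism across hosts; OpenMP threads provide fine-grained
 * parallelism within each process. Launching one process per host
 * leaves that host's processors available to the threads. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Fine-grained, thread-level work; the loop body is illustrative. */
    double sum = 0.0;
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < 1000000; i++)
        sum += 1.0 / (i + 1.0);

    /* Coarse-grained combination across processes. */
    double total;
    MPI_Reduce(&sum, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("combined result: %g\n", total);

    MPI_Finalize();
    return 0;
}
```

If threads must themselves make MPI calls, confirm the thread-support level of your MPI library first; the pattern above deliberately avoids that requirement.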
Table 2-7 Subscription types (Continued)

Subscription type    Description
Oversubscribed       More active processes than processors