HP-MPI Version 2.2 for HP-UX Release Note
What’s in This Version
MPI_RDMA_NONESIDED=N
Specifies the number of one-sided operations that can be posted concurrently for each rank, regardless of the destination. The default is 8.
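For illustration only (sh-style shell; the rank count and program name are placeholders), an application that keeps more than eight one-sided operations in flight per rank could raise the limit at launch time:

   % export MPI_RDMA_NONESIDED=16
   % mpirun -np 8 ./one_sided_app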
MPI_MAX_REMSH=N
This release includes a startup scalability enhancement for the -f option to mpirun. The enhancement allows a large number of HP-MPI daemons (mpid) to be created without requiring mpirun to maintain a large number of remote shell connections.
When running on a very large number of nodes, the number of remote shells normally required to start all of the daemons can exhaust the available file descriptors. To create the necessary daemons, mpirun now uses the remote shell specified with MPI_REMSH to create at most 20 daemons directly, by default. This limit can be changed with the environment variable MPI_MAX_REMSH. When the number of daemons required is greater than MPI_MAX_REMSH, mpirun creates only MPI_MAX_REMSH remote daemons directly. The directly created daemons then create the remaining daemons using an n-ary tree, where n is the value of MPI_MAX_REMSH. For example, with the default of 20, mpirun itself creates at most 20 daemons, and each of those can create up to 20 more, so several hundred nodes are covered with only two levels of remote shells. Although this process is generally transparent to the user, the new startup requires that each node in the cluster be able to use the specified MPI_REMSH command (e.g. rsh, ssh) to reach every other node in the cluster without a password. The value of MPI_MAX_REMSH is applied on a per-world basis, so applications that spawn a large number of worlds may need to use a small value for MPI_MAX_REMSH. MPI_MAX_REMSH is relevant only when using the -f option to mpirun. The default value is 20.
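For illustration (sh-style shell; the appfile name and the value 32 are placeholders), the limit can be raised before launching a large job from an appfile:

   % export MPI_MAX_REMSH=32
   % mpirun -f my_appfile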
MPI_RANKMEMSIZE=d
Specifies the amount of shared memory dedicated to each rank, where d is the number of bytes. Of that memory, 12.5% is used as generic and 87.5% is used as fragments; the only way to change this ratio is to use MPI_SHMEMCNTL. MPI_RANKMEMSIZE differs from MPI_GLOBMEMSIZE, which specifies the total shared memory across all of the ranks. MPI_RANKMEMSIZE takes precedence over MPI_GLOBMEMSIZE if both are set. Both MPI_RANKMEMSIZE and MPI_GLOBMEMSIZE are mutually exclusive with MPI_SHMEMCNTL: if MPI_SHMEMCNTL is set, the user cannot set the other two, and vice versa.
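As a sketch (sh-style shell; the value and appfile name are placeholders), a job that needs roughly 64 MB of shared memory per rank could be started with:

   % export MPI_RANKMEMSIZE=67108864
   % mpirun -f my_appfile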
MPI_GLOBMEMSIZE=e
Specifies the total shared memory for the job, where e is the number of bytes. If the job size is N ranks, then each rank has e/N bytes of shared memory. Of that memory, 12.5% is used as generic and 87.5% is used as fragments; the only way to change this ratio is to use MPI_SHMEMCNTL.
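For example, setting e to 268435456 (256 MB) for a job of 8 ranks gives each rank 268435456/8 = 33554432 bytes (32 MB) of shared memory (sh-style shell; the values and appfile name are placeholders):

   % export MPI_GLOBMEMSIZE=268435456
   % mpirun -f my_appfile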
MPI_SHMEMCNTL=a,b,c
a   The number of envelopes for shared memory communication. The default is 8.
b   The bytes of shared memory to be used as fragments for messages. The default is 12.5% of the total memory.