Important Notice for InfiniBand Customers
Under certain circumstances, a running MPI application may hang in a partial teardown
state, with only a subset of the MPI ranks still running. The most likely cause is a reboot of
the system where mpirun was running for that application. In this scenario, the hosts with
the remaining processes may experience intermittent failures when attempting to start new
MPI jobs. The error message looks like:
a.out: Rank 0:1 MPI_Init: it_evd_wait1 unexpected event number 8197
To clear this problem, terminate the remaining processes from the hung application.
If the host where mpirun was launched is up, first run mpijob on that host to determine the
job ID of the hung application, then use mpiclean to clean up the application. If the host
where mpirun was launched is unavailable, locate the remaining processes on the execution
hosts with the UNIX ps command and terminate them manually with the UNIX kill
command.
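For example, a cleanup might look like the following sketch. The job ID (12345), process
ID (2468), and program name (a.out) are placeholders, and the mpijob output columns are
illustrative only; use the values reported on your own system.
% mpijob
JOB      USER     NPROCS   PROGNAME
12345    user     4        a.out
% mpiclean -j 12345
If the mpirun host is down, log in to each execution host and terminate the leftover ranks
directly:
% ps -ef | grep a.out
user   2468     1  0 10:15:00 ?     0:00 a.out
% kill 2468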
NOTE InfiniBand is not available with HP-MPI on PA-RISC platforms.
MLOCK Privilege for InfiniBand Use on HP-UX
When setting up InfiniBand on an HP-UX system, every user (other than root) who wants to
use InfiniBand must have their group ID listed in the /etc/privgroup file, and the access
permissions must be enabled with:
% setprivgrp -f /etc/privgroup
This may be done automatically at boot time, but it should also be run once manually after
the InfiniBand drivers are set up to ensure access. For example:
% grep user /etc/passwd
user:UJqaKNCCsESLo,O.fQ:836:1007:User Name:/home/user:/bin/tcsh
% grep epm /etc/group
epm::1007:
% cat /etc/privgroup