User guide
6–SHMEM Description and Configuration
Sizing Global Shared Memory
IB0054606-02 A 6-9
The salloc command allocates 16 nodes and runs one copy of shmemrun on the first 
allocated node, which then creates the SHMEM processes. shmemrun invokes 
mpirun, and mpirun determines the correct set of hosts and the required number 
of processes from the slurm allocation in which it is running. Because shmemrun 
is used in this approach, the user does not need to set up the environment.
No Integration
This approach launches a job inside a slurm allocation but with no integration, 
and it can be used with any supported MPI implementation. However, it requires 
a wrapper script to generate the hosts file. slurm allocates the nodes for the 
job, and the job runs within that allocation but not under the control of the 
slurm daemon. One way to use this approach is:
salloc -N 16 shmemrun_wrapper shmem-test-world
where shmemrun_wrapper is a user-provided wrapper script that creates a 
hosts file based on the current slurm allocation and then invokes mpirun with 
the hosts file and other appropriate options. Note that the processes are 
started using ssh/rsh, not slurm.
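A wrapper of this kind might look like the following minimal sketch. It is an illustration under stated assumptions, not a script shipped with the product. The function names (make_hosts_file, count_hosts, main) are hypothetical. The sketch starts one process per allocated node. The mpirun options shown (-np and --hostfile) follow Open MPI conventions and may differ for other MPI implementations. Any SHMEM-specific environment settings or mpirun options that shmemrun would normally supply are omitted here and must be added by the user.

```shell
#!/bin/sh
# shmemrun_wrapper: hypothetical sketch of a user-provided wrapper script.
# It builds a hosts file from the current slurm allocation and then
# launches the given program with mpirun.

# Write one host name per line to the file named by $1.
# "scontrol show hostnames" expands the compact node list held in
# SLURM_JOB_NODELIST (for example "node[01-16]") into individual names.
make_hosts_file() {
    scontrol show hostnames "$SLURM_JOB_NODELIST" > "$1"
}

# Print the number of non-empty lines (hosts) in the file named by $1.
count_hosts() {
    grep -c . "$1"
}

main() {
    hosts_file=$(mktemp) || return 1
    # Remove the temporary hosts file when the script exits.
    trap 'rm -f "$hosts_file"' EXIT
    make_hosts_file "$hosts_file" || return 1
    # Assumption: one process per allocated node.
    nprocs=$(count_hosts "$hosts_file")
    # Assumption: Open MPI-style options; adjust for the MPI in use.
    mpirun -np "$nprocs" --hostfile "$hosts_file" "$@"
}

# Launch only when running inside a slurm allocation.
if [ -n "${SLURM_JOB_NODELIST:-}" ]; then
    main "$@"
else
    echo "shmemrun_wrapper: not inside a slurm allocation; nothing launched" >&2
fi
```

With the script saved as shmemrun_wrapper and made executable, the salloc command shown above runs it on the first allocated node and passes shmem-test-world as the program to launch.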
Sizing Global Shared Memory
SHMEM provides the shmalloc, shrealloc, and shfree calls to allocate and 
release memory on a symmetric heap. These functions are called collectively 
across the processing elements (PEs) so that the memory is managed 
symmetrically across them. The extent of the symmetric heap determines the 
amount of global shared memory per PE that is available to the application.
Because this is an important resource, this section discusses the mechanisms 
available to size it. Applications can access this memory in several ways, each 
of which maps to a different access mechanism:
• Accessing global shared memory on the local PE: This is achieved by direct 
loads and stores to the memory.
• Accessing global shared memory on a PE on the same host: This is 
achieved by mapping the global shared memory using the operating system's 
local shared memory mechanisms (for example, System V shared memory) 
and then accessing the memory by direct loads and stores. This means that 
each PE on a host needs to map the global shared memory of every other PE 
on that host. These accesses do not use the adapter or the interconnect.
• Accessing global shared memory on a PE on a different host: This is 
achieved by sending put, get, and atomic requests across the interconnect.