SLURM Reference Manual for HP XC System Software

SRUN (Submit Jobs)
SRUN Roles and Modes
SRUN executes tasks ("jobs") in parallel on multiple compute nodes at the same time (on machines
where SLURM manages the resources). SRUN options let you both:
Specify the parallel environment for your job(s), such as the number of nodes used, node partition,
distribution of processes among nodes, and total time, and also
Control the behavior of your parallel job as it runs, such as by redirecting or labeling its output,
sending it signals, or specifying its reporting verbosity.
Because it performs several different roles, SRUN can be used in five distinct ways or "modes":
SIMPLE.
The simplest way to use SRUN is to distribute execution of a serial program (such as a UNIX utility)
across a specified number or range of compute nodes. For example,
srun -N 8 cp ~/data1 /var/tmp/data1
copies (CP) file data1 from your common home directory into local disk space on each of eight
compute nodes. This is very like running simple programs in parallel under AIX by using IBM's
POE command (except that SRUN lets you set relevant environment variables on its own execute
line, unlike POE). In simple mode, SRUN submits your job to the local SLURM job controller,
initiates all processes on the specied nodes, and blocks until needed resources are free to run the
job if necessary. Many control options can change the details of this general pattern.
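The simple-mode pattern above can be wrapped in a small shell helper. The following is only a sketch: the function name stage_file and the SRUN override variable are assumptions introduced for illustration and are not part of SLURM; only the -N option and the cp command come from the example above. Setting SRUN to "echo srun" prints the command line instead of running it, which allows a dry run on a machine without SLURM.

```shell
#!/bin/sh
# Hypothetical helper (not part of SLURM): copy one file to local disk
# on each of N compute nodes using SRUN's simple mode.
# SRUN defaults to the real command; SRUN="echo srun" gives a dry run.
: "${SRUN:=srun}"

stage_file() {
    nodes=$1    # number of compute nodes, passed to -N
    src=$2      # source file, e.g. in the shared home directory
    dest=$3     # destination path on each node's local disk
    # $SRUN is deliberately unquoted so "echo srun" splits into two words.
    $SRUN -N "$nodes" cp "$src" "$dest"
}

# Example call: stage_file 8 ~/data1 /var/tmp/data1
```

Called as shown in the final comment, the helper issues the same command line as the example above.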
BATCH (WITHOUT LCRM).
SRUN can also directly submit complex scripts to the Trivial Batch System (TBS) job queue(s)
managed by SLURM for later execution when needed resources become available and when no
higher priority jobs are pending. For example,
srun -N 16 -b myscript.sh
uses SRUN's -b option to place myscript.sh into the TBS queue to later run on 16 nodes. Scripts in
turn normally contain either MPI programs or other, simple invocations of SRUN itself (as shown
above). SRUN's -b option thus supports basic, local batch service even on machines where LC's
metabatch system LCRM has not yet been installed (see below). On BlueGene/L only, scripts must
invoke MPIRUN instead of simple SRUN to start tasks.
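As an illustration of the kind of script that the -b option might queue, the sketch below shows a hypothetical myscript.sh made up of simple SRUN invocations. The two job steps (hostname and a file copy), the function name job_steps, and the SRUN override variable are assumptions chosen for illustration, not contents prescribed by SLURM. On BlueGene/L the SRUN lines would have to be replaced by MPIRUN invocations, which are not shown here.

```shell
#!/bin/sh
# myscript.sh: hypothetical batch script built from simple SRUN steps.
# If SRUN is not preset, use the real command when it is installed and
# fall back to printing each command (a dry run) when it is not.
if [ -z "${SRUN:-}" ]; then
    if command -v srun >/dev/null 2>&1; then
        SRUN=srun
    else
        SRUN="echo srun"
    fi
fi

job_steps() {
    # Step 1: report which nodes the job is running on.
    $SRUN -N 16 hostname
    # Step 2: copy input data to local disk on every node.
    $SRUN -N 16 cp "$HOME/data1" /var/tmp/data1
}

job_steps
```

The node count is repeated on each step so that the script makes no assumption about what it inherits from the submitting command line.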
ALLOCATE.
To combine the job complexity of scripts with the immediacy of interactive execution, you can use
SRUN's "allocate" mode. For example,
srun -A -N 4 myscript.sh
uses SRUN's (uppercase) -A option to allocate specified resources (here, four nodes), spawn a subshell
with access to those resources, and then run multiple jobs using simple SRUN commands within the
specified script (here, myscript.sh) that the subshell immediately starts to execute. This is very like
allocating resources by setting AIX environment variables at the beginning of a script, and then using
them for scripted tasks. No job queues are involved.
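To make the contrast between the simple, batch, and allocate modes concrete, the sketch below maps a mode name onto the corresponding SRUN command line. The wrapper srun_mode and the SRUN override variable are hypothetical names introduced here. The -N, -b, and -A options are used exactly as this manual describes them; other SLURM versions may provide these functions differently, so treat this as an illustration rather than a portable tool.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of SLURM) that maps a mode name onto the
# SRUN command lines described in this section.
# SRUN defaults to the real command; SRUN="echo srun" gives a dry run.
: "${SRUN:=srun}"

srun_mode() {
    mode=$1     # simple | batch | allocate
    nodes=$2    # node count passed to -N
    shift 2     # remaining arguments: the command or script to run
    case $mode in
        simple)   $SRUN -N "$nodes" "$@" ;;     # run the command on the nodes now
        batch)    $SRUN -N "$nodes" -b "$@" ;;  # queue the script for later execution
        allocate) $SRUN -A -N "$nodes" "$@" ;;  # allocate nodes, run script in a subshell
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac
}

# Example calls:
#   srun_mode simple 8 cp ~/data1 /var/tmp/data1
#   srun_mode batch 16 myscript.sh
#   srun_mode allocate 4 myscript.sh
```

Only the allocate form starts work immediately with a script; the batch form returns after queueing, and the simple form runs a single command.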