HP XC System Software User's Guide Version 3.2

9 Using SLURM
HP XC uses the Simple Linux Utility for Resource Management (SLURM) for system resource
management and job scheduling.
This chapter addresses the following topics:
“Introduction to SLURM” (page 91)
“SLURM Utilities” (page 91)
“Launching Jobs with the srun Command” (page 91)
“Monitoring Jobs with the squeue Command” (page 92)
“Terminating Jobs with the scancel Command” (page 93)
“Getting System Information with the sinfo Command” (page 93)
“Job Accounting” (page 94)
“Fault Tolerance” (page 94)
“Security” (page 94)
9.1 Introduction to SLURM
SLURM is a reliable, efficient, open source, fault-tolerant job and compute resource manager
with features that make it suitable for large-scale, high-performance computing environments.
SLURM can report on machine status and perform partition management, job management, and job
scheduling.
The SLURM Reference Manual is available on the HP XC Documentation CD-ROM and from the
following Web site:
http://www.llnl.gov/LCdocs/slurm/.
SLURM manpages are also available online on the HP XC system.
As a system resource manager, SLURM has the following key functions:
Allocate exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work
Provide a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes
Arbitrate conflicting requests for resources by managing a queue of pending work
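To make the allocation and arbitration functions above concrete, the following Python sketch models them in miniature. It is a hypothetical illustration only, not SLURM's actual scheduling algorithm: the class ToyResourceManager, its methods, and the node names n1 to n4 are invented for this example. Allocation is always exclusive, and pending requests are served in strict first-in, first-out order.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Hypothetical toy model, NOT SLURM's scheduler.

    Grants exclusive access to whole nodes and arbitrates conflicting
    requests with a strict first-in, first-out queue of pending work.
    """
    free_nodes: list
    pending: deque = field(default_factory=deque)  # (job, nodes_needed)
    running: dict = field(default_factory=dict)    # job -> allocated nodes

    def submit(self, job, nodes_needed):
        # Every request joins the queue of pending work first.
        self.pending.append((job, nodes_needed))
        self._schedule()

    def release(self, job):
        # A finished job returns its nodes to the free pool.
        self.free_nodes.extend(self.running.pop(job))
        self._schedule()

    def _schedule(self):
        # Start the job at the head of the queue only if enough nodes are
        # free; jobs behind it wait even if they would fit (no backfill).
        while self.pending and self.pending[0][1] <= len(self.free_nodes):
            job, n = self.pending.popleft()
            self.running[job] = self.free_nodes[:n]
            del self.free_nodes[:n]


rm = ToyResourceManager(free_nodes=["n1", "n2", "n3", "n4"])
rm.submit("jobA", 3)  # starts at once on n1, n2, n3
rm.submit("jobB", 2)  # only n4 is free, so jobB waits
rm.submit("jobC", 1)  # would fit on n4, but waits behind jobB (FIFO)
rm.release("jobA")    # frees n1-n3; jobB and then jobC start
print(rm.running)     # {'jobB': ['n4', 'n1'], 'jobC': ['n2']}
```

Strict FIFO keeps the sketch short, and it also shows why a small request such as jobC can sit idle behind a larger one; production schedulers commonly relax this with priorities or backfill scheduling.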
“How LSF-HPC and SLURM Interact” describes the interaction between SLURM and LSF-HPC.
9.2 SLURM Utilities
You interact with SLURM through its command-line utilities. The basic utilities are:
srun
squeue
scancel
sinfo
scontrol
For more information on any of these utilities, see the SLURM Reference Manual or the
corresponding manpage.
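Because every interaction with SLURM goes through these utilities, a script that drives SLURM may want to confirm they are installed before calling them. The Python sketch below does this with the standard library's shutil.which; the function name locate_slurm_utilities is hypothetical, and the sketch only looks the commands up on PATH without running them.

```python
import shutil

# The basic SLURM utilities listed in this section.
SLURM_UTILITIES = ("srun", "squeue", "scancel", "sinfo", "scontrol")


def locate_slurm_utilities():
    """Map each utility name to its full path on PATH, or None if absent."""
    return {name: shutil.which(name) for name in SLURM_UTILITIES}


if __name__ == "__main__":
    for name, path in locate_slurm_utilities().items():
        print(f"{name:9} {path or 'not found on PATH'}")
```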
9.3 Launching Jobs with the srun Command
The srun command submits and controls jobs that run under SLURM management. Use srun to submit
interactive and batch jobs for execution, allocate resources, and initiate job steps.
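As an illustration of launching a job with srun from a script, the Python sketch below assembles an srun command line. It assumes only srun's standard -N (number of nodes), -n (number of tasks), and -p (partition) options; the helper name build_srun_command is hypothetical, and you should confirm the option spellings against the srun manpage on your HP XC system. The actual launch is left commented out so that the sketch also runs on machines without SLURM.

```python
import shlex


def build_srun_command(program, nodes=None, ntasks=None, partition=None, args=()):
    """Return an argv list that launches `program` under srun (hypothetical helper)."""
    cmd = ["srun"]
    if nodes is not None:
        cmd += ["-N", str(nodes)]      # number of nodes to allocate
    if ntasks is not None:
        cmd += ["-n", str(ntasks)]     # number of tasks (processes) to run
    if partition is not None:
        cmd += ["-p", partition]       # partition to run in
    cmd.append(program)
    cmd.extend(args)                   # arguments passed to the program itself
    return cmd


cmd = build_srun_command("hostname", nodes=2, ntasks=4)
print(shlex.join(cmd))  # srun -N 2 -n 4 hostname

# On a system with SLURM you could then launch it, for example:
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Building the command as an argument list rather than a single shell string avoids quoting problems when program arguments contain spaces.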