SLURM Reference Manual for HP XC System Software

SLURM Features
SLURM Components
SLURM consists of two kinds of daemon (discussed here) and five command-line user utilities (next
section, page 16), whose relationships appear in this simplified architecture diagram:
 user>>SRUN   -|        -------------
               |        |           |
       SCANCEL-|--------| SLURMCTLD |--------| SCONTROL
               |        |           |
       SQUEUE -|        -------------
               |              |
       SINFO  -|    ---------------------
                    |         |         |
                 SLURMD    SLURMD    SLURMD
                   (...compute nodes...)
SLURMCTLD
SLURM's central control daemon is called SLURMCTLD. Unlike the Portable Batch System daemon,
SLURMCTLD is multi-threaded, so some threads can handle problems without delaying service to
normal jobs that also need attention. SLURMCTLD runs on a single management node (with
a fail-over spare copy elsewhere for safety), reads the SLURM configuration file, and maintains state
information on:
nodes (the basic compute resource),
partitions (logically disjoint sets of nodes),
jobs (or resource allocations to run jobs for a time period), and
job steps (parallel tasks within a job). Job steps are not supported on BlueGene/L.
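
As a rough illustration of the four kinds of state listed above, the following Python sketch models them as plain data classes. Every class, field, and function name here is hypothetical and chosen only for this example; none of them is taken from SLURM's internal data structures.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical model of the state SLURMCTLD is described as tracking.
# All names are illustrative assumptions, not SLURM's real structures.


@dataclass
class Node:
    name: str
    state: str = "IDLE"  # illustrative state label


@dataclass
class Partition:
    name: str
    nodes: List[str] = field(default_factory=list)  # node names


@dataclass
class JobStep:
    step_id: int
    task_count: int  # parallel tasks within the job


@dataclass
class Job:
    job_id: int
    partition: str
    nodes: List[str]
    time_limit_minutes: int  # the allocation lasts for a time period
    steps: List[JobStep] = field(default_factory=list)


def partitions_are_disjoint(partitions: List[Partition]) -> bool:
    """Return True if no node name is listed more than once overall."""
    seen = set()
    for part in partitions:
        for node_name in part.nodes:
            if node_name in seen:
                return False
            seen.add(node_name)
    return True
```

The helper reflects the statement that partitions are logically disjoint sets of nodes: a node name that shows up twice makes the check fail.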
The SLURMCTLD daemon in turn consists of three software subsystems, each with a specific role:
Node Manager
monitors the state and configuration of each node in the cluster. It receives state-change
messages from each compute node's SLURMD daemon asynchronously, and it also
actively polls those daemons periodically for status reports.
Partition Manager
groups nodes into disjoint sets (partitions) and assigns job limits and access controls
to each partition. The partition manager also allocates nodes to jobs (at the request
of the Job Manager, below) based on job and partition properties. SCONTROL is the
(privileged) user utility that can alter partition properties.
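
The Partition Manager's role described above (disjoint node sets, per-partition job limits and access controls, and node allocation on request) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the names `PartitionInfo`, `max_nodes_per_job`, `allowed_users`, and `allocate` are hypothetical, and the first-fit choice of idle nodes is a simplification, not SLURM's actual allocation policy.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class PartitionInfo:
    # All fields are illustrative assumptions, not SLURM configuration keys.
    nodes: List[str]
    max_nodes_per_job: int                        # hypothetical job limit
    allowed_users: Set[str]                       # hypothetical access control
    busy: Set[str] = field(default_factory=set)   # nodes already allocated


def allocate(partitions: Dict[str, PartitionInfo], part_name: str,
             user: str, node_count: int) -> Optional[List[str]]:
    """Return the allocated node names, or None if the request is refused."""
    part = partitions.get(part_name)
    if part is None:
        return None                      # unknown partition
    if user not in part.allowed_users:
        return None                      # access control check
    if node_count > part.max_nodes_per_job:
        return None                      # job limit check
    idle = [n for n in part.nodes if n not in part.busy]
    if len(idle) < node_count:
        return None                      # not enough idle nodes right now
    chosen = idle[:node_count]           # simple first-fit selection
    part.busy.update(chosen)
    return chosen
```

In this sketch a refused request simply returns `None`; a real scheduler would typically queue the job instead, which is outside what this example models.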