SLURM Reference Manual for HP XC System Software

Fault Tolerant--
Innovative scientific computing systems are often much less stable than routine business clusters,
so a good local resource manager should recover well from many kinds of system failures (without
terminating its workload), including failure of the node where its own control functions execute.
Open Source--
The software (source code) should be freely sharable under the GNU General Public License, as
with other nonproprietary CHAOS components.
Modular--
An approach that clearly separates high-level job-scheduling functions from low-level
cluster-administration functions allows for easier changes in scheduling policy without having to
sacrifice working, familiar cluster-resource tools or features.
No commercial (or existing open source) resource manager meets all of these needs. So since 2001
Livermore Computing, in collaboration with Linux NetworX and Brigham Young University, has developed
and refined the "Simple Linux Utility for Resource Management" (SLURM).