UCRL-WEB-201386
SLURM Reference Manual
Table of Contents

Preface
Introduction
SLURM Goals and Roles
    SLURM Goals
    SLURM Roles
    SLURM and Operating Systems
SLURM Features
    SLURM Components
        SLURMCTLD
        SLURMD
    Portability (Plugins)
    User Impact
    Scheduler Types
SLURM Operation
    SLURM Utilities
    SRUN (Submit Jobs)
        SRUN Roles and Modes
        Comparison with POE
        SRUN Run-Mode Options
        SRUN Resource-Allocation Options
        SRUN Control Options
            Node Management
            Working Features
            Resource Control
            Help and Message Options
            Prolog and Epilog Options
            Debug (Root) Options
        SRUN I/O Options
            I/O Commands
            I/O Redirection Alternatives
        SRUN Constraint Options
            General Constraints
            Affinity or NUMA Constraints
        Environment Variables
        Multiple Program Usage
    SQUEUE (List Jobs)
        SQUEUE Execute Line
        SQUEUE Options
        SQUEUE Examples
        SQUEUE Job State Codes
    SINFO (List Nodes)
        SINFO Execute Line
        SINFO Options
        SINFO Output Fields
        SINFO Node States
        SINFO Examples
    SMAP (Show Job Geometry)
    SCONTROL (Manage Configurations)
Disclaimer
Keyword Index
Alphabetical List of Keywords
Date and Revisions
Preface Scope: This manual explains the design goals and unique roles of LC's locally developed Simple Linux Utility for Resource Management (SLURM), intended as a customized replacement for RMS or NQS in allocating compute resources (mostly nodes) to queued jobs on machines running the CHAOS operating system. Sections describe the features of both control daemon SLURMCTLD and local daemon SLURMD, as well as SLURM's adaptability by means of plugin modules.
Introduction SLURM is LC's locally developed C-language Simple Linux Utility for Resource Management. SLURM is a job- and compute-resource manager that can run reliably and efficiently on Linux (CHAOS) clusters as large as several thousand nodes. Its features suit it to large-scale, high-performance computing environments, and its design avoids known weaknesses (such as inflexibility or fault intolerance) in available commercial resource management products for supercomputers.
SLURM Goals and Roles

SLURM Goals

SLURM was developed specifically to meet locally important criteria for a helpful, efficient way to manage compute resources on large (Linux/CHAOS) clusters. The primary threefold purpose of a cluster resource manager (such as LoadLeveler on LC's IBM ASC machines or the Resource Management System (RMS) from Quadrics) is to:
• Allocate nodes-- give users access (perhaps even exclusive access) to compute nodes for some specified time range so their job(s) can run.
• Start and monitor jobs-- provide a framework for launching, executing, and monitoring work (typically parallel jobs) on the allocated nodes.
• Arbitrate contention-- manage a queue of pending jobs whenever more work has been submitted than the available resources can run at once.
Among the locally important criteria that SLURM's design adds, a good resource manager for LC clusters should also be:
• Fault Tolerant-- Innovative scientific computing systems are often much less stable than routine business clusters, so a good local resource manager should recover well from many kinds of system failure (without terminating its workload), including failure of the node where its own control functions execute.
• Open Source-- The software (source code) should be freely sharable under the GNU General Public License, as with other nonproprietary CHAOS components.
SLURM Roles SLURM fills a crucial but mostly hidden role in running large parallel programs on large clusters. Most users who run batch jobs at LC use job-control utilities (such as PSUB or PALTER) that talk to the Livermore Computing Resource Management system (LCRM, formerly called DPCS), LC's locally designed metabatch system. LCRM: • Provides a common user interface for batch-job submittal across all LC machines and clusters. • Monitors resource use across machines and clusters.
SLURM and Operating Systems SLURM was originally used as a resource manager for Linux (specifically for CHAOS) systems. But starting in 2006, LC began gradually replacing IBM's native LoadLeveler with SLURM on its AIX systems as well. The AIX-SLURM combination behaves (and has been configured by LC system administrators to behave) slightly differently than the CHAOS-SLURM combination, however.
SLURM Features

SLURM Components

SLURM consists of two kinds of daemon (discussed here) and five command-line user utilities (next section (page 16)), whose relationships appear in this simplified architecture diagram:

    user>>SRUN ---+    -------------
       SCANCEL ---+---| SLURMCTLD |--- SCONTROL
        SQUEUE ---+    -------------
         SINFO ---+          |
              +--------------+--------------+
              |              |              |
           SLURMD         SLURMD         SLURMD
                (...compute nodes...)

SLURMCTLD

SLURM's central control daemon is called SLURMCTLD.
Job Manager accepts job requests (from SRUN (page 17) or a metabatch system like LCRM), places them in a priority-ordered queue, and reviews that queue periodically or when any state change might allow a new job to start. Qualifying jobs are allocated resources and that information transfers to (SLURMD on) the relevant nodes so the job can execute. When all nodes assigned to a job report that their work is done, the Job Manager revises its records and reviews the pending-job queue again.
SLURMD The SLURMD daemon runs on every compute node of every cluster that SLURM manages and it performs the lowest level work of resource management. Like SLURMCTLD (above), SLURMD is multi-threaded for efficiency, but unlike SLURMCTLD it runs with root privilege (so it can initiate jobs on behalf of other users).
Portability (Plugins) SLURM achieves portability (hardware independence) by using a general plugin mechanism. SLURM's configuration file tells it which plugin modules to accept. A SLURM plugin is a dynamically linked code object that the SLURM libraries load explicitly at run time. Each plugin provides a customized implementation of a well-defined API connected to some specific tasks. By means of this plugin approach, SLURM can easily change its: • interconnect support (default is Quadrics QsNet).
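Plugin selection happens in the SLURM configuration file. As a hedged illustration only (the file path and the specific plugin names shown are typical examples of the era's defaults, not LC's actual settings):

    # excerpt from a hypothetical /etc/slurm/slurm.conf
    AuthType=auth/munge            # authentication plugin
    SwitchType=switch/elan         # interconnect plugin (Quadrics QsNet)
    SchedulerType=sched/builtin    # job-scheduler plugin (FIFO default)
    SelectType=select/linear       # node-selection plugin

Swapping one of these values (and restarting the daemons) substitutes a different implementation behind the same API, which is how SLURM adapts to new interconnects or schedulers without changing code elsewhere.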
User Impact The primary SLURM job-control tool is SRUN, (page 17) which fills the general role of PRUN (on former Compaq machines) or POE (on IBM computers). Your choice of run mode ("batch" or interactive) and your allocation of resources with SRUN strongly affect your job's behavior on machines where SLURM manages parallel jobs. SLURM works collaboratively with POE on AIX machines where SLURM has replaced IBM's LoadLeveler.
Scheduler Types The system administrator for each machine can configure SLURM to invoke any of several alternative local job schedulers. You can discover which scheduler SLURM currently invokes on any machine by executing scontrol show config | grep SchedulerType where the returned string will have one of these values: builtin (default) is a first-in-first-out scheduler. SLURM executes jobs strictly in the order that they were submitted (for each resource partition).
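For example, on a machine using the default scheduler, the query and its response might resemble this sketch (the output format is illustrative):

    % scontrol show config | grep SchedulerType
    SchedulerType           = sched/builtin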
SLURM Operation

SLURM Utilities

SLURM's five command-line utilities provide its direct interface for users (while LCRM utilities, as explained in EZJOBCONTROL (URL: http://www.llnl.gov/LCdocs/ezjob), provide an indirect interface). These utilities are:
SRUN submits jobs to run under SLURM management.
SCANCEL terminates (or sends signals to) queued or running SLURM jobs.
SQUEUE reports the status of SLURM-managed jobs.
SINFO reports the status of SLURM-managed nodes and partitions.
SCONTROL manages SLURM's configuration (mostly for system administrators).
SRUN (Submit Jobs) SRUN Roles and Modes SRUN executes tasks ("jobs") in parallel on multiple compute nodes at the same time (on machines where SLURM manages the resources).
• ATTACH. You can monitor or intervene in an already running SRUN job, either batch (started with -b) or interactive ("allocated," started with -A), by executing SRUN again and "attaching" (-a, lowercase) to that job. For example, srun -a 6543 -j forwards the standard output and error messages from the running job with SLURM ID 6543 to the attaching SRUN to reveal the job's current status, and (with -j, lowercase) also "joins" the job so that you can send it signals as if this SRUN had initiated the job.
Comparison with POE SRUN and AIX's POE (Parallel Operating Environment) both use UNIX environment variables to manage the resources for each parallel job that they run. Of course, variables with comparable roles have different names under each system (and both systems have many other environment variables for other purposes too).
SRUN Run-Mode Options

For a strategic comparison (with examples) of the five different ways to use SRUN, see "SRUN Roles and Modes," above. (page 17) This section explains the mutually exclusive SRUN options that enable its different run modes. Each option has a one-character (UNIX) and a longer (Linux) alternative syntax.
-b (--batch) runs a script (whose name appears at the end of the SRUN execute line, not as an argument to -b) in batch mode. You cannot use -b with -A or -a.
-a jobid (lowercase, --attach=jobid) attaches (or reattaches) your current SRUN session to the already running job whose SLURM ID is jobid. The job to which you attach must have its resources managed by SLURM, but it can be either interactive ("allocated," started with -A) or batch (started with -b). This option allows you to monitor or intervene in previously started SRUN jobs. You cannot use -a with -b or -A.
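A minimal sketch of the run modes in use (node counts, file names, and the job ID 6543 are illustrative):

    % srun -N 2 -n 4 ./a.out                   # interactive: allocate nodes and run at once
    % srun -b -N 2 -o myjob.out myscript.csh   # batch: queue the script for later execution
    % srun -A -N 2                             # allocate: hold nodes for a subsequent session
    % srun -a 6543 -j                          # attach to (and join) running job 6543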
SRUN Resource-Allocation Options These SRUN options (used alone or in combination) assign compute resources to your parallel SLURM-managed job. Each option has a one-character (UNIX) and a longer (Linux) alternative syntax. See also SRUN's other options that can affect node management for your job, especially the control (page 24) options and constraint (page 36) options, in separate subsections below. -n procs (lowercase, --nprocs=procs) requests that SRUN execute procs processes.
-c cpt (lowercase, --cpus-per-task=cpt) assigns cpt CPUs per process for this job (default is one CPU/process). This option supports multithreaded programs that require more than a single CPU/process for best performance. -n/-c COMBINATIONS. For multithreaded programs where the density of CPUs is more important than a specific node count, use both -n and -c on the same SRUN execute line (rather than -N). Thus -n 16 -c 2 results in whatever node allocation is needed to yield the requested 2 CPUs/process.
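For example (a hedged sketch on a cluster of 2-CPU nodes):

    % srun -n 16 -c 2 ./a.out   # 16 processes x 2 CPUs each; SLURM picks enough nodes
    % srun -n 16 ./a.out        # 16 processes x 1 CPU each (the default)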
SRUN Control Options These SRUN options control how a SLURM job manages its nodes and other resources, what its working features (such as job name) are, and how it gives you help. Separate "constraint" options (page 36) (which behave like PSUB constraints) and I/O options (page 33) appear in other subsections on SRUN. Most control options have a one-character, one-hyphen (UNIX) format and an alternative keyword, two-hyphen (Linux) format, shown together here.
On BlueGene/L ONLY: --geometry=N[xM[xO]] specifies your job's size in "nodes" in each direction within BG/L's field of nodes (e.g., geometry=1x2x4 for 8 nodes). SLURM regards each BG/L 512-node dual-processor "base partition" as a single 1024-processor node. Use SLURM's SMAP utility (page 63) on BG/L to visualize job layout and the geometric intermixing of several jobs. If you omit --geometry on BG/L, then SRUN uses 1x1x1 as the default (or if you also use -N num then SRUN uses numx1x1 as the default).
Working Features

--begin=date|time|delay|special defers job start until the specified time value, which may be any one of these formats: date is any calendar date in the format month day|MMDDYY|MM/DD/YY|DD.MM.

--core=ctype
-J jobname (uppercase, --job-name=jobname) specifies jobname as the identifying string for this job (along with its system-supplied job ID, as stored in SLURM_JOBID) in responses to your queries about job status (the default jobname is the executable program's name). --jobid=jid initiates a job step under the already allocated job whose ID is jid (assigning jid to the environment variable SLURM_JOBID has the same effect).
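For instance (the job name and job ID here are illustrative):

    % srun -J genome42 -n 8 ./a.out    # status queries will now report this job as genome42
    % srun --jobid=6543 -n 4 ./prep    # run a job step under already allocated job 6543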
-X (uppercase, --disable-status) disables the (default) report of task status when SRUN receives a single CTRL-C (SIGINT), and instead forwards the interrupt to the running job. A second CTRL-C within one second terminates the job as well as SRUN.
Resource Control -I (uppercase, --immediate) exits if requested resources are not available at once (by default, SRUN blocks until requested resources become available). -O (uppercase oh, --overcommit) overcommits CPUs. By default, SRUN never allocates more than one process per CPU. If you intend to assign multiple processes per CPU, you must invoke the -O option along with -n and -N (thus -n 16 -N 4 -O together allow 2 processes/CPU on the 4 allocated 2-CPU nodes).
Help and Message Options --help lists the long (Linux) and, if there is one, the corresponding short (UNIX, one-character) name for every SRUN option, with a one-line description of each. Options appear in categories by function, not alphabetically. --mail-type=mtype notifies by e-mail the user specified by --mail-user when events of type mtype occur, where mtype can be any one of: begin reveals the start of this job. end reveals the successful completion of this job.
Prolog and Epilog Options These SRUN options let you supplement your basic job with programs that precede or follow it. --prolog=executable causes SRUN to run executable just before launching a job step (if NONE, the default executable, then no prolog is run). This option overrides the SrunProlog parameter in the slurm.conf file. --epilog=executable causes SRUN to run executable just after a job step completes (if NONE, the default executable, then no epilog is run).
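For example (the script paths are hypothetical, and both files must be executable):

    % srun --prolog=/g/g0/me/setup.csh --epilog=/g/g0/me/cleanup.csh -n 8 ./a.out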
Debug (Root) Options These special SRUN options allow root users to launch jobs as a user or group other than themselves for testing or debugging. --gid=ggroup (for root SRUN users only) submits this job with ggroup's group access permissions, where ggroup may be either the intended group name or the numerical group ID. --uid=uuser (for root SRUN users only) submits this job as uuser instead of the actual submitting user.
SRUN I/O Options I/O Commands These SRUN commands manage and redirect the standard input to, as well as the standard output and error messages from, parallel jobs executed under SLURM. Three of these commands let you choose from among any of five I/O redirection alternatives ("modes") that are explained in the next section.
I/O Redirection Alternatives

SRUN I/O options (page 33) -i (--input), -o (--output), and -e (--error) all take as arguments any of five I/O redirection alternatives ("modes") summarized in this table and explained in more detail below it:

    Redirection Alternative   Tasks Covered
    -----------------------   -------------
    all [default]             all tasks
    none                      all tasks
    taskid                    one selected task
    filename                  all tasks
    fstring                   many separate tasks

    File-naming subchoices for fstring:
    %J [uc], %j [lc], %s [lc], %N [uc], %n [lc], %t [lc]

(For fstring, the tasks covered depend on the subchoice used; %J, for example, covers all tasks, with jobid.stepid imbedded in each file's name, as explained below.)
Available parameters with which to construct fstring (and thereby to split the I/O among separate files) include: %J (uppercase) creates one file for each job ID/step ID combination for this running job, and imbeds jobid.stepid in each file's name (for example, out%J might yield files out4812.0, out4812.1, etc.). %j (lowercase) creates one file for each job ID and imbeds jobid in its name (for example, job%j might yield file job4812).
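An illustrative sketch using another subchoice from the table above (%t, which imbeds each task's ID in its file name):

    % srun -n 4 -o out%t ./a.out    # writes task output to out0, out1, out2, out3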
SRUN Constraint Options These SRUN options all limit the nodes on which your job will execute to only those nodes having the properties ("constraints") that you specify here. General Constraints These SRUN constraints can apply to any job (unlike those in the next subsection).
filename is a file that contains node information in either of the previous two formats (SRUN interprets any string containing the slash (/) character as a file name). -x hosts (lowercase, --exclude=hosts) specifies by name the individual nodes that must be excluded from the set of nodes on which your job runs (perhaps along with others unspecified). Option -x is incompatible with SRUN option -r (--relative). Here hosts may have any of three formats: host1,host2,...
Affinity or NUMA Constraints These SRUN constraints apply only to machines where the task-affinity or the NUMA (NonUniform Memory Access) plugins have been enabled by the operating system. At LC, that includes only BlueGene/L. --cpu_bind=[quiet,|verbose,]type binds tasks to CPUs (to prevent the operating system scheduler from moving the tasks and spoiling possible memory optimization arrangements). q[uiet] (default) quietly binds CPUs before the tasks run.
--mem_bind=[quiet,|verbose,]type binds tasks to memory (the memory analog of --cpu_bind above). q[uiet] (default) quietly binds memory before the tasks run. v[erbose] verbosely reports memory binding before the tasks run. Here type can be any one of these mutually exclusive alternatives: no[ne] (default) does not bind tasks to memory. rank binds tasks to memory by task rank. local uses memory local to the processor on which each task runs. map_mem:idlist binds by mapping a node's memory to tasks as specified in idlist, a comma-delimited list cpuid1,cpuid2,...,cpuidn.
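Hedged examples of both binding options (flag spellings follow the descriptions above):

    % srun --cpu_bind=verbose,rank -n 8 ./a.out   # bind tasks to CPUs by rank, report bindings
    % srun --mem_bind=local -n 8 ./a.out          # use memory local to each task's processor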
Environment Variables To see how the SLURM environment variables discussed here fit into the larger context of all environment variables used at LC to manage jobs (both interactively and by LCRM in particular), consult the comparative sections of LC's Environment Variables user manual (URL: http://www.llnl.gov/LCdocs/ev). Option Variables. Many SRUN options have corresponding environment variables (analogous to the approach used with POE).
Task-Environment Variables. In addition, SRUN sets these environment variables (a few are the same as option variables listed above) for each executing task on each remote compute node (any operating system). SLURM_CPU_BIND_VERBOSE affects the reporting of CPU/task binding, as explained in the "Affinity or NUMA Constraints" section (page 38) under --cpu_bind. SLURM_CPU_BIND_TYPE affects the binding of CPUs to tasks, as explained in the "Affinity or NUMA Constraints" section (page 38) under --cpu_bind.
SLURM_NNODES is the actual number of nodes assigned to run your job (which may exceed the number of nodes that you explicitly requested with SRUN's -N option (page 22)).
SLURM_NODEID specifies the relative node ID of the current node.
SLURM_NODELIST specifies the list of nodes on which the job is actually running.
SLURM_NPROCS specifies the total number of processes in the job.
SLURM_PROCID specifies the MPI rank (or relative process ID) for the current process.
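A quick illustrative way to see the per-task values (a hedged sketch; each task echoes its own variables):

    % srun -n 4 /bin/sh -c 'echo task $SLURM_PROCID of $SLURM_NPROCS on node $SLURM_NODEID'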
Other SLURM-Relevant Variables. Other environment variables important for SRUN-managed jobs include: MAX_TASKS_PER_NODE provides an upper bound on the number of tasks that SRUN assigns to each job node, even if you allow more than one process per CPU by invoking SRUN's -O (uppercase oh) option. (page 29) SLURM_HOSTFILE names the file that specifies how to assign tasks to nodes, rather than using the block or cyclic approaches toggled by SRUN's -m (--distribution) option (page 24).
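A hedged sketch of SLURM_HOSTFILE in use (the file name and node names are hypothetical; the file lists one node name per task, in task order):

    % cat myhosts
    mcr12
    mcr12
    mcr37
    % setenv SLURM_HOSTFILE myhosts    # (csh; in sh: export SLURM_HOSTFILE=myhosts)
    % srun -n 3 ./a.out                # tasks 0 and 1 on mcr12, task 2 on mcr37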
Multiple Program Usage Strategy. SRUN's --multi-prog option (see SRUN Resource-Allocation Options above (page 22)) lets you assign to each parallel task in your job a different program with (if you wish) a different argument. If you invoke --multi-prog, then SRUN's own argument is not the name of one executable program (as usual) but rather the name of a local configuration file that specifies how to assign multiple programs and arguments among your job's tasks. For example, srun -n8 -l --multi-prog test.
(The -l option labels each line of output with the number, 0: through 7:, of the task that produced it; in the sample output, each of the eight tasks reports the task and offset arguments it received along with its assigned node, alc20.llnl.gov through alc22.llnl.gov.)
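A hedged sketch of what such a configuration file might contain (the programs and rank ranges are illustrative; within the file, %t expands to each task's ID and %o to its offset within its rank range):

    # hypothetical configuration file for srun --multi-prog
    0-3   hostname
    4-7   echo task:%t offset:%o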
SQUEUE (List Jobs) SQUEUE Execute Line SQUEUE displays the job ID and job name for every job currently managed by the SLURM control daemon (SLURMCTLD) on the machine where you run SQUEUE, along with status and resource information for each job (such as time used so far, or a list of committed nodes), in a table whose content and format details you can control with SQUEUE options. (To report on node status rather than job status, use SINFO (page 53) instead.) BASIC RUN.
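For a basic run, type squeue with no options; the resulting default report resembles this illustrative sketch (the job shown is hypothetical):

    % squeue
      JOBID PARTITION     NAME     USER  ST       TIME  NODES NODELIST(REASON)
       6543    pbatch    myjob     jdoe   R      12:45      4 mcr[188-191]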
SQUEUE Options Delimit all SQUEUE options with spaces (blanks), but delimit items in option-argument lists with commas unless otherwise noted (-o requires space-delimited arguments, for example). Enclose all argument lists in quotes (") for greater reliability. This section lists SQUEUE's control options alphabetically except for -o, which gets a separate subsection at the end because of its elaborate format-specification language. CONTROL OPTIONS.
-S sortkeys (uppercase, --sort=sortkeys) sorts the (job) rows in SQUEUE's report using the sort keys specified in sortkeys, a comma-delimited list of the same field (column) specifiers used for and explained in the -o (--format) option below. The default order is ascending; prefix each field specifier with minus (-) for descending order. The default sort for jobs is --sort="P,t,-p" (increasing partition names, then increasing job states, then decreasing job priority).
.(dot) requests right justification of this column's data (the default omits the dot and uses left justification of the reported data). w is an integer specifying the width of this column in characters (omitting w uses just as much space as the data requires, which usually means that there is no column alignment from one row (= job) to the next). Z is a single case-sensitive letter that specifies the content (the job property) reported in this column (using the dictionary given below). For example, %.8j uses a right-justified (.) column 8 characters wide to report job name (lowercase j).
SQUEUE Examples [1] GOAL: To display the default status report about all current SLURM-managed jobs on the machine (cluster) where you run SQUEUE. STRATEGY: Run SQUEUE with no options. An eight-column report, sorted by partition and then by time used (not by job ID or name), appears. SQUEUE automatically ends. Column ST here reports job STATE (status, see later section (page 52) for details). To add a time-limit column and see full-word status entries, use SQUEUE's -l (lowercase ell, --long) option.
[2] GOAL: To build a customized status report about current SLURM-managed jobs, for example, showing only job names, requested features (if any), and time used, with all rows in alphabetical (instead of time-used) order. STRATEGY: (1) Use SQUEUE's -o (lowercase oh, --format) option to specify which specific columns (job properties) you want to report, the width of each column in characters, and the order for the columns to appear (left to right).
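A hedged sketch of such a run (format letters come from the dictionary below: %j for job name, %f for requested features, %M for time used; -S "j" sorts the rows alphabetically by job name):

    % squeue -o "%.16j %.12f %.10M" -S "j"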
SQUEUE Job State Codes Most SQUEUE reports use short codes (abbreviations) to reveal the state (current status) of each job that SLURM manages. The SQUEUE job-state codes and what they mean are explained here in alphabetical order. A separate section covers SINFO node state codes (page 60). Note that these SQUEUE codes differ from those used by PSTAT to report the status that LCRM/DPCS assigns to the batch jobs that it schedules (across machines "above" SLURM).
SINFO (List Nodes) SINFO Execute Line SINFO reports current status information on node partitions and on individual nodes for computer systems managed by SLURM. SINFO's reports can help you plan job submittals and avoid hardware problems. SINFO's output is a table whose content and format you can control with SINFO options. (To report on job status rather than on node status, use SQUEUE (page 46) instead.) BASIC RUN.
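For a basic run, type sinfo with no options; the default six-column report resembles this illustrative sketch (the partition names and node counts are hypothetical):

    % sinfo
    PARTITION AVAIL  TIMELIMIT NODES  STATE NODELIST
    pdebug*      up      30:00    32   idle mcr[1-32]
    pbatch       up   infinite  1056  alloc mcr[33-1088]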
SINFO Options Delimit all SINFO options with spaces (blanks), but delimit items in option-argument lists with commas unless otherwise noted (-o requires space-delimited arguments, for example). Enclose all argument lists in quotes (") for greater reliability. This section lists SINFO's control options alphabetically except for -o, which gets a separate subsection at the end because of its elaborate format-specification language. HELP OPTIONS.
-l (lowercase ell, --long) displays four more output fields (page 58) (columns) than in SINFO's default report (JOB_SIZE, ROOT, SHARE, GROUPS), but no more rows (instead try --all). This option is incompatible with -o (--format). Combining -N with -l reports CPU count, memory size, disk space, scheduling weight, and declared features (if any). -n nodes (lowercase, --nodes=nodes) reports information only for the specified nodes. For nodes use a quoted full node name (e.g., "mcr123") or a quoted range of node names (e.g., "mcr[1-500]").
-t statelist (lowercase, --states=statelist) limits SINFO's report to nodes with the specified states, where statelist is a quoted, comma-delimited list with these possible members (case insensitive): ALLOC ALLOCATED COMP COMPLETING DOWN DRAIN DRAINED DRAINING IDLE UNK UNKNOWN. By default, SINFO reports on nodes in the specified states whether they are responding or not, but you can use -d or -r to filter this report further.
.(dot) requests right justification of this column's data (the default omits the dot and uses left justification of the reported data). w is an integer specifying the width of this column in characters (omitting w uses just as much space as the data requires, which usually means that there is no column alignment from one row (= node) to the next). Z is a single case-sensitive letter that specifies the content (the node property) reported in this column (using the dictionary given below).
SINFO Output Fields SINFO reports are tables each column of which lists values for some node-related field or property. This section explains all the column heads ("output field" labels) that can possibly appear in an SINFO report (and, when not obvious from the column content, tells which SINFO option generates a report that includes the column in question). Option -h (--noheader) eliminates these column heads for easier reuse of SINFO's output by other programs.
REASON shows the first 35 characters of the field optionally provided by each SLURM administrator to explain why a node's STATE is either DOWN or DRAINED. Use -R (--list-reasons) to get this column; use -Rl to get both REASON and STATE in the same SINFO report. The default REASON is "null." ROOT reveals if the ability to allocate resources in a reported partition is restricted to the root user (YES or NO).
SINFO Node States In SINFO reports, the strings below are the only possible values of the STATE column, indicating the current status of a node, a set of nodes, or a node partition. STATE codes with * appended indicate that a reported node is not responding (SLURM does not allocate new work to such nodes, which eventually enter the DOWN state).
SINFO Examples [1] GOAL: To display the default status report about all SLURM-managed nodes on the machine (cluster, here MCR) where you run SINFO. STRATEGY: Run SINFO with no options. A six-column report (sorted by the node state reported) appears. SINFO automatically ends. In this report, * appended to a partition name indicates the default partition, while * appended to a STATE value indicates that the node reported on that row is not currently responding.
[2] GOAL: To build a customized status report about specific nodes on a SLURM-managed cluster (here, MCR), for example, showing only CPUs/node, temporary disk space per node, and allowed nodes/job. STRATEGY: (1) Specify which nodes you want reported by using SINFO's -n (--nodes) option (which here selects MCR nodes from 1 to 500 inclusive).
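A hedged sketch of such a run (format letters from the dictionary above: %N for node names, %c for CPUs/node, %d for temporary disk space, %s for allowed job size):

    % sinfo -n "mcr[1-500]" -o "%N %.5c %.10d %.10s"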
SMAP (Show Job Geometry) ROLE. On BlueGene/L only, the SMAP utility reveals not only which nodes are allocated to currently running jobs but also the geometric arrangement of those nodes (and hence, the way that BG/L jobs fit among one another topographically). On BG/L, SMAP thus supplements SINFO (page 53) and SQUEUE (page 46) as a visually enhanced way to monitor job interactions and to plan spatially for new node allocations. PREREQUISITES. (1) SMAP runs only on LC's BlueGene/L (BG/L) machine.
(2) Unassigned (idle) BG/L base partitions ("nodes" to SLURM) are shown as a period (.) in the map. (3) Down/drained base partitions (unavailable for use) are shown as a pound sign (#). In this example, only BGL703 is down.
SCONTROL (Manage Configurations) ROLE. SCONTROL is the SLURM utility that manages SLURM's own configuration, including the properties that it assigns to nodes, node partitions, and other SLURM-controlled system features. Most SCONTROL options and commands are intended for, and can only be successfully executed by, a system administrator (a privileged or root user).
show entity id displays the current state of the SLURM-managed item that you specify, where entity can be any of these alternative literal strings: config [see "Scheduler Types" above (page 15)] daemons job node partition step id specifies which individual entity to report (for example, by providing a node name (e.g., mcr123), a partition name (e.g., pdebug), or a job ID number (e.g., 1428)).
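For example (using the same illustrative entity IDs as above):

    % scontrol show job 1428          # full state of one job
    % scontrol show node mcr123       # configuration and state of one node
    % scontrol show partition pdebug  # properties of one partition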
Disclaimer This document was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor the University of California nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights.
Keyword Index

To see an alphabetical list of keywords for this document, consult the next section (page 70).

    Keyword              Description
    -------              -----------
    entire               This entire document.
    title                The name of this document.
    scope                Topics covered in this document.
    availability         Where SLURM runs.
    who                  Who to contact for assistance.
    introduction         Overview of SLURM features, comparisons.
    slurm-strategy       Special benefits built into SLURM.
    slurm-goals          SLURM design goals as resource manager.
    slurm-roles
    slurm-systems
    sinfo                Node status/property reporting utility.
    sinfo-execute-line   How to run SINFO.
    sinfo-options        Controlling, customizing SINFO output.
    sinfo-output-fields  Column heads in SINFO reports explained.
    node-states          SINFO node state (status) codes.
    sinfo-examples       Standard and customized SINFO reports.
    smap                 Job geometry utility (BlueGene/L only).
    scontrol             Sys admin configuration utility.
    index                The structural index of keywords.
    a                    The alphabetical index of keywords.
    date                 The latest changes to this document.
    revisions            The complete revision history.
Alphabetical List of Keywords

    a, availability, constraint-options, control-options, date, debug-options, entire, environment-variables, general-constraints, help-options, i-o-alternatives, i-o-commands, i-o-options, index, introduction, job-states, multi-prog-usage, node-management, node-states, numa-constraints, poe-comparison, portability, prolog-options, resource-allocation, resource-control, revisions, run-mode-options, scheduler-types, scontrol, scope, sinfo, sinfo-examples, sinfo-execute-line, sinfo-options, sinfo-output-fields,
    user-impact          SLURM's effect on typical jobs.
    who                  Who to contact for assistance.
    working-features     Verbosity, job name, path options.
Date and Revisions

    Revision   Keyword                Description of
    Date       Affected               Change
    --------   --------               --------------
    12Sep06    slurm-systems          Operating system comparison section added.
               prolog-options         SRUN prolog/epilog section added.
               debug-options          SRUN root-user special options added.
               numa-constraints       SRUN CPU and NUMA constraints added.
               multi-prog-usage       SRUN tips on multiple programs added.
               srun                   Many options added, details updated.
               environment-variables  More details, option/variable table added.
               index                  New keywords for 5 new sections.

               index                  New keyword for new section.

    18May04    squeue                 New sections on monitoring tool.
               index                  New keywords for new sections.

    17Mar04    control-options        14 SRUN options added in 4 subsections.
               i-o-options            5 SRUN options for I/O redirection explained.
               constraint-options     8 node-constraint options added.
               environment-variables  2 more env. variables explained.
               index                  6 new keywords for new sections.

    21Oct03    srun                   Major SRUN features, options explained.
               introduction           SRUN's central role introduced.
               index