
The LSB_HOSTS and LSB_MCPU_HOSTS environment variables, as initially established by
LSF-HPC with SLURM, do not accurately reflect the host names of the HP XC system nodes that
SLURM allocated for the user's job. This JOB_STARTER script corrects these environment variables
so that existing applications compatible with LSF can use them without further adjustment.
The SLURM srun command used by the JOB_STARTER script ensures that every interactive
job submitted by a user begins on the first allocated node. Without the JOB_STARTER script, all
interactive user jobs would start on the LSF execution host. This behavior is not consistent with
batch job submissions or Standard LSF-HPC behavior in general, and creates the potential for a
bottleneck in performance as both the LSF-HPC with SLURM daemons and local user tasks
compete for processor cycles.
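To illustrate the mechanism, the following is a minimal sketch of a job starter wrapper of this kind. It is not the shipped /opt/hptc/lsf/bin/job_starter.sh, which also performs the environment-variable corrections described above.

    #!/bin/sh
    # Minimal sketch of a JOB_STARTER wrapper (illustration only).
    # The shipped script additionally rewrites LSB_HOSTS and
    # LSB_MCPU_HOSTS to match the SLURM allocation.
    #
    # LSF invokes the job starter with the user's command as its
    # arguments; srun then launches that command on the first node
    # of the SLURM allocation rather than on the LSF execution host.
    exec srun -N1 -n1 "$@"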
The JOB_STARTER script has one drawback: all interactive I/O passes through the srun command
in the JOB_STARTER script, so full tty support is unavailable for interactive sessions and no
prompt appears when a shell is launched. The workaround is to set your display to support
launching an xterm instead of a shell.
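For example, assuming your X display is reachable from the HP XC system (the display name and bsub options shown here are illustrative), you can launch an xterm as the interactive job instead of a shell:

    # Point DISPLAY at your X server (placeholder value shown), then
    # submit an xterm as the interactive job; the xterm provides its
    # own terminal, so the lack of full tty support through srun does
    # not affect the session.
    export DISPLAY=mydesktop.example.com:0
    bsub -I /usr/bin/xterm -ls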
The JOB_STARTER script is located at /opt/hptc/lsf/bin/job_starter.sh, and is
preconfigured for all the queues created during the default LSF-HPC with SLURM installation
on the HP XC system. HP recommends that you configure the JOB_STARTER script for all queues.
To disable the JOB_STARTER script, remove or comment out the JOB_STARTER line in the lsb.queues
configuration file. For more information on the JOB_STARTER option and configuring queues,
see Administering Platform LSF on the HP XC Documentation CD.
For more information on configuring JOB_STARTER scripts and how they work, see the Standard
LSF documentation.
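For reference, a queue definition in lsb.queues that enables the JOB_STARTER script has roughly the following form; the queue name and description are illustrative:

    Begin Queue
    QUEUE_NAME   = normal
    # Wrap every job dispatched from this queue in the job starter
    # script; LSF appends the user's command to this line at dispatch.
    JOB_STARTER  = /opt/hptc/lsf/bin/job_starter.sh
    DESCRIPTION  = Example queue with the JOB_STARTER script enabled
    End Queue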
16.2.1.2 SLURM External Scheduler
The integration of LSF-HPC with SLURM adds a SLURM-based external scheduler. Users can
pass SLURM parameters with their job submissions, which enables them to make specific,
topology-based allocation requests. See the HP XC System Software User's Guide for more
information.
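For example, a user might pass node-level allocation requests through the external scheduler option; the processor and node counts, job name, and output file shown here are illustrative:

    # Request 4 processors spread across exactly 4 nodes by passing
    # SLURM options through the external scheduler (counts and file
    # names are illustrative).
    bsub -n 4 -ext "SLURM[nodes=4]" -o myjob.out ./myjob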
16.2.1.3 SLURM lsf Partition
An lsf partition is created in SLURM; this partition contains all the nodes that LSF-HPC with
SLURM manages. This partition must be configured such that only the superuser can make
allocation requests (RootOnly=YES). This configuration prevents other users from directly
accessing the resources that are being managed by LSF-HPC with SLURM. The LSF-HPC with
SLURM daemons, running as the superuser, make allocation requests on behalf of the owner of
the job to be dispatched. This is how LSF-HPC with SLURM creates the SLURM allocations in
which users' jobs run.
The lsf partition must be configured so that the nodes can be shared by default (Shared=FORCE).
Thus, by default, LSF-HPC with SLURM can allocate serial jobs from different users on a
per-processor basis (rather than on a per-node basis), which makes the best use of the resources. This
setting also enables LSF-HPC with SLURM to support preemption by allowing a new job to run
while an existing job is suspended on the same resource.
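In slurm.conf, a partition definition with these settings looks approximately like the following; the node list is a placeholder for the nodes managed by LSF-HPC with SLURM on your system:

    # Example lsf partition entry in slurm.conf (node list is a placeholder).
    # RootOnly=YES restricts allocation requests to the superuser;
    # Shared=FORCE allows nodes to be shared so that serial jobs can be
    # scheduled per processor and jobs can be suspended for preemption.
    PartitionName=lsf RootOnly=YES Shared=FORCE Nodes=n[1-128]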
SLURM nodes can be in various states. Table 16-1 describes how LSF-HPC with SLURM interprets
each node state.
Table 16-1 LSF-HPC with SLURM Interpretation of SLURM Node States
Node State    Description
Free          A node that is configured in the lsf partition and is not allocated to
              any job. The node is in the following SLURM state:
                  IDLE: The node is not allocated to any job and is available for use.