HP XC System Software Administration Guide Version 3.2

slurm.conf
The SLURM configuration file,
/hptc_cluster/slurm/etc/slurm.conf. This file contains all
the information necessary to understand how SLURM is configured
on HP XC systems, including the following:
• Logging (syslog is the default logging mechanism)
• Debug level (the debug levels range from 1 to 7; the default debug
  level is 3)
• Nodes (all nodes are listed by default)
• Node partitions (only one by default)
• Authentication (MUNGE is used by default)
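The settings listed above can be inspected with a short script. The following is a minimal sketch, not a tool shipped with HP XC: the function name parse_slurm_conf and the sample text are hypothetical, and the sample is an illustrative excerpt rather than the contents of the file installed on your system. It assumes only the keyword=value line layout that slurm.conf uses.

```python
def parse_slurm_conf(text):
    """Return {keyword: [value, ...]} using the first keyword=value on each line."""
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        first = line.split()[0]              # e.g. "NodeName=n[1-4]"
        if "=" not in first:
            continue
        key, _, value = first.partition("=")
        settings.setdefault(key, []).append(value)
    return settings


# Hypothetical excerpt for illustration only; not the shipped HP XC file.
SAMPLE = """\
SlurmctldDebug=3
AuthType=auth/munge
NodeName=n[1-4] Procs=2
PartitionName=lsf Nodes=n[1-4] Default=YES
"""

conf = parse_slurm_conf(SAMPLE)
for keyword in ("SlurmctldDebug", "AuthType", "NodeName", "PartitionName"):
    print(keyword, conf.get(keyword))
```

To examine a real system, read /hptc_cluster/slurm/etc/slurm.conf and pass its contents to the function in place of SAMPLE.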
SLURM uses the MUNGE package to authenticate users between nodes in the system. Both
MUNGE and SLURM require files that contain encrypted keys. The names of the SLURM files
are configured in the /hptc_cluster/slurm/etc/slurm.conf file. The MUNGE key
file is /opt/hptc/munge/etc/keys/.munge_key. These files must be replicated on every
node in the HP XC system, which occurs by default through SystemImager; see Chapter 11:
Distributing Software Throughout the System (page 139) for more information on software
distribution.
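Because the key files must be identical on every node, one way to confirm replication is to compare a digest of each node's copy. The sketch below assumes the key bytes (or their digests) have already been collected from each node by some means of your choosing; the function names and node names are hypothetical, and this is not part of SystemImager or MUNGE.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 digest of a key file's contents."""
    return hashlib.sha256(data).hexdigest()


def nodes_with_different_key(digests, reference_node):
    """Return the nodes whose key digest differs from the reference node's."""
    reference = digests[reference_node]
    return sorted(node for node, d in digests.items() if d != reference)


# Hypothetical data: n3 holds a stale copy of the key.
collected = {
    "n1": digest(b"current-key"),
    "n2": digest(b"current-key"),
    "n3": digest(b"stale-key"),
}
print(nodes_with_different_key(collected, "n1"))
```

Comparing digests rather than the key contents avoids copying the key itself off the nodes.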
SLURM and MUNGE expect the following characteristics of the system configuration. Errors
can result unless all these conditions are true:
• Each node must be synchronized to the correct time. Communication errors occur if the
  node clocks differ.
• User authentication must be available on every node. If not, non-root users will be unable
  to run jobs.
• The /hptc_cluster directory must be properly shared. It is exported from the head node
  and mounted on all the other nodes. If this directory is not properly shared, the slurm.conf
  configuration file will not be found and errors will result.
• On systems using Quadrics system interconnects, the
  /opt/hptc/libelanhosts/etc/elanhosts file must be properly configured with the
  spconfig command, as described in the HP XC System Software Installation Guide. Otherwise,
  system interconnect errors will occur, and you must restart SLURM. See “Configuring
  SLURM System Interconnect Support” (page 172) for more information.
• On systems using Quadrics system interconnects, the spconfig command might report
  that a node has less memory than expected.
Verify the node's memory size by running the following command and scrolling through the
output. Compare the node's memory with that of nodes of the same type.
# shownode config | less
If the node reports less memory than comparable nodes, perform the following steps:
1. Log in to the node in question as superuser (root).
2. Run the following command:
# /opt/hptc/etc/nconfig.d/C50gather_data
3. Run the spconfig command from the head node again.
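The comparison of a node's memory with nodes of the same type can be sketched as follows. This example makes no assumption about the output format of shownode config: it works on values you have already read from that output and entered by hand. The node names, type labels, memory sizes, and the function name low_memory_nodes are all hypothetical, and using the median of each type as the expected value is a choice made for this sketch.

```python
from collections import defaultdict
from statistics import median


def low_memory_nodes(nodes):
    """nodes: list of (name, node_type, memory_mb).

    Return the names of nodes reporting less memory than the median
    for their node type.
    """
    by_type = defaultdict(list)
    for _name, node_type, memory_mb in nodes:
        by_type[node_type].append(memory_mb)
    expected = {t: median(values) for t, values in by_type.items()}
    return sorted(name for name, t, mem in nodes if mem < expected[t])


# Hypothetical inventory: n3 reports half the memory of its peers.
inventory = [
    ("n1", "typeA", 4096),
    ("n2", "typeA", 4096),
    ("n3", "typeA", 2048),
    ("n4", "typeB", 8192),
]
print(low_memory_nodes(inventory))
```

Any node this reports is a candidate for the steps above.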
21.6.2 SLURM Run-Time Troubleshooting
The following describes how to overcome problems reported by SLURM while the HP XC system
is running: