HP XC System Software Administration Guide Version 3.2

NOTE: If the user logged in from a node that is also a compute node, the epilog script also ends
the user's login. You can avoid this problem by editing the EPILOG_EXCLUDE_NODES variable
in the epilog file. It is empty by default. Specify the host names of the login nodes, separated by
spaces, so that the epilog script does not kill the user jobs on those nodes; for example:
EPILOG_EXCLUDE_NODES="n101 n102 n103 n104 n105"
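As an illustration only, the following sketch shows one way a shell script could test the local host
name against a space-separated list such as EPILOG_EXCLUDE_NODES. The function name
is_excluded_node is hypothetical and is not taken from the shipped epilog script, whose actual
logic may differ.

```shell
#!/bin/sh
# Hypothetical helper, not part of slurm.epilog.clean.
# Returns 0 if host $1 appears in the space-separated list $2, otherwise 1.
is_excluded_node() {
    host=$1
    list=$2
    for node in $list; do
        [ "$node" = "$host" ] && return 0
    done
    return 1
}

EPILOG_EXCLUDE_NODES="n101 n102 n103 n104 n105"

# Short host name of the local node (strip any domain part).
this_host=$(uname -n)
this_host=${this_host%%.*}

if is_excluded_node "$this_host" "$EPILOG_EXCLUDE_NODES"; then
    echo "$this_host is excluded: leave user processes alone"
fi
```

The comparison is an exact match on each list entry, so n10 does not match n101.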
The SLURM epilog is initially located at /opt/hptc/slurm/etc/slurm.epilog.clean.
You can maintain the file in this directory, move it to another directory, or move it to a shared
directory. If you decide to maintain this file in a local directory on each node, be sure to propagate
the SLURM epilog file to all the nodes in the HP XC system. The following example moves the
SLURM epilog file to a shared directory:
# mv /opt/hptc/slurm/etc/slurm.epilog.clean \
/hptc_cluster/slurm/slurm.epilog.clean
Enable this script by configuring it in the SLURM configuration file,
/hptc_cluster/slurm/etc/slurm.conf. Edit the Epilog declaration line in this file as
follows:
Epilog=/hptc_cluster/slurm/slurm.epilog.clean
Be sure to restart SLURM.
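Before restarting, it can help to confirm that the Epilog line points at a file that exists and is
executable. The following is a minimal sketch of such a check; the function names get_epilog_path
and check_epilog are hypothetical, and the match is case-sensitive on the keyword Epilog as it
appears in the example above.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the Epilog line in a slurm.conf-style file.
# Commented-out lines and other keywords that merely contain "Epilog" are ignored.
get_epilog_path() {
    conf=$1
    sed -n 's/^[[:space:]]*Epilog[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*$/\1/p' "$conf" |
        tail -n 1
}

# Returns 0 only if the configured epilog exists and is executable.
check_epilog() {
    path=$(get_epilog_path "$1")
    if [ -z "$path" ]; then
        echo "no Epilog line found in $1" >&2
        return 1
    fi
    if [ ! -x "$path" ]; then
        echo "epilog $path is missing or not executable" >&2
        return 1
    fi
    echo "epilog OK: $path"
}
```

For the configuration shown above, the check would be invoked as
check_epilog /hptc_cluster/slurm/etc/slurm.conf.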
15.8 Maintaining the SLURM Daemon Log
By default SLURM daemon logs are stored in /var/slurm/log/ on each node that runs SLURM
daemons. The slurmctld controller daemon writes to the slurmctld.log file, and the slurmd
daemon writes to the slurmd.log file. These log files and their location are configured in the
slurm.conf file. You can view this information with the scontrol command, as follows:
# scontrol show config | grep LogFile
SlurmctldLogFile = /var/slurm/log/slurmctld.log
SlurmdLogFile = /var/slurm/log/slurmd.log
Over time these logs become large, particularly if you increase SLURM daemon debugging:
# scontrol show config | grep -i debug
SlurmctldDebug = 3
SlurmdDebug = 3
The daemon debug value ranges from 1 to 7, with 7 being very verbose. The default value is 3.
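To decide when the logs need attention, a check along the following lines can report log files that
have grown past a size limit. This is a sketch only: the function name log_exceeds and the 100 MB
threshold are assumptions chosen for illustration, and the paths are the default locations shown
above.

```shell
#!/bin/sh
# Hypothetical helper: return 0 if file $1 exists and is larger than $2 bytes.
log_exceeds() {
    file=$1
    limit=$2
    [ -f "$file" ] || return 1
    # Arithmetic expansion strips any padding that wc adds on some systems.
    size=$(( $(wc -c < "$file") ))
    [ "$size" -gt "$limit" ]
}

# Example threshold of 100 MB; choose a value appropriate for your site.
LIMIT=$((100 * 1024 * 1024))

for log in /var/slurm/log/slurmctld.log /var/slurm/log/slurmd.log; do
    if log_exceeds "$log" "$LIMIT"; then
        echo "$log is larger than $LIMIT bytes"
    fi
done
```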
To rotate these log files without disrupting SLURM operation, rename them. Choose descriptive
new names if you intend to archive them:
# mv /var/slurm/log/slurmctld.log{,.old}
# mv /var/slurm/log/slurmd.log{,.old}
Use the pdsh command to rename the files systemwide:
# scontrol ping
Slurmctld(primary/backup) at n16/n15 are UP/UP
# pdsh -w n[15-16] 'mv /var/slurm/log/slurmctld.log{,.old}'
# pdsh -a 'mv /var/slurm/log/slurmd.log{,.old}'
The SLURM daemons continue to write to the renamed files because they still hold those files
open. To have the daemons write to new log files under the original names, issue the following
command:
# scontrol reconfig
The SLURM daemons now write to log files with the original names. You can archive or delete
the old files.
You can automate the procedure for rotating SLURM log files by using a cron job on the head
node, scheduled at an interval appropriate for your site.
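The following sketch outlines what such a cron-driven script might contain. It handles only the
logs on the local node; the function names, the date-based suffix, and the LOG_DIR and
RECONFIG_CMD variables are assumptions made for illustration and are not part of the HP XC
software.

```shell
#!/bin/sh
# Sketch of a log-rotation helper; all names here are illustrative.
LOG_DIR=${LOG_DIR:-/var/slurm/log}
# Command that makes the daemons reopen their logs (taken from the procedure above).
RECONFIG_CMD=${RECONFIG_CMD:-"scontrol reconfig"}

# Rename $1 to $1.<suffix> if it exists, and print the new name.
rotate_log() {
    file=$1
    suffix=$2
    [ -f "$file" ] || return 0
    mv "$file" "$file.$suffix" && echo "$file.$suffix"
}

# Rotate both daemon logs on the local node, then ask SLURM to reopen them.
rotate_slurm_logs() {
    suffix=$(date +%Y%m%d)
    rotate_log "$LOG_DIR/slurmctld.log" "$suffix"
    rotate_log "$LOG_DIR/slurmd.log" "$suffix"
    $RECONFIG_CMD
}
```

A script run from cron would end with a call to rotate_slurm_logs. To cover every node, the local
mv could be replaced with the pdsh commands shown earlier in this section. Because the suffix is
the current date, running the script more than once a day overwrites that day's renamed file.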