HP XC System Software Administration Guide Version 3.2

Running /sbin/service nagios restart on the non-headnode in the availability set causes
the nagios master to fail over.
21.5.4 Network Restart Command Negatively Affects Serviceguard
If a node is actively participating in a Serviceguard cluster, the Serviceguard tool manages some
HP XC services and their aliases. Because Serviceguard handles relocating these aliases after a
node dies, there are no network scripts defined for the aliases. Therefore, when you issue the
/sbin/service network restart command on a node in a Serviceguard cluster, the aliases
managed by Serviceguard are removed and never re-created. As a result, the services do not
function correctly and, because of the way HP XC configures Serviceguard, Serviceguard does
not detect the failure.
Follow this procedure on the head node to restart the network:
1. Transfer control of the database from Serviceguard to HP XC:
# transfer_from_avail
2. Restart the network:
# /sbin/service network restart
3. Return control of the database to Serviceguard:
# transfer_to_avail
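The three steps can be wrapped in a small shell function so that control is returned to Serviceguard even when the network restart itself reports an error. This is a minimal sketch and not part of HP XC: the function name restart_network_safely and the RUN variable are hypothetical, and the sketch assumes that transfer_from_avail and transfer_to_avail are on root's PATH, as the procedure above implies.

```shell
#!/bin/sh
# Sketch only: wraps the documented three-step procedure.
# RUN is a hypothetical dry-run hook: set RUN=echo to print the
# commands instead of executing them.
RUN=${RUN:-}

restart_network_safely() {
    # Step 1: transfer control of the database from Serviceguard to HP XC.
    $RUN transfer_from_avail || return 1

    # Step 2: restart the network; remember the result but keep going.
    $RUN /sbin/service network restart
    rc=$?

    # Step 3: always return control of the database to Serviceguard.
    $RUN transfer_to_avail || return 1

    # Report the result of the network restart to the caller.
    return "$rc"
}
```

Setting RUN=echo before calling restart_network_safely prints the three commands in order without running them, which is a convenient way to review the sequence first.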
21.5.5 Problem Failing Over Database Package Under Serviceguard
Occasionally, when the head node is running with Serviceguard and the head node becomes
unresponsive, the database package fails to start successfully on the other node in the
head node's availability set.
Serviceguard reports that the database package is running and that the packages that
depend on the database, namely lvs and nagios, are down. However, the database
might not have actually started correctly.
Run the service mysqld status command to verify whether the database is running. If
the mysqld service is not running, use the following commands to restart the service
under Serviceguard:
1. cmhaltpkg dbserver.{nodename}
2. cmrunpkg -n {other node in avail set} dbserver.{nodename}
3. cmmodpkg -e dbserver.{nodename}
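The three commands above can be collected into a shell function that takes the two placeholders as arguments. This is a sketch under stated assumptions rather than an HP-supplied tool: the function name restart_dbserver_pkg and the RUN variable are hypothetical, and the sketch assumes you have already confirmed with service mysqld status that the database is not running.

```shell
#!/bin/sh
# Sketch only: the three documented commands with the placeholders
# passed as arguments.
# RUN is a hypothetical dry-run hook: set RUN=echo to print the
# commands instead of executing them.
RUN=${RUN:-}

restart_dbserver_pkg() {
    if [ "$#" -ne 2 ]; then
        echo "usage: restart_dbserver_pkg NODENAME OTHER_NODE" >&2
        return 2
    fi
    pkg="dbserver.$1"    # dbserver.{nodename}
    other_node=$2        # {other node in avail set}

    # Halt the package, run it on the other node, then re-enable
    # package switching. Stop at the first command that fails.
    $RUN cmhaltpkg "$pkg" || return 1
    $RUN cmrunpkg -n "$other_node" "$pkg" || return 1
    $RUN cmmodpkg -e "$pkg"
}
```

With RUN=echo set, a call such as restart_dbserver_pkg n1 n2 (n1 and n2 are hypothetical node names) prints the commands so that the package and node names can be checked before anything is run.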
To start any services that are dependent on the database service, issue the following commands
for each package. These commands enable Serviceguard to restart the database package and start
the remaining packages.
1. cmrunpkg -n {other node in avail set} {service}.{nodename}
2. cmmodpkg -e {service}.{nodename}
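Because the two commands above are repeated for each dependent package, a loop is a natural fit. The following sketch assumes the dependent services are lvs and nagios, as named earlier in this section; the function name start_dependent_pkgs and the RUN variable are hypothetical.

```shell
#!/bin/sh
# Sketch only: repeats the two documented commands for each
# dependent service package.
# RUN is a hypothetical dry-run hook: set RUN=echo to print the
# commands instead of executing them.
RUN=${RUN:-}

start_dependent_pkgs() {
    if [ "$#" -lt 3 ]; then
        echo "usage: start_dependent_pkgs NODENAME OTHER_NODE SERVICE..." >&2
        return 2
    fi
    nodename=$1      # {nodename}
    other_node=$2    # {other node in avail set}
    shift 2          # remaining arguments are the service names

    for service in "$@"; do
        # Run the package on the other node, then re-enable switching.
        $RUN cmrunpkg -n "$other_node" "$service.$nodename" || return 1
        $RUN cmmodpkg -e "$service.$nodename" || return 1
    done
}
```

For example, start_dependent_pkgs n1 n2 lvs nagios (n1 and n2 are hypothetical node names) issues the two commands for lvs.n1 and then for nagios.n1, stopping at the first failure.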
21.6 SLURM Troubleshooting
This section covers SLURM configuration issues and run-time troubleshooting.
21.6.1 SLURM Configuration Issues
SLURM consists of the following primary components:
slurmctld
A master/backup daemon.
slurmd
A slave daemon.
Command binaries
The sinfo, srun, scancel, squeue, and scontrol commands.