HP XC System Software Administration Guide Version 3.2

Running /sbin/service nagios restart on the non-head node in the availability set causes

the nagios master to fail over.

21.5.4 Network Restart Command Negatively Affects Serviceguard

If a node is actively participating in a Serviceguard cluster, the Serviceguard tool manages some

HP XC services and their aliases. Because Serviceguard handles relocating these aliases after a

node dies, there are no network scripts defined for the aliases. Therefore, when you issue the

/sbin/service network restart command on a node in a Serviceguard cluster, the aliases

managed by Serviceguard are removed and never re-created. The services will not function

correctly, and because of the HP XC configuration of Serviceguard, Serviceguard will not detect the

failure.

Follow this procedure on the head node to restart the network:

1. Transfer control of the database from Serviceguard to HP XC:

# transfer_from_avail

2. Restart the network:

# /sbin/service network restart

3. Return control of the database to Serviceguard:

# transfer_to_avail
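
The three steps above can be wrapped in a small shell function so that they always run in order. This is only a sketch: the function name and the *_CMD override variables are hypothetical (they exist so the sequence can be rehearsed with stand-in commands), and the sketch assumes each command signals failure with a nonzero exit status. Handing control back to Serviceguard even when the network restart fails is a design assumption, not something the procedure states.

```shell
#!/bin/sh
# Sketch: restart the network on a head node that is in a Serviceguard cluster.
# The default commands are the ones from the procedure; the *_CMD variables
# are hypothetical hooks that let you substitute stand-ins for a rehearsal.
TRANSFER_FROM_CMD=${TRANSFER_FROM_CMD:-transfer_from_avail}
NETWORK_RESTART_CMD=${NETWORK_RESTART_CMD:-/sbin/service network restart}
TRANSFER_TO_CMD=${TRANSFER_TO_CMD:-transfer_to_avail}

restart_network_under_serviceguard() {
    # Step 1: transfer control of the database from Serviceguard to HP XC.
    if ! $TRANSFER_FROM_CMD; then
        echo "transfer_from_avail failed; network left untouched" >&2
        return 1
    fi

    # Step 2: restart the network and remember its exit status.
    rc=0
    $NETWORK_RESTART_CMD || rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "network restart exited with status $rc" >&2
    fi

    # Step 3: return control to Serviceguard.
    # Assumption: this should happen even if step 2 failed.
    if ! $TRANSFER_TO_CMD; then
        echo "transfer_to_avail failed; check the database manually" >&2
        return 1
    fi
    return "$rc"
}
```

After sourcing the file as root on the head node, calling restart_network_under_serviceguard runs the sequence. The function returns nonzero if any step failed.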

21.5.5 Problem Failing Over Database Package Under Serviceguard

Occasionally, when the head node is running with Serviceguard and the head node becomes

unresponsive, the database package fails to start up successfully on the other node in the

availability set with the head node.

Serviceguard reports that the database package is running and that the packages that

depend on the database, namely lvs and nagios, are down. However, the database

might not have actually started correctly.

Run the service mysqld status command to verify whether the database is running. If

the mysqld service is not running, use the following commands to recover by restarting

the service under Serviceguard:

1. cmhaltpkg dbserver.{nodename}

2. cmrunpkg -n {other node in avail set} dbserver.{nodename}

3. cmmodpkg -e dbserver.{nodename}
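
The check and the three recovery commands above can be combined as in the following sketch. The function name, the RUN dry-run hook, and the MYSQLD_STATUS_CMD variable are hypothetical, and the node names in the usage note are placeholders. The sketch also assumes that service mysqld status exits with status 0 only when the database is running, which is the usual init-script convention but is not stated in this guide.

```shell
#!/bin/sh
# Sketch: restart the database package under Serviceguard if mysqld is down.
# RUN is a hypothetical dry-run hook: set RUN=echo to print the Serviceguard
# commands instead of executing them.
RUN=${RUN:-}
# Assumption: this status command exits 0 only when mysqld is running.
MYSQLD_STATUS_CMD=${MYSQLD_STATUS_CMD:-service mysqld status}

restart_dbserver_pkg() {
    pkg_node=$1     # {nodename} part of the package name dbserver.{nodename}
    target_node=$2  # other node in the availability set
    if [ -z "$pkg_node" ] || [ -z "$target_node" ]; then
        echo "usage: restart_dbserver_pkg NODENAME OTHER_NODE" >&2
        return 2
    fi

    # Verify whether the database is actually running.
    if $MYSQLD_STATUS_CMD >/dev/null 2>&1; then
        echo "mysqld is running; nothing to do"
        return 0
    fi

    # Halt the package, run it on the other node, then re-enable it.
    $RUN cmhaltpkg "dbserver.$pkg_node" || return 1
    $RUN cmrunpkg -n "$target_node" "dbserver.$pkg_node" || return 1
    $RUN cmmodpkg -e "dbserver.$pkg_node"
}
```

For example, with placeholder node names, RUN=echo followed by restart_dbserver_pkg n16 n15 prints the three commands without executing them, which is a cheap way to confirm the package and node names before running them for real.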

To start any services that are dependent on the database service, issue the following commands

for each package. These commands enable Serviceguard to restart the database package and start

the remaining packages.

1. cmrunpkg -n {other node in avail set} {service}.{nodename}

2. cmmodpkg -e {service}.{nodename}
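
Because the two commands above are repeated for each dependent package, a loop can save some typing. The sketch below is illustrative only: the function name and the RUN dry-run hook are hypothetical, and the node names in the usage note are placeholders. The package names lvs and nagios are the dependents named earlier in this section.

```shell
#!/bin/sh
# Sketch: start each package that depends on the database package.
# RUN is a hypothetical dry-run hook: set RUN=echo to print the commands
# instead of executing them.
RUN=${RUN:-}

start_dependent_pkgs() {
    if [ "$#" -lt 3 ]; then
        echo "usage: start_dependent_pkgs NODENAME OTHER_NODE SERVICE..." >&2
        return 2
    fi
    pkg_node=$1     # {nodename} part of the package name {service}.{nodename}
    target_node=$2  # other node in the availability set
    shift 2

    for svc in "$@"; do
        # Run the package on the other node, then re-enable it.
        $RUN cmrunpkg -n "$target_node" "$svc.$pkg_node" || return 1
        $RUN cmmodpkg -e "$svc.$pkg_node" || return 1
    done
}
```

With placeholder node names, start_dependent_pkgs n16 n15 lvs nagios issues the two commands for each of the two packages in turn and stops at the first command that fails.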

21.6 SLURM Troubleshooting

This section discusses SLURM troubleshooting in terms of configuration issues and

run-time troubleshooting.

21.6.1 SLURM Configuration Issues

SLURM consists of the following primary components:

slurmctld

A master/backup daemon.

slurmd

A slave daemon.

Command binaries

The sinfo, srun, scancel, squeue, and scontrol commands.
