Managing HP Serviceguard for Linux, Tenth Edition, September 2012

8 Troubleshooting Your Cluster
This chapter describes how to verify cluster operation, how to review cluster status, how
to add and replace hardware, and how to solve some typical cluster problems. Topics
are as follows:
Testing Cluster Operation
Monitoring Hardware (page 290)
Replacing Disks (page 291)
Replacing LAN Cards (page 293)
Replacing a Failed Quorum Server System (page 294)
Troubleshooting Approaches (page 296)
Solving Problems (page 299)
Testing Cluster Operation
After you have configured your Serviceguard cluster, verify that its components
behave correctly when a failure occurs. The procedures in this section test how the
cluster responds to a package failure, a node failure, or a LAN failure.
CAUTION: The following procedures deliberately cause cluster components to fail so
that you can confirm that the cluster responds correctly to each failure. As a result,
the availability of nodes and applications may be disrupted.
Testing the Package Manager
To test that the package manager is operating correctly, perform the following procedure
for each package on the cluster:
1. Obtain the PID (process ID) of a service in the package by entering
ps -ef | grep <service_cmd>
where service_cmd is the executable specified by the service_cmd parameter (page 216)
in the package configuration file (or legacy control script). The service you select
must have the default service_restart value (none), so that Serviceguard treats the
service's failure as a package failure instead of restarting the service.
2. To kill the service process, enter
kill <PID>
3. To view the package status, enter
cmviewcl -v
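The three steps above can be combined into a small shell script. This is a minimal sketch and is not part of Serviceguard. The script name test_pkg.sh, the function names, the SERVICE_CMD environment variable, and the 5-second pause are all hypothetical choices. It uses pgrep -f in place of ps -ef | grep because pgrep never reports its own process. Run it only on a test cluster, because it deliberately kills a service.

```shell
#!/bin/sh
# Hypothetical sketch of the package manager test (steps 1-3).
# SERVICE_CMD is read from the environment, not from the command line,
# so this script's own command line never contains the search pattern.

# Step 1: print the PID(s) whose full command line contains the pattern $1.
# pgrep -f searches the same text that "ps -ef" shows (as an extended
# regular expression), and unlike "ps -ef | grep" it never reports itself.
find_service_pids() {
    pgrep -f -- "$1"
}

# Step 2: send SIGTERM (kill's default signal) to every matching PID.
kill_service() {
    pids=$(find_service_pids "$1") || {
        echo "no running process matches: $1" >&2
        return 1
    }
    kill $pids   # unquoted on purpose: one argument per PID
}

# Step 3: show detailed cluster and package status, if Serviceguard is installed.
show_status() {
    if command -v cmviewcl >/dev/null 2>&1; then
        cmviewcl -v
    else
        echo "cmviewcl not found; run this on a cluster node" >&2
        return 1
    fi
}

# Run all three steps only when SERVICE_CMD is set.
# The selected service must use service_restart none (see step 1).
if [ -n "${SERVICE_CMD:-}" ]; then
    kill_service "$SERVICE_CMD" || exit 1
    sleep 5   # arbitrary pause to give the package manager time to react
    show_status
fi
```

For example, SERVICE_CMD=/usr/local/bin/my_service ./test_pkg.sh (the path is hypothetical). The service command is passed through the environment rather than as an argument. If it were an argument, pgrep -f would also match the script's own command line and that of its subshells.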