Computer Hardware User Manual

7.6 User ID Problems

Within an HACMP cluster, more than one node can potentially offer the same
service to a specific user or user ID. Because the node providing the service
can change, the system administrator must ensure that the same users and
groups are known to all nodes that might run an application. If one node
fails and the standby node takes over the application, users can continue
working because the takeover node knows them under exactly the same user
and group IDs.

Because access within an NFS-mounted file system is granted based on user
IDs, the same requirement applies to NFS-mounted file systems.
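The consistency requirement above can be checked mechanically. The following is a minimal sketch, not an HACMP utility: it assumes you have already collected the contents of /etc/passwd from two cluster nodes as text, and it reports users whose user or group ID differs or who are missing on one node. The function names parse_passwd and find_id_mismatches are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple


def parse_passwd(text: str) -> Dict[str, Tuple[int, int]]:
    """Map each user name to (uid, gid) from passwd-format text.

    Assumes the usual colon-separated layout:
    name:password:uid:gid:gecos:home:shell
    """
    entries: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 4:
            continue  # not a regular account entry
        try:
            entries[fields[0]] = (int(fields[2]), int(fields[3]))
        except ValueError:
            continue  # non-numeric IDs, skip the line
    return entries


def find_id_mismatches(node_a: str, node_b: str) -> List[str]:
    """Compare passwd text from two nodes and describe every difference."""
    a = parse_passwd(node_a)
    b = parse_passwd(node_b)
    problems: List[str] = []
    for user in sorted(set(a) | set(b)):
        if user not in a:
            problems.append(f"{user}: missing on first node")
        elif user not in b:
            problems.append(f"{user}: missing on second node")
        elif a[user] != b[user]:
            problems.append(f"{user}: uid/gid {a[user]} vs {b[user]}")
    return problems
```

An empty result means every account has the same user and group ID on both nodes. Group definitions in /etc/group could be compared with the same approach.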

For more information on managing user and group accounts within a cluster,
refer to Chapter 2.7, “User ID Planning” on page 48, or to Chapter 12,
“Managing User and Groups in a Cluster” of the HACMP for AIX, Version 4.3:
Administration Guide, SC23-4279.

7.7 Troubleshooting Strategy

To find a solution to a cluster problem quickly, it helps to follow a
strategy for pinpointing the problem. The following guidelines should make
the troubleshooting process more productive:

• Save the log files associated with the problem before they become
  unavailable. Save the /tmp/hacmp.out and /tmp/cm.log files before you do
  anything else to investigate the cause of the problem.

• Attempt to duplicate the problem. Do not rely too heavily on the user’s
  problem report. The user has only seen the problem from the application
  level. If necessary, obtain the user’s data files to recreate the problem.

• Approach the problem methodically. Allow the information gathered from
  each test to guide your next test. Do not jump back and forth between
  tests based on hunches.

• Keep an open mind. Do not assume too much about the source of the
  problem. Test each possibility and base your conclusions on the evidence
  of the tests.

• Isolate the problem. When tracking down a problem within an HACMP
  cluster, isolate each component of the system that can fail and determine
  whether it is working. Work from top to bottom, following the progression
  described in the following section.
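The first guideline, saving the log files before doing anything else, can be scripted so that it happens the same way every time. The following is a minimal sketch under stated assumptions: the two log paths come from the text above, while the function name save_logs, the destination directory, and the timestamped directory naming are hypothetical choices made for illustration.

```python
import shutil
import time
from pathlib import Path
from typing import Iterable, List

# Log files named in the guidelines above.
HACMP_LOGS = ("/tmp/hacmp.out", "/tmp/cm.log")


def save_logs(dest_root: str, logs: Iterable[str] = HACMP_LOGS) -> List[Path]:
    """Copy each existing log file into a new timestamped directory.

    Returns the paths of the copies. Missing files are reported and skipped
    so that one absent log does not prevent saving the others.
    """
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = Path(dest_root) / f"hacmp-logs-{stamp}"
    dest.mkdir(parents=True, exist_ok=True)

    saved: List[Path] = []
    for name in logs:
        src = Path(name)
        if src.is_file():
            # copy2 also preserves timestamps, which helps when correlating
            # log entries with the time of the failure.
            saved.append(Path(shutil.copy2(src, dest / src.name)))
        else:
            print(f"warning: {src} not found, skipping")
    return saved
```

Calling save_logs with a directory outside /tmp, on a file system with enough free space, keeps the copies away from the location where the original logs are written.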