HP AA HP NetServer 4000 Reference Guide
HP NetServer AA
Hewlett-Packard Company
Overview of Troubleshooting in an HP AA Environment
The HP AA system is a fault-tolerant system. When a fault occurs (for
example, a failed network adapter), the system continues to operate.
While the system remains operational, any additional failure of the
faulted component's redundant counterpart can affect the availability
of the system.
Returning the system to a fully fault-tolerant state consists of a
series of actions. These actions fall into the following basic
categories:
• Diagnosing the Fault
• Isolating the Fault
• Correcting the Fault
The overall approach for this system is to focus first on gathering
information, and to take action only after sufficient data has been
collected. It is critical to understand that the more analysis is done
up front, the less likely it is that server availability will have to
be sacrificed.
Diagnosing Faults
In the HP AA environment, four basic methods are used to diagnose the
source of a fault. These methods are:
Marathon Manager: This tool can be used to quickly examine the
status of a component. The color coding used in the Administration
Window and Device Status window quickly identifies components
that are in a degraded state.
SSDL status lights: The front of the SSDL displays power and
connection status for the HP AA components. Because the SSDL is
clearly visible when approaching the array, it should be one of the
first components examined when performing fault diagnosis.
Windows NT Event Viewer: The Event Viewer accumulates all events
associated with the Windows NT operating system and the HP AA
components. This is the primary tool used for detailed fault
diagnosis.
Marathon Event Log: The events displayed in this log are the same
Marathon events that appear in the NT Event Viewer. There are two
differences: the Marathon Event Log is DOS based, and it displays
only Marathon events, as they occur.
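
The relationship between the two event views can be pictured as a
simple filter: the Windows NT event log holds every event, while the
Marathon Event Log shows only the Marathon subset. The Python sketch
below is an illustration only and does not read any real log. The
Event record, the "Marathon" and "Windows NT" source labels, the
severity names, and all sample messages are assumptions made up for
this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical event record; the field names are assumptions."""
    source: str    # assumed label, e.g. "Marathon" or "Windows NT"
    severity: str  # assumed levels: "info", "warning", "error"
    message: str


def marathon_only(events):
    """Keep only events whose source is the assumed "Marathon" label."""
    return [e for e in events if e.source == "Marathon"]


def needs_attention(events):
    """Keep only events at warning or error severity."""
    return [e for e in events if e.severity in ("warning", "error")]


# Invented sample data standing in for a combined event log.
SAMPLE_EVENTS = [
    Event("Windows NT", "info", "Service started"),
    Event("Marathon", "error", "Network adapter failed"),
    Event("Marathon", "info", "Component synchronized"),
    Event("Windows NT", "warning", "Disk nearly full"),
]

if __name__ == "__main__":
    # Narrow the full log to Marathon events, then to likely faults.
    for event in needs_attention(marathon_only(SAMPLE_EVENTS)):
        print(f"{event.source}: {event.severity}: {event.message}")
```

Filtering by source first and by severity second mirrors the approach
described above: start from the complete record, then narrow it to
the events that are relevant to the fault being diagnosed.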