Availability Guide for Problem Management
Recovering From Unplanned Outages
Availability Guide for Problem Management–125509
Tools for Problem Analysis
Tandem Failure Data System (TFDS)
TFDS isolates software problems and provides automatic processor failure data
collection, diagnosis, and recovery services. TFDS automatically collects the following
types of data, some of which might be needed by an analyst at the Tandem NonStop
Support Center (TNSC):
• Dumps of frozen or halted processors
• Online processor dumps
• Process snapshot file dumps
• Dumps of NonStop process pairs
• Event Management Service (EMS) log files
• CONFLIST system configuration data files
• CONFAUX and CONFTEXT system data files
• Incident data records
TFDS monitors processors and automatically initiates a processor dump if a failure
occurs. The failed processor is reloaded automatically, and the processor dump is
analyzed or compared against the incident database to determine whether the failure is
the result of a recurring or known defect. TFDS also logs unique problem occurrences in
an incident database.
If the failure is identified as a recurring or known defect, the dump file is removed from
the system, and the number of occurrences for the particular failure is tabulated. If the
failure cannot be identified as a recurring or known defect, the dump file, the EMS log
files, and the system configuration data (CONFLIST) file are automatically saved to
tape, and a new incident record is established in the database.
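The decision flow described above can be modeled in a few lines. This is only an illustrative sketch, not TFDS code: the IncidentDatabase class, the triage method, and the idea of matching failures by a "signature" string are all assumptions made for the example, since the guide does not say how TFDS compares a dump against the incident database.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentDatabase:
    """Hypothetical stand-in for the TFDS incident database."""

    # Maps an assumed failure signature to its number of occurrences.
    occurrences: dict = field(default_factory=dict)

    def triage(self, signature: str) -> list:
        """Return the actions the text describes for one analyzed dump."""
        if signature in self.occurrences:
            # Recurring or known defect: discard the dump, tally the occurrence.
            self.occurrences[signature] += 1
            return ["remove dump file", "increment occurrence count"]
        # Unidentified failure: preserve the evidence and open an incident.
        self.occurrences[signature] = 1
        return [
            "save dump file to tape",
            "save EMS log files to tape",
            "save CONFLIST file to tape",
            "create incident record",
        ]


db = IncidentDatabase()
first = db.triage("halt-code-A")   # signature value is made up
second = db.triage("halt-code-A")  # same failure seen again
```

In this model the first occurrence of a failure produces the save-to-tape actions and a new incident record, while the second occurrence only removes the dump and raises the count.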
CPUDUMP Command
The CPUDUMP command requests a processor dump whenever the processor is down.
The command syntax is as follows:
CPUDUMP { CPU 1 [ , CPU 2, ... CPU n ] }
        { ALL }
Failure Data Capture
The goal of capturing failure data is to determine the underlying causes of unplanned
outages. Capturing failure data is important, but it must have minimal impact at the
time of the outage. Focus on current system data that provides a snapshot of the system
and application activities occurring at the time of the outage. Also collect static
system data, such as the versions of the application and Tandem system software, and
the command files used to start the system or the application.
Several TACL commands help you collect dynamic system information:
• STATUS * provides information about the processes in each of the system’s processors,
including their priorities and file names.
• DSAP * provides information about free and allocated space on each of the disks in
the system.
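One way to keep the impact at outage time small is to prepare the list of dynamic-data commands in advance. The sketch below is a hypothetical illustration only: the commands STATUS * and DSAP * and their descriptions come from this section, while the DYNAMIC_CAPTURE list and the render_checklist helper are names invented for the example.

```python
# Commands and purposes are taken from this section; the checklist
# structure itself is an assumption made for illustration.
DYNAMIC_CAPTURE = [
    ("STATUS *", "processes, priorities, and file names in each processor"),
    ("DSAP *", "free and allocated space on each disk"),
]


def render_checklist(steps):
    """Format (command, purpose) pairs as numbered checklist lines."""
    return [
        f"{n}. {command:<10} {purpose}"
        for n, (command, purpose) in enumerate(steps, start=1)
    ]


checklist = render_checklist(DYNAMIC_CAPTURE)
print("\n".join(checklist))
```

Static items, such as software versions and startup command files, could be added to a second list in the same way.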