Availability Guide for Problem Management

Problem Management Tools
Availability Guide for Problem Management 125509
Tandem Failure Data System (TFDS)
The Tandem Failure Data System (TFDS) is an operations management tool that isolates
software problems and provides automatic processor failure data collection, diagnosis,
and recovery services. TFDS automatically collects data from dumps of frozen or halted
processors, online processor dumps, and saveabend snapshot dumps. It also collects data
from Tandem Maintenance and Diagnostic System (TMDS) and Event Management
Service (EMS) log files and helps you recover quickly from processor halts.
How TFDS Works
TFDS monitors processors and automatically initiates a processor dump if a failure
occurs. The failed processor is reloaded automatically, and the processor dump is
analyzed against the incident database to determine whether the failure is the result of a
recurring or known defect. TFDS creates this incident database, which tracks unique
problem occurrences.
If the failure is identified as a recurring or known defect, the dump file is removed from
the system, and the number of occurrences for the particular failure is tabulated. If the
failure cannot be identified as a recurring or known defect, the dump file, the TMDS and
EMS log files, and the CONFLIST file are automatically saved to tape, and a new
incident record is established in the database.
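The triage decision described above can be sketched in a few lines of Python. This is only an illustrative model, not TFDS code: the class names, the string "signature" used to recognize a failure, and the returned action labels are all assumptions made for the example. How TFDS actually matches a dump against known defects is not described here.

```python
from dataclasses import dataclass, field


@dataclass
class Incident:
    """One unique problem in the (hypothetical) incident database."""
    signature: str
    occurrences: int = 1


@dataclass
class IncidentDatabase:
    """Hypothetical stand-in for the TFDS incident database."""
    incidents: dict = field(default_factory=dict)

    def triage(self, signature: str) -> str:
        """Return the action taken for a dump with this failure signature."""
        known = self.incidents.get(signature)
        if known is not None:
            # Recurring or known defect: count the occurrence and
            # remove the dump file from the system.
            known.occurrences += 1
            return "remove_dump"
        # Unrecognized failure: establish a new incident record and save
        # the dump, the TMDS and EMS logs, and the CONFLIST file to tape.
        self.incidents[signature] = Incident(signature)
        return "save_to_tape"


db = IncidentDatabase()
actions = [db.triage(s) for s in ["halt-A", "halt-B", "halt-A"]]
print(actions)
print(db.incidents["halt-A"].occurrences)
```

In this model the second "halt-A" failure is recognized as recurring, so only its count changes and no new evidence is saved.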
You can configure each TFDS feature to meet your
specific needs. These features include dump placement, automatic tape backup or
network file transfer, dump file analysis, and the ability to control the dumping and
reloading of failed processors. Figure 9-12 illustrates the TFDS architecture.
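As a way of picturing the configurable features listed above, the following Python sketch groups them into one settings object. Every field name and value here is hypothetical and chosen only for illustration; none of it is real TFDS configuration syntax.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TfdsOptions:
    """Hypothetical model of the tailorable features; not TFDS syntax."""
    dump_location: str          # dump placement (placeholder value below)
    offload: str = "tape"       # "tape" backup or "network" file transfer
    analyze_dumps: bool = True  # dump file analysis
    auto_dump: bool = True      # control dumping of failed processors
    auto_reload: bool = True    # control reloading of failed processors

    def __post_init__(self):
        # Only the two offload methods named in the text are accepted.
        if self.offload not in ("tape", "network"):
            raise ValueError(f"unknown offload method: {self.offload!r}")


# Example: send dumps over the network and keep failed processors down
# until an operator reloads them.
opts = TfdsOptions(dump_location="example-dump-volume",
                   offload="network",
                   auto_reload=False)
print(opts)
```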