Tandem Failure Data System (TFDS) Manual
Introduction to TFDS
HP Tandem Failure Data System (TFDS) Manual—540122-003
Functionality
Once installed and configured, TFDS continuously monitors your system for
failures and responds automatically to:
• Processor Down messages.
• Software failure reports from instruments embedded in programs.
Responding to Processor Down Messages
A Processor Down message is generated whenever a processor halts due to a software
or hardware failure. The TFDS monitor responds to Processor Down messages in one
of two ways, depending on the TFDS configuration settings and the current
configuration of the processor that experienced the failure. The possible configurations
are:
• Single-modular redundancy (SMR)
• Dual-modular redundancy (DMR)
• Triple-modular redundancy (TMR)
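
The three processor configurations can be sketched as a small lookup keyed by the number of functioning NonStop Blade Elements (NSBEs). This is an illustration only, not part of TFDS: the names ProcessorConfig, classify, and is_degraded are hypothetical, and the sketch assumes that the configuration is determined solely by how many NSBEs are currently functioning (one for SMR, two for DMR, three for TMR).

```python
from enum import Enum


class ProcessorConfig(Enum):
    """Redundancy configurations named in this section.

    The integer values are an assumed encoding (count of functioning
    NSBEs) chosen for illustration; they are not a TFDS interface.
    """
    SMR = 1  # single-modular redundancy
    DMR = 2  # dual-modular redundancy
    TMR = 3  # triple-modular redundancy


def classify(functioning_nsbes: int) -> ProcessorConfig:
    """Map a count of functioning NSBEs to a configuration name."""
    try:
        return ProcessorConfig(functioning_nsbes)
    except ValueError:
        raise ValueError(
            f"unsupported NSBE count: {functioning_nsbes}"
        ) from None


def is_degraded(config: ProcessorConfig) -> bool:
    """SMR is the configuration this manual describes as degraded."""
    return config is ProcessorConfig.SMR


if __name__ == "__main__":
    for count in (1, 2, 3):
        config = classify(count)
        print(count, config.name, "degraded" if is_degraded(config) else "normal")
```

A caller might use is_degraded(classify(n)) to flag a processor that is running on a single NSBE; how TFDS itself reacts to each configuration is governed by its configuration settings and is not modeled here.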
Note. SMR is a degraded processor configuration involving a single functioning HP NonStop™
Blade Element (NSBE), and occurs only when the other NSBE(s) in the processor are
temporarily out of service.

Component         Description (page 2 of 2)

Snapshot file     An image of a process’s environment that contains all of the
                  data, the stack trace, and register states at a given point in
                  time

Snapshot server   A program that accepts requests from Debug Services to save
                  Snapshot Files of a process under debug control.

TFDSCONF          A file containing TFDS configuration information that is read by
                  the TFDS monitor at startup.

TFDS helper       A process ($ZTHnn) that runs in each processor in the system
                  and is registered as a debugger with Debug Services. It
                  performs most of the functionality related to TFDS
                  instrumentation services

TFDS monitor      A process pair that is responsible for the processor reload and
                  dumping functions, accepting instrument and system
                  messages, performing rediscovery, updating the incident
                  database, and sending EMS events to $ZLOG










