Tandem Failure Data System (TFDS) Manual

Introduction to TFDS
HP Tandem Failure Data System (TFDS) Manual 540122-003
1-5
Responding to Processor Down Messages
DMR and TMR Processors
In response to a Processor Down message for a DMR or TMR processor, the TFDS
monitor:
1. Reloads the affected processor, leaving one NSBE in the STOPPED state.
2. Takes a dump of the STOPPED NSBE.
3. Reintegrates the STOPPED NSBE into the processor.
4. Analyzes the dump to build a failure signature.
5. Determines whether the failure is unique or redundant:
   a. If the failure is new and unique (or DUMPOVERRIDE is set to ON), the TFDS
      monitor:
      1. Copies the dump file and various other files to a subvolume for further
         analysis by your service provider. (For more information, see Files
         Included in Failure Data Collection on page 1-7.)
      2. Records the failure signature as a new incident record in the incident
         database.
      3. Sends a software failure event to the Event Management Service (EMS).
   b. If the failure is redundant (a “rediscovery” of a previous incident), the
      TFDS monitor:
      1. Deletes the processor dump file.
      2. Records a rediscovery incident in the incident database.
      3. Sends a software failure event to the EMS indicating that the failure
         is a rediscovery.
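
The unique-versus-redundant decision in step 5 can be sketched in a few lines of
Python. This is an illustrative model only, not TFDS code: the IncidentDatabase
class, the handle_analyzed_dump function, the dump_override parameter, and the
action strings are all hypothetical names chosen for this example, and the
incident database is assumed to be a simple in-memory set of signatures.

```python
from dataclasses import dataclass, field


@dataclass
class IncidentDatabase:
    """Hypothetical in-memory stand-in for the TFDS incident database."""
    signatures: set = field(default_factory=set)
    records: list = field(default_factory=list)


def handle_analyzed_dump(signature, db, dump_override=False):
    """Return the actions for one failure signature (models steps 5a and 5b)."""
    is_new = signature not in db.signatures
    if is_new or dump_override:
        # Step 5a: new and unique failure, or DUMPOVERRIDE is ON.
        db.signatures.add(signature)
        db.records.append(("incident", signature))
        return [
            "copy dump and related files to subvolume",
            "record new incident",
            "send software failure event to EMS",
        ]
    # Step 5b: redundant failure (rediscovery of a previous incident).
    db.records.append(("rediscovery", signature))
    return [
        "delete processor dump file",
        "record rediscovery incident",
        "send rediscovery event to EMS",
    ]


if __name__ == "__main__":
    db = IncidentDatabase()
    for attempt in range(2):
        # The same signature is handled twice: first as new, then as a rediscovery.
        print(attempt, handle_analyzed_dump("SIG-A", db))
```

In this sketch, passing dump_override=True forces the step 5a path even for a
signature that is already recorded, which mirrors the "(or DUMPOVERRIDE is set
to ON)" condition in the text above.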
SMR Processors
In response to a Processor Down message for an SMR processor, the TFDS monitor:
1. Analyzes the processor memory to build a failure signature and performs
   rediscovery analysis.
2. Determines whether the failure is unique or redundant:
   a. If the failure is unique (or DUMPOVERRIDE is set to ON), the TFDS monitor:
Note. When the DUMPALLSLICES TFDS configuration option is turned ON, TFDS
does not perform parallel dumping for certain processor halts. Instead, TFDS
dumps all NSBEs in the processor and then reloads the processor. For more
information, see DUMPALLSLICES on page 3-18 and ADD on page 2-14.