RDF/IMP, IMPX, and ZLT System Management Manual

Managing RDF
HP NonStop RDF/IMP, IMPX, and ZLT System Management Manual 524388-002
Processor Failures
If any RDF process pair stops unexpectedly, the monitor sends an abort message to all
other RDF processes.
The subtopics that follow discuss how RDF responds to extractor, receiver, updater,
and RDF state transition failures.
Extractor Failure
Although the extractor runs as a process pair, the primary process neither maintains
restart information nor checkpoints it to its backup. Instead, the receiver maintains
all restart information for the extractor, which ensures that the extractor is
restartable. The restart point is based on the audit trail position of the last record
stored in the image trail on the backup system.
If the extractor process pair stops unexpectedly, the monitor sends abort messages to
the other RDF processes to bring about an orderly shutdown of RDF. You can then
restart the subsystem by issuing a START RDF command.
If the primary extractor process fails, the backup process requests a new starting
position in the audit trail from the receiver, which ensures a correct restart position.
This extractor-receiver protocol also protects against messages from the extractor
arriving out of order: if a message arrives out of order, the receiver simply directs
the extractor to restart. When the primary CPU that failed comes back up, RDF
switches the new primary (formerly the backup) extractor process so that it again
runs in the primary CPU.
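The extractor-receiver protocol described above can be illustrated with a minimal sketch. This is a simplified model, not RDF code: the class names `Receiver` and `Extractor`, the `handle` and `send_next` methods, the `"OK"` and `"RESTART"` replies, and the use of list indexes as audit trail positions are all assumptions made for illustration. The sketch shows only the two properties stated in the text: the receiver holds the restart position, and a message that does not start at that position causes the receiver to direct the extractor to restart.

```python
from dataclasses import dataclass, field


@dataclass
class Receiver:
    """Hypothetical receiver: holds the extractor's restart position."""
    restart_position: int = 0  # position after the last record stored in the image trail
    image_trail: list = field(default_factory=list)

    def handle(self, start_position, records):
        """Store a block only if it starts at the restart position."""
        if start_position != self.restart_position:
            # Out-of-order or stale message: direct the extractor to restart.
            return ("RESTART", self.restart_position)
        self.image_trail.extend(records)
        self.restart_position += len(records)
        return ("OK", self.restart_position)


@dataclass
class Extractor:
    """Hypothetical extractor: keeps no durable restart state of its own."""
    audit_trail: list
    position: int = 0

    def send_next(self, receiver, block_size=2):
        """Send the next block and adopt whatever position the receiver returns."""
        records = self.audit_trail[self.position:self.position + block_size]
        status, position = receiver.handle(self.position, records)
        self.position = position
        return status


audit = ["r1", "r2", "r3", "r4", "r5"]
rcv = Receiver()
primary = Extractor(audit)
primary.send_next(rcv)                  # stores r1 and r2

# A takeover is modeled as a new extractor whose position is wrong.
backup = Extractor(audit, position=4)
print(backup.send_next(rcv))            # prints RESTART
while backup.position < len(audit):
    backup.send_next(rcv)
print(rcv.image_trail == audit)         # prints True
```

Because the extractor in this model always adopts the position returned by the receiver, a takeover or a misordered message costs at most a resend of data the receiver has not yet stored.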
Receiver Failure
If the primary receiver process stops, the backup receiver process takes over and
resynchronizes with the extractor process. The extractor process might have to resend
audit data that was generated several seconds earlier. When the primary CPU that
failed comes back up, RDF switches the new primary (formerly the backup) receiver
process so that it again runs in the primary CPU.
Note. If the monitor process pair stops unexpectedly (for example, in a double CPU failure),
you must stop the other RDF processes manually and then restart the subsystem. The easiest
way to do this is to issue a series of commands of the following form:
STATUS *, PROG RDF-software-loc.procname, STOP
The following command provides an example:
STATUS *, PROG RDF-software-loc.RDFUPDO, STOP
RDF-software-loc could be, for example, $SYSTEM.RDF. Issuing this command in this
situation is safe only if this is the backup system for a single RDF environment, because
the command stops every process that is running the named program. Alternatively, after
stopping the extractor, you can stop all updaters and the receiver on the backup system
by issuing a STOP RDF command on the backup system.
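The "series of commands" in the Note can be generated rather than typed by hand. The following is a small sketch, assuming a hypothetical helper named `stop_commands`; it only formats text and does not issue anything to the system. The program names are supplied by the caller, because RDFUPDO is the only program name given in the Note.

```python
def stop_commands(software_loc, program_names):
    """Build one 'STATUS *, PROG <loc>.<name>, STOP' line per program name."""
    return [f"STATUS *, PROG {software_loc}.{name}, STOP" for name in program_names]


# $SYSTEM.RDF and RDFUPDO are the example values used in the Note.
for line in stop_commands("$SYSTEM.RDF", ["RDFUPDO"]):
    print(line)
# prints: STATUS *, PROG $SYSTEM.RDF.RDFUPDO, STOP
```

Review the generated lines before entering them, since the safety condition in the Note (a single RDF environment on this backup system) still applies.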
Caution. During the interval between loss of the extractor and restart of the RDF subsystem,
do not add any disk volumes to the RDF configuration (with the ADD VOLUME command).