RDF System Management Manual for J-series and H-series RVUs (RDF Update 13)

NOTE: If you issue a STOP RDF command on the primary or backup system while the network is down, you must also issue a STOP RDF command on the other system while the network is still down.

If you have an RDF network running and the Network Master's RDFNET process encounters a communications line failure when attempting to perform a network transaction on another primary node in the RDF network, the failure can increase the work to be performed during an RDF Takeover operation. Once the communications line comes back up and the RDFNET process can resume its network transactions, the need for that additional takeover work is eliminated.

System Failures

If you lose your primary system and you can recover it without having to perform an RDF Takeover operation, no special recovery is required for RDF. After you have restarted your primary system, restart RDF before you restart your applications.

If you lose your primary system and you need to restart your applications as quickly as possible, perform the RDF Takeover operation on your backup system. Details of the various tasks you need to do after the RDF Takeover are provided further below. Additionally, if you can eventually recover your primary system, a discussion is also provided further below on how you can recover the database on that system and bring it into synchronization with the database on your backup system, where your applications are now running.

If you lose your backup system, you only need to recover it and then restart RDF on your primary system as quickly as possible. If the communications line to your backup system has sufficient bandwidth, then RDF can catch up very quickly.
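The three system-failure cases above can be summarized as a small decision table. The following Python sketch is only an illustration of the order of operations described in this section; the enum members, function name, and step wording are hypothetical and are not part of RDF or RDFCOM.

```python
from enum import Enum
from typing import List


class SystemFailure(Enum):
    # Hypothetical labels for the three cases described in the text.
    PRIMARY_RECOVERED_WITHOUT_TAKEOVER = 1
    PRIMARY_LOST_TAKEOVER_NEEDED = 2
    BACKUP_LOST = 3


def recovery_steps(failure: SystemFailure) -> List[str]:
    """Return the ordered operator steps for a given failure case."""
    if failure is SystemFailure.PRIMARY_RECOVERED_WITHOUT_TAKEOVER:
        # No special RDF recovery; RDF must be running before the applications.
        return [
            "restart primary system",
            "restart RDF",
            "restart applications",
        ]
    if failure is SystemFailure.PRIMARY_LOST_TAKEOVER_NEEDED:
        # Applications move to the backup system after the takeover.
        return [
            "perform RDF Takeover on backup system",
            "perform post-takeover tasks",
            "restart applications on backup system",
        ]
    # Backup lost: recover it, then restart RDF on the primary promptly.
    return [
        "recover backup system",
        "restart RDF on primary system",
    ]


steps = recovery_steps(SystemFailure.PRIMARY_RECOVERED_WITHOUT_TAKEOVER)
```

The point of writing the steps as ordered lists is that the ordering itself matters: in the first case, RDF is restarted before the applications.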

Processor Failures

All RDF processes other than RDFCOM run as process pairs. If a CPU failure causes a primary RDF process to fail, the backup process takes over without interruption in service.

If any RDF process pair stops unexpectedly, the monitor sends abort messages to the other RDF processes to bring about an orderly shutdown of RDF. You can then restart the subsystem simply by issuing a START RDF command.

NOTE: If the monitor process pair unexpectedly stops (for example, as in a double CPU failure), you must stop the other RDF processes manually and then restart the subsystem. The easiest way to do this is to issue a series of commands of the following form: STATUS *,PROG RDF-software-loc.procname, STOP.
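Because the NOTE calls for one command per RDF program, it can help to generate the series from a list. The Python sketch below only builds the command strings in the form quoted above; the function name is hypothetical, and the location and program names in the usage lines are placeholders, not actual RDF object file names. Substitute the values for your installation.

```python
from typing import List


def build_stop_commands(software_loc: str, program_names: List[str]) -> List[str]:
    """Build one 'STATUS *,PROG <loc>.<procname>, STOP' command per program.

    software_loc and program_names are supplied by the operator; this
    function does not know or validate real RDF program file names.
    """
    return [
        f"STATUS *,PROG {software_loc}.{name}, STOP"
        for name in program_names
    ]


# Placeholder values for illustration only.
commands = build_stop_commands("$VOL.SUBVOL", ["PROGA", "PROGB"])
for command in commands:
    print(command)
```

The generated lines would then be entered at the command prompt by the operator; the sketch does not execute anything on the system.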

The subtopics that follow discuss how RDF responds to extractor, receiver, updater, and RDF state transition failures.

Extractor Failure

Although the extractor runs as a process pair, the primary process neither maintains restart information nor checkpoints such information to its backup. Instead, the receiver maintains all restart information for the extractor, ensuring that the extractor is restartable. The restart point is based on the audit trail position of the last record stored in the image trail on the backup system.
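The division of responsibility described above, where the receiver rather than the extractor holds the restart information, can be illustrated with a toy model. This Python sketch is a simplification under stated assumptions: the class and method names are hypothetical, audit trail positions are reduced to plain integers, and how the real extractor resumes relative to the reported position is not modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Receiver:
    """Toy receiver: remembers the audit trail position of the last
    record it stored in the image trail."""
    last_stored_position: Optional[int] = None

    def store(self, audit_position: int) -> None:
        # Storing a record in the image trail advances the restart point.
        self.last_stored_position = audit_position

    def restart_position(self) -> Optional[int]:
        return self.last_stored_position


class Extractor:
    """Toy extractor: keeps no restart state of its own and always
    obtains its starting point from the receiver."""

    def __init__(self, receiver: Receiver) -> None:
        self.receiver = receiver
        self.position = receiver.restart_position()

    def send(self, audit_position: int) -> None:
        self.receiver.store(audit_position)
        self.position = audit_position


receiver = Receiver()
first = Extractor(receiver)
first.send(100)
first.send(250)

# A replacement extractor starts from what the receiver recorded.
second = Extractor(receiver)
```

In this model, discarding `first` loses nothing, because the only restart state lives in `receiver`.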

If the extractor process pair inadvertently stops, you can (as stated earlier) restart the RDF subsystem simply by issuing a START RDF command.

CAUTION: During the interval between loss of the extractor and RDF subsystem restart, you should not add any disk volumes to the RDF configuration (with the ADD VOLUME command).

If the primary CPU of the extractor process fails, the backup extractor process requests from the receiver a new starting position in the audit trail, ensuring a correct restart position. This extractor-receiver protocol also provides protection against messages from the extractor erroneously

116 Critical Operations, Special Situations, and Error Conditions