RDF System Management Manual for H-Series RVUs (RDF 1.8)

After a TMF file recovery to a timestamp or to first purge, or after a TMF subsystem failure for which volume recovery cannot succeed, the databases or the affected files on the primary and backup systems must be resynchronized.

If the primary system fails, you might want to request a takeover operation to switch application processing to the backup system.

Communication Line Failures

RDF can recover from communication line failures. When the extractor detects that a communication line to the backup system is down, it reports the error to the EMS event log. The extractor attempts to resend data every minute until the line to the backup system is reenabled.
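
The retry behavior described above can be pictured with a minimal sketch. This is not RDF code: the function name `send_with_retry` and the callables `send`, `log_event`, and `sleep` are hypothetical stand-ins for the extractor's send attempt, an EMS event report, and a timer. Reporting the failure only once is also an assumption; the text says only that the error is reported.

```python
import time
from typing import Callable

RETRY_INTERVAL_SECONDS = 60  # "every minute", per the text above


def send_with_retry(
    send: Callable[[], bool],
    log_event: Callable[[str], None],
    sleep: Callable[[float], None] = time.sleep,
    interval: float = RETRY_INTERVAL_SECONDS,
) -> int:
    """Call send() until it succeeds; return the number of attempts made.

    The line-down condition is logged once, when first detected
    (an assumption), and sleep() is invoked between attempts.
    """
    attempts = 0
    reported = False
    while True:
        attempts += 1
        if send():
            return attempts
        if not reported:
            log_event("communication line to backup system is down")
            reported = True
        sleep(interval)
```

Passing `sleep` as a parameter lets the loop be exercised without waiting a real minute between attempts.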

If you stop RDF on the primary system when a communication line to the backup system is down, the monitor tries to send a stop message to the processes on the backup system and reports that the line is down. All of the processes on the backup system continue to run until a STOP RDF command is issued at the backup system.

NOTE: If you issue a STOP RDF command on the primary or backup system while the network is down, you must also issue a STOP RDF command on the other system while the network is still down.

Processor Failures

All RDF processes other than RDFCOM run as process pairs. If a CPU failure causes a primary RDF process to fail, the backup process takes over without interruption in service.

If any RDF process pair stops unexpectedly, the monitor sends abort messages to the other RDF processes to bring about an orderly shutdown of RDF. You can then restart the subsystem by issuing a START RDF command.
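
As an illustration only, the shutdown-and-restart sequence above can be modeled with a small state sketch. The classes `ProcessPair` and `Monitor` and their methods are hypothetical and do not correspond to any RDF interface; they only mirror the behavior described in the text.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ProcessPair:
    name: str
    running: bool = True
    aborted_by_monitor: bool = False


@dataclass
class Monitor:
    pairs: Dict[str, ProcessPair] = field(default_factory=dict)

    def add(self, name: str) -> None:
        self.pairs[name] = ProcessPair(name)

    def on_pair_stopped(self, name: str) -> None:
        """A pair stopped unexpectedly: abort the others for an orderly shutdown."""
        self.pairs[name].running = False
        for pair in self.pairs.values():
            if pair.running:
                pair.aborted_by_monitor = True
                pair.running = False

    def start_rdf(self) -> None:
        """Model of START RDF: every pair runs again."""
        for pair in self.pairs.values():
            pair.running = True
            pair.aborted_by_monitor = False
```

In this model, calling `on_pair_stopped("extractor")` leaves every pair stopped, and a single `start_rdf()` call brings them all back.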

NOTE: If the monitor process pair unexpectedly stops (for example, as in a double CPU failure), you must stop the other RDF processes manually and then restart the subsystem. The easiest way to do this is to issue a series of commands of the following form: STATUS *,PROG RDF-software-loc.procname, STOP.
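
A small helper can generate that series of commands, one per program file. This is a sketch: `build_stop_commands` is a hypothetical name, and the location and program names in the usage example are placeholders, not actual RDF program file names. Substitute the values for your installation.

```python
from typing import Iterable, List


def build_stop_commands(software_loc: str, procnames: Iterable[str]) -> List[str]:
    """Return one command of the form given in the note per program name."""
    return [f"STATUS *,PROG {software_loc}.{name}, STOP" for name in procnames]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for command in build_stop_commands("$VOL.SUBVOL", ["PROCA", "PROCB"]):
        print(command)
```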

The subtopics that follow discuss how RDF responds to extractor, receiver, updater, and RDF state transition failures.

Extractor Failure

Although the extractor runs as a process pair, the primary process neither maintains restart information nor checkpoints such information to its backup. Instead, the receiver maintains all restart information for the extractor, ensuring that the extractor is restartable. The restart point is based on the audit-trail position of the last record stored in the image trail on the backup system.

If the extractor process pair stops unexpectedly, you can (as stated above) restart the RDF subsystem by issuing a START RDF command.

CAUTION: During the interval between loss of the extractor and RDF subsystem restart, you should not add any disk volumes to the RDF configuration (with the ADD VOLUME command).

If the primary CPU of the extractor process fails, the backup extractor process requests from the receiver a new starting position in the audit trail, ensuring a correct restart position. This extractor-receiver protocol also protects against messages from the extractor erroneously arriving out of order: if a message arrives out of order, the receiver simply directs the extractor to restart.
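
The idea behind this protocol can be sketched as follows. This is a simplified model under stated assumptions, not the actual RDF message format: the `Message` fields, the per-message sequence number used to detect ordering, and the `Receiver` methods are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    sequence: int        # hypothetical per-message sequence number
    audit_position: int  # audit-trail position of the last record in the message


class Receiver:
    """Holds the extractor's restart information (the extractor keeps none)."""

    def __init__(self, restart_position: int = 0) -> None:
        self.restart_position = restart_position
        self.expected_sequence = 0

    def new_starting_position(self) -> int:
        """Answer a (re)starting extractor and reset sequencing."""
        self.expected_sequence = 0
        return self.restart_position

    def accept(self, msg: Message) -> Optional[int]:
        """Store an in-order message and return None.

        For an out-of-order message, store nothing and return the position
        from which the extractor should restart.
        """
        if msg.sequence != self.expected_sequence:
            return self.new_starting_position()
        self.restart_position = msg.audit_position
        self.expected_sequence += 1
        return None
```

Because the restart position advances only when a message is stored, a takeover by the backup extractor and an out-of-order message are handled the same way in this model: the extractor asks for, or is told, the last stored position and resumes from there.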
