RDF/IMP, IMPX, and ZLT System Management Manual
Network Transactions
HP NonStop RDF/IMP, IMPX, and ZLT System Management Manual—524388-002
Communication Failures During Phase 3 Takeover Processing
If one RDF subsystem is unable to reach the backup system of another RDF subsystem
during phase 3 processing, phase 3 processing stalls until the communication line
comes back up. This can lengthen the overall duration of takeover operations on all
backup systems. Should this type of stall occur, the RDF subsystem issues an event
message alerting operators to the situation.
Takeover Delays and Purger Restarts
During phase 3 purger work, the network master needs information from the other
purger processes in the RDF network, and, during the latter part of phase 3
processing, the non-network master purgers need information from the purger of the
network master. When a purger process is waiting for information from another purger,
it waits for up to 60 seconds, during which time it does not respond to certain requests
(such as STATUS RDF). After a purger has waited 60 seconds, it quits the operation
and restarts. This allows the purger to read the $RECEIVE file, respond to messages
that have been waiting for replies, and then retry phase 3 processing.
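The wait, restart, and retry cycle described above can be modeled with a short sketch. This is an illustration only, not RDF code: the function name run_phase3, the wait_for_peer callback, and the max_attempts limit are assumptions introduced here so that the example terminates. Only the 60-second wait and the STATUS RDF request come from the text.

```python
WAIT_LIMIT_SECONDS = 60  # from the text: a purger waits up to 60 seconds


def run_phase3(wait_for_peer, pending_requests,
               wait_limit=WAIT_LIMIT_SECONDS, max_attempts=5):
    """Hypothetical model of one purger's phase 3 wait/restart/retry cycle.

    wait_for_peer(timeout) returns the other purger's information,
    or None if nothing arrived within `timeout` seconds.
    pending_requests holds requests (such as "STATUS RDF") that queued
    up while the purger was blocked.
    Returns (peer_info, answered_requests, attempts_used).
    """
    answered = []
    for attempt in range(1, max_attempts + 1):
        # While blocked in this call the purger answers nothing else.
        info = wait_for_peer(wait_limit)
        if info is not None:
            return info, answered, attempt
        # Timed out: quit the operation, service the queued requests
        # (the "read $RECEIVE and reply" step), then retry phase 3.
        answered.extend(pending_requests)
        pending_requests.clear()
    # max_attempts is an assumption of this sketch, not an RDF limit.
    raise TimeoutError("peer purger still unreachable")
```

In this model a STATUS RDF request that arrives during the wait is answered only after the timeout expires, which matches the delayed responses an operator may observe during phase 3.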
Takeover Restartability
As has always been the case, the RDFCOM TAKEOVER command is restartable.
Therefore, if a takeover operation terminates prematurely for any reason on any
system in an RDF network, it can be restarted.
Takeover and File Recovery
When a takeover operation completes in an RDF network environment, the purger logs
two events: one reports a safe MAT position (indicating that all committed data up to
that location was successfully applied to the backup database), and the second (888 or
858) reports whether or not a File Recovery position is available for use on the primary
system. RDF event 888 reports that a File Recovery position is available and includes
the exact sequence number (sno) and relative byte address (rba) to be used for a File
Recovery operation on the primary system.
If, however, “kept-commits” have been encountered during phase 2 processing, a File
Recovery position is not available; this is reported in RDF event 858.
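The choice between the two events can be expressed as a small decision function. This is a sketch under stated assumptions: the TakeoverEvent record and its field names are hypothetical, since the actual event message layout is not shown here. Only the event numbers 888 and 858 and the sno/rba pair come from the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class TakeoverEvent:
    """Hypothetical, already-parsed form of a purger event."""
    number: int
    sno: Optional[int] = None  # audit trail file sequence number
    rba: Optional[int] = None  # relative byte address within that file


def file_recovery_position(
        events: Iterable[TakeoverEvent]) -> Optional[Tuple[int, int]]:
    """Return (sno, rba) from event 888, or None if event 858 was logged."""
    for event in events:
        if event.number == 888:
            # A File Recovery position is available.
            return (event.sno, event.rba)
        if event.number == 858:
            # Kept-commits were encountered; no position is available.
            return None
    raise ValueError("neither event 888 nor event 858 was found")
```

Returning None for event 858 and raising an error when neither event is present keeps "no position available" distinct from "takeover events not yet seen".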
If an RDF event 888 is reported, then the specified File Recovery position is based on
both phase 1 and phase 3 processing. Each system logs its own File Recovery
position. While that position may differ from one backup system to the next, the logged
position for any single system is correct. If you supply the returned File Recovery
position to the TMF file recovery process on the primary system, the process recovers
the files on the primary database up to that point. If you use File Recovery to a MAT
position on all primary systems in the RDF network, in each case using the returned
File Recovery positions, then your primary distributed database will be consistent
across the RDF network.
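As an illustration of using each system's own position, the following sketch builds a per-system recovery plan. The function name, the dictionary layout, and the system names are hypothetical. The check that every system has a position reflects the statement above that consistency across the network depends on performing File Recovery on all primary systems.

```python
from typing import Dict, Optional, Tuple

Position = Tuple[int, int]  # (sno, rba)


def plan_file_recovery(
        positions: Dict[str, Optional[Position]]) -> Dict[str, Position]:
    """Map each primary system to the position logged for that system.

    `positions` maps a primary system name to the (sno, rba) reported
    for it by event 888, or to None if event 858 was reported instead.
    """
    missing = sorted(name for name, pos in positions.items() if pos is None)
    if missing:
        # Without a position for every system, this approach cannot
        # bring the whole distributed database to a consistent state.
        raise ValueError("no File Recovery position for: " + ", ".join(missing))
    # Positions may differ between systems; each system uses its own.
    return {name: pos for name, pos in positions.items() if pos is not None}


# Hypothetical system names and positions, for illustration only.
plan = plan_file_recovery({r"\SYSA": (12, 3456), r"\SYSB": (9, 78900)})
```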
You would use the File Recovery position with File Recovery in situations such as the
following. Assume you have had an outage of your primary system, you have