RDF System Management Manual for J-series and H-series RVUs (RDF 1.10)

The purger of the network master determines which network transactions are incomplete across the
different backup systems, and it produces the master network undo list. Each purger then uses this
master list to ascertain the transaction data that must be undone on its own backup database. For
example, if a network transaction involved only four of the ten primary systems in an RDF network,
then that transaction needs to be undone only on the four backup databases to which its data was
replicated. Because the other six systems were not involved, the transaction does not need to be
listed on them. The list of network transactions that need to be undone on a specific system resides
in that system's ZNETUNDO file.
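The relationship between the master network undo list and each system's own undo list can be sketched as a simple filter. This is an illustrative model only, not the RDF implementation: the function name, the transaction identifiers, the system names, and the in-memory dictionary layout are all assumptions made for the example, and the real ZNETUNDO file format is not described here.

```python
from typing import Dict, Iterable, List, Set


def build_per_system_undo_lists(
    master_undo_list: Dict[str, Set[str]],
    systems: Iterable[str],
) -> Dict[str, List[str]]:
    """Derive a per-system undo list from a master undo list.

    master_undo_list maps a (hypothetical) transaction id to the set of
    systems that participated in that network transaction.
    """
    # Every system gets a list, even if it ends up empty.
    per_system: Dict[str, List[str]] = {name: [] for name in systems}
    # Sort the ids so the resulting lists have a deterministic order.
    for tx_id in sorted(master_undo_list):
        for name in master_undo_list[tx_id]:
            # A transaction is listed only on systems that took part in it.
            if name in per_system:
                per_system[name].append(tx_id)
    return per_system


# Hypothetical data: tx-1001 touched four systems, tx-1002 touched one.
master = {
    "tx-1001": {"SYSA", "SYSB", "SYSC", "SYSD"},
    "tx-1002": {"SYSB"},
}
all_systems = ["SYSA", "SYSB", "SYSC", "SYSD", "SYSE"]
undo_lists = build_per_system_undo_lists(master, all_systems)
```

In this sketch, SYSE took part in neither transaction, so its list stays empty, which mirrors the statement that an uninvolved system does not need the transaction listed.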
Takeover Phase 3 Performance
The speed with which a takeover completes for an entire RDF network varies based on the number
of systems in the network and how far any system had fallen behind when the takeover was initiated.
For example, if your RDF network has three systems, all extractors on all three systems were
keeping up with audit generation, and then one system fails, phase 3 takeover processing might
add only a modest number of seconds to the takeover operations.
In contrast, if you have three systems in your RDF network, and one extractor had fallen 60 minutes
behind at the time its system went down, then phase 3 takeover processing on the other two systems
will take many more seconds to complete. The reason for this is that phase 3 processing on the
two systems that were not behind will have to go through 60 minutes of data to determine what
must be undone due to data missing on the system that had fallen behind.
A variation of the first example is that no extractors have fallen behind, but you have 25 systems
in your RDF network. In such a case, phase 3 processing might take many additional seconds
because data must be checked for so many different systems in order to determine what network
data might be missing from the various systems in the RDF network.
Communication Failures During Phase 3 Takeover Processing
If one RDF subsystem is unable to reach the backup system of another RDF subsystem during phase
3 processing, phase 3 processing stalls until the communication line comes back up. This can
lengthen the overall duration of takeover operations on all backup systems. Should this type of stall
occur, the RDF subsystem issues an event message alerting operators to the situation.
Takeover Delays and Purger Restarts
During phase 3 purger work, the network master needs information from the other purger processes
in the RDF network, and, during the latter part of phase 3 processing, the non-network master
purgers need information from the purger of the network master. When a purger process is waiting
for information from another purger, it waits for up to 60 seconds, during which time it does not
respond to certain requests (such as STATUS RDF). After a purger has waited 60 seconds, it quits
the operation and restarts. This allows the purger to read the $RECEIVE file, respond to messages
that have been waiting for replies, and then retry phase 3 processing.
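The wait, give up, service pending requests, and retry pattern described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the purger's actual logic: the function name, the two in-memory queues standing in for peer replies and for requests waiting on $RECEIVE, and the `max_attempts` limit are all hypothetical. Only the 60-second wait comes from the text.

```python
import queue
from typing import Callable

WAIT_LIMIT_SECONDS = 60.0  # from the text: a purger waits up to 60 seconds


def run_phase3_step(
    peer_replies: "queue.Queue[str]",
    pending_requests: "queue.Queue[str]",
    answer: Callable[[str], None],
    wait_limit: float = WAIT_LIMIT_SECONDS,
    max_attempts: int = 3,  # assumption: the manual states no retry limit
) -> str:
    """Wait for a peer purger's reply, servicing requests between attempts."""
    for _ in range(max_attempts):
        try:
            # While blocked here, pending requests go unanswered.
            return peer_replies.get(timeout=wait_limit)
        except queue.Empty:
            # Timed out: answer everything that queued up, then retry.
            while True:
                try:
                    answer(pending_requests.get_nowait())
                except queue.Empty:
                    break
    raise TimeoutError("peer purger did not reply")
```

Bounding each wait keeps a request such as STATUS RDF from going unanswered for longer than one wait interval, at the cost of restarting the phase 3 step.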
Takeover Restartability
As has always been the case, the RDFCOM TAKEOVER command is restartable. Therefore, if a
takeover operation terminates prematurely for any reason on any system in an RDF network, it can
be restarted.
Takeover and File Recovery
When a takeover operation completes in an RDF network environment, the purger logs two events:
one reports a safe MAT position (indicating that all committed data up to that location was
successfully applied to the backup database), and the second (event 888 or 858) reports whether or
not a File Recovery position is available for use on the primary system. RDF event 888 reports
that a File Recovery position is available, and it includes the exact sno and RBA to be used for a File Recovery