RDF/IMP and IMPX System Management Manual (RDF 1.3+)

Network Transactions
Compaq NonStop™ RDF/IMP and IMPX System Management Manual 522204-001
Takeover Phase Two – Network Undo
Phase two determines whether network transaction data is missing from any of the backup
systems in the RDF network, and marks the affected transactions to be undone on all of the
systems involved. For example, suppose you began a network transaction, updated tables on
ten different systems, and then committed the transaction. Now suppose that nine of the ten
primary systems were able to transmit their updates and commit records to their backup
systems, but the tenth primary system went down before its extractor was able to do so.
Phase two determines that the transaction involved all ten databases and that one of the
backup databases is missing audit data for it, and it identifies the transaction as one
that must be undone on the other nine systems (it is undone during phase one on the
tenth system). All of the updaters on those nine systems then look for audit data
associated with the transaction and undo it.
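The ten-system example above can be sketched in a few lines of Python. This is only an illustration of the decision described in the text, not RDF code: the function name, the system names (SYS1 through SYS10), the transaction IDs, and the in-memory dictionaries are all assumptions made for the example. RDF itself works from audit data and undo files, not from structures like these.

```python
# Illustrative sketch only; names and data structures are hypothetical.

def find_incomplete_transactions(participants, received):
    """Return the IDs of network transactions that are missing audit
    data on at least one of the backup systems that took part in them.

    participants: dict mapping transaction ID -> set of system names
    received:     dict mapping system name -> set of transaction IDs
                  whose updates and commit records reached that backup
    """
    incomplete = set()
    for tx_id, systems in participants.items():
        # A transaction is incomplete if any participating backup lacks it.
        if any(tx_id not in received.get(name, set()) for name in systems):
            incomplete.add(tx_id)
    return incomplete


systems = [f"SYS{i}" for i in range(1, 11)]

# T1 touched all ten systems; T2 touched only two of them.
participants = {"T1": set(systems), "T2": {"SYS1", "SYS2"}}

# Nine backups received everything; SYS10 went down before its
# extractor could send anything for T1.
received = {name: {"T1", "T2"} for name in systems}
received["SYS10"] = set()

network_undo = find_incomplete_transactions(participants, received)
print(sorted(network_undo))
```

In this sketch T1 is flagged because one participating backup lacks its data, while T2 is left alone because both of its participants received it.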
More specifically, each purger process has two phases of work to do:
1. produce the local undo list in the ZTXUNDO file
2. produce the network undo list in the ZNETUNDO file
The purger of the network master determines which network transactions are incomplete
across the different backup systems, and it produces the master network undo list. Each
purger then uses this master list to ascertain the transaction data that must be undone on
its own backup database. For example, if a network transaction involved only four of the
ten primary systems in an RDF network, then that transaction needs to be undone only on
the four backup databases where its data was replicated. Because the other six systems
were not involved, the transaction does not need to be listed there. The list of
transactions that need to be undone on a specific system resides in its ZNETUNDO file.
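The filtering step described above, in which each purger derives its own list from the master network undo list, might be sketched as follows. The function name, the system names A through J, and the transaction ID are hypothetical, and the returned Python list merely stands in for the contents of a ZNETUNDO file; the actual file format is not described here.

```python
# Illustrative sketch only; names and data structures are hypothetical.

def build_local_network_undo(master_undo, participants, system):
    """Return the transactions from the master network undo list that
    involved the given system, in sorted order.

    master_undo:  set of transaction IDs judged incomplete network-wide
    participants: dict mapping transaction ID -> set of system names
    system:       name of the backup system building its own list
    """
    return sorted(
        tx_id
        for tx_id in master_undo
        if system in participants.get(tx_id, set())
    )


# A ten-system network in which T7 involved only four systems.
all_systems = list("ABCDEFGHIJ")
tx_participants = {"T7": {"A", "B", "C", "D"}}
master_list = {"T7"}

local_lists = {
    name: build_local_network_undo(master_list, tx_participants, name)
    for name in all_systems
}
print(local_lists["A"], local_lists["E"])
```

Only the four systems that took part in T7 end up with it in their lists; the lists for the other six systems stay empty.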
Takeover Phase Two Performance
The speed with which a takeover completes for an entire RDF network varies based on
the number of systems in the network and how far any system had fallen behind when
the takeover was initiated.
For example, if you have three systems in your RDF network, the extractors on all three
systems were keeping up with audit generation, and then one system fails, phase two
takeover processing may add only a modest number of seconds to the takeover
operation.
In contrast, if you have three systems in your RDF network, and one extractor had fallen
60 minutes behind at the time its system went down, then phase two takeover processing
on the other two systems takes considerably longer to complete. The reason is that
phase two processing on the two systems that were not behind must go through 60 minutes
of audit data to determine what must be undone because of the data missing on the
system that had fallen behind.
As a variation of the first example, suppose that no extractors have fallen behind, but
you have 25 systems in your RDF network. In such a case, phase two processing may take
many additional seconds because data must be checked for so many different systems to
determine what network data might be missing from the various systems in the RDF
network.