RDF/IMP and IMPX System Management Manual (RDF 1.3+)

Network Transactions

Compaq NonStop™ RDF/IMP and IMPX System Management Manual—522204-001

Communication Failures During Phase Two Takeover Processing

If one RDF subsystem is unable to reach the backup system of another RDF subsystem

during phase 2 processing, phase 2 processing stalls until the communication line comes

back up. This can lengthen the overall duration of takeover operations on all backup

systems. Should this type of stall occur, the RDF subsystem issues an event message

alerting operators to the situation.

Takeover Delays and Purger Restarts

During phase 2 purger work, the network master needs information from the other

purger processes in the RDF network, and, during the latter part of phase 2 processing,

the non-network-master purgers need information from the purger of the network master.

When a purger process is waiting for information from another purger, it waits for up to

60 seconds, during which time it does not respond to certain requests (such as STATUS

RDF). After a purger has waited 60 seconds, it quits the operation and restarts. This

allows the purger to read the $RECEIVE file, respond to messages that have been

waiting for replies, and then retry phase 2 processing.
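
The wait-and-restart behavior described above can be pictured with a small Python sketch. This is an illustration only, not RDF code: the function name run_phase2, the two in-memory queues standing in for the other purger and for the $RECEIVE file, and the max_attempts parameter are all assumptions made for the example. Only the 60-second wait limit comes from the text.

```python
import queue

# From the text: a purger waits up to 60 seconds for another purger.
WAIT_LIMIT_SECONDS = 60


def run_phase2(peer_info, pending_requests,
               wait_limit=WAIT_LIMIT_SECONDS, max_attempts=None):
    """Hypothetical model of the purger's wait/restart cycle.

    peer_info        -- queue.Queue standing in for information expected
                        from another purger (assumption for this sketch)
    pending_requests -- queue.Queue standing in for messages waiting on
                        $RECEIVE, such as STATUS RDF (assumption)
    max_attempts     -- optional cap so the sketch can terminate; the
                        manual does not describe such a cap

    Returns (info, attempts, answered): the information received (or None),
    the number of attempts made, and the requests answered along the way.
    """
    attempts = 0
    answered = []
    while max_attempts is None or attempts < max_attempts:
        attempts += 1
        try:
            # While blocked here, the purger is not answering requests.
            info = peer_info.get(timeout=wait_limit)
            return info, attempts, answered
        except queue.Empty:
            # Timed out: model the "restart" by draining the requests that
            # were waiting for replies, then loop to retry phase 2.
            while True:
                try:
                    answered.append(pending_requests.get_nowait())
                except queue.Empty:
                    break
    return None, attempts, answered
```

In this model a STATUS RDF request that arrives while the purger is waiting is answered only after the wait limit expires, which matches the delay an operator would observe.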

Takeover Restartability

As has always been the case, the RDFCOM TAKEOVER command is restartable.

Therefore, if a takeover operation terminates prematurely for any reason on any system

in an RDF network, it can be restarted.

Takeover and File Recovery

When a takeover operation completes in an RDF network environment, the purger logs

two events: one reports a safe MAT position (indicating that all committed data up to

that location was successfully applied to the backup database), and the second (888 or

858) reports whether or not a File Recovery position is available for use on the primary

system. RDF event 888 reports that a File Recovery position is available, and it includes the

exact sno and rba to be used for a File Recovery operation on the primary system. If,

however, “kept-commits” have been encountered during phase 2 processing, a File

Recovery position is not available; this is reported in RDF event 858.
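
As a rough illustration of the rule above, the following Python sketch maps the second purger event to either a File Recovery position or "none available". The event numbers 888 and 858 and the sno/rba pair come from the text. The function name, the FileRecoveryPosition type, and the idea of passing the event number and values as plain arguments are assumptions made for the example; the sketch does not parse real RDF event messages.

```python
from dataclasses import dataclass
from typing import Optional

# Event numbers as given in the text.
EVENT_POSITION_AVAILABLE = 888    # File Recovery position is available
EVENT_POSITION_UNAVAILABLE = 858  # kept-commits seen; no position


@dataclass(frozen=True)
class FileRecoveryPosition:
    """Hypothetical holder for the sno and rba reported with event 888."""
    sno: int
    rba: int


def file_recovery_position(event_number: int,
                           sno: Optional[int] = None,
                           rba: Optional[int] = None
                           ) -> Optional[FileRecoveryPosition]:
    """Return the position for event 888, or None for event 858."""
    if event_number == EVENT_POSITION_AVAILABLE:
        if sno is None or rba is None:
            raise ValueError("event 888 must carry both sno and rba")
        return FileRecoveryPosition(sno=sno, rba=rba)
    if event_number == EVENT_POSITION_UNAVAILABLE:
        return None
    raise ValueError(f"not a File Recovery event: {event_number}")
```

A caller would treat a None result as "File Recovery to a MAT position cannot be used for this takeover" rather than as an error.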

If an RDF event 888 is reported, then the specified File Recovery position is based on

both phase 1 and phase 2 processing. Each system logs its own File Recovery position.

While that position may differ from one backup system to the next, the logged position

for any single system is correct. If you supply the returned File Recovery position to the

TMF file recovery process on the primary system, the process recovers the files on the

primary database up to that point. If you use File Recovery to a MAT position on all

primary systems in the RDF network, in each case using the returned File Recovery

positions, then your primary distributed database will be consistent across the RDF

network.

You would use the File Recovery position with File Recovery in situations such as the

following. Assume you have had an outage of your primary system, you have executed

the RDF takeover operation on your backup system, and you have resumed business

transactions on your backup system. Assume further that the former primary system has

been repaired, it is back online, and you want to switch your business transactions from