
Communication Failures During Phase 3 Takeover Processing
If one RDF subsystem is unable to reach the backup system of another RDF subsystem during
phase 3 processing, phase 3 processing stalls until the communication line comes back up. This
can lengthen the overall duration of takeover operations on all backup systems. Should this type
of stall occur, the RDF subsystem issues an event message alerting operators to the situation.
Takeover Delays and Purger Restarts
During phase 3 purger work, the network master's purger needs information from the other purger
processes in the RDF network; during the latter part of phase 3 processing, the purgers on the
non-network-master systems need information from the network master's purger. When a purger process
is waiting for information from another purger, it waits for up to 60 seconds, during which time
it does not respond to certain requests (such as STATUS RDF). After a purger has waited 60
seconds, it quits the operation and restarts. This allows the purger to read the $RECEIVE file,
respond to messages that have been waiting for replies, and then retry phase 3 processing.
Takeover Restartability
As has always been the case, the RDFCOM TAKEOVER command is restartable. Therefore, if a
takeover operation terminates prematurely for any reason on any system in an RDF network, it
can be restarted.
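For example, if a takeover operation fails partway through on one backup system in the network,
you can reissue the command on that system once the cause of the failure has been corrected. The
following sequence is only an illustration and assumes you are at an RDFCOM prompt on the backup
system for the affected RDF configuration; check the state of the subsystem first, then restart
the takeover:

    STATUS RDF
    TAKEOVER

Because the TAKEOVER command is restartable, reissuing it completes the interrupted takeover
operation.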
Takeover and File Recovery
When a takeover operation completes in an RDF network environment, the purger logs two
events: one reports a safe MAT position (indicating that all committed data up to that location
was successfully applied to the backup database), and the second (888 or 858) reports whether
or not a File Recovery position is available for use on the primary system. RDF event 888 reports
that a File Recovery position is available and includes the exact sequence number (sno) and relative
byte address (rba) to be used for a File Recovery operation on the primary system. If, however, “kept-commits” have been encountered
during phase 2 processing, a File Recovery position is not available; this is reported in RDF event
858. This last situation will never occur in an RDF/ZLT environment because a File Recovery
position is always available with RDF/ZLT.
If an RDF event 888 is reported, then the specified File Recovery position is based on both phase
1 and phase 3 processing. Each system logs its own File Recovery position. While that position
can differ from one backup system to the next, the logged position for any single system is correct.
If you supply the returned File Recovery position to the TMF file recovery process on the primary
system, the process recovers the files on the primary database up to that point. If you perform File
Recovery to a MAT position on every primary system in the RDF network, in each case using the
File Recovery position returned for that system, then your primary distributed database will be
consistent across the RDF network.
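As an illustration only, supplying the position reported in RDF event 888 to TMF file recovery on
a primary system might take the form of the following TMFCOM command, where <file-set> is a
placeholder for the audited files of your primary database and <sno> and <rba> stand for the
sequence number and relative byte address taken from the event; the exact RECOVER FILES syntax
and options are described in the TMF documentation for your RVU:

    RECOVER FILES <file-set>, TOMATPOSITION (<sno>, <rba>)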
You would use the File Recovery position with File Recovery in situations such as the following. Assume you
have had an outage of your primary system, you have executed the RDF takeover operation on
your backup system, and you have resumed business transactions on your backup system.
Assume further that the former primary system has been repaired, it is back online, and you
want to switch your business transactions from the active backup database back to the former
primary database. To do so, you merely execute a planned RDF switchover from the backup to
the newly restored primary.
The problem with doing a planned switchover from backup to primary after an RDF takeover
operation is that some transactions might have committed on the primary system immediately
prior to the unplanned outage, and the outage might have brought down the extractor before it could
send that data to the backup system. In such a case, when you bring the primary system back up, the
two databases are no longer synchronized because the primary database contains committed
transactions that are not in the backup database. Such transactions cannot be recovered.