RDF System Management Manual for J-series and H-series RVUs (RDF 1.10)
Therefore, taking an online dump before resuming business operations is important, but when
do you do it? If you wait until after the RDF takeover operation has completed, then it could
take many hours before the online dumps complete, and only then would it be safe to resume
business operations. Thus, not taking regular online dumps of your backup database can lead
to a significant length of time before you can safely resume business operations on your backup
system. If, however, you take regular online dumps of your backup database as well as take
audit dumps, then you can start business operations as soon as the RDF takeover operation
completes and you will have full TMF protection. For more details see the discussion on “TMF
and Online Dumps on the Backup System” (page 145).
4. Most customers require a high-level decision to take over on the backup system. For the majority
of RDF users this is not an automated decision; most require an executive-level decision to
take over.
a. Make sure your system operators have a hierarchical list of whom to contact in case of the
loss of your primary system. This will save time in getting executive authorization and
initiating the takeover on the backup system.
b. Determine in advance what constitutes a failure of the primary system so that the process
for escalation to the executive decision level can be started as soon as possible.
c. If the criteria for determining a failure are complicated, write them down; the last thing you
want to do during a real disaster is to try to remember everything.
5. Have a solid disaster recovery plan in place that covers all the tasks needed to switch
operations to the backup system. Write out all of your disaster recovery plans
to avoid having to recall them from memory when your anxiety is already high as a result of
the unplanned outage.
6. If you have command and control files on your primary system that are copied to your backup
system as part of your RDF setup, be sure you revise them on your backup system to reflect the
hardware and software configurations of your backup system. For example, if you have
fewer CPUs on your backup system, be sure that your command and control scripts do not
contain references to CPUs that exist on the primary but not on the backup. Also be sure you have
replaced all references to the name of the primary system with the name of the backup system
in your command and control files.
7. RDF does provide the “!” option on the Takeover command. If specified, it suppresses the user
prompt and skips the check to reach the primary system, thereby avoiding the Expand
level-4 timer wait. Before you use this option, however, you should consider the following points:
a. How do you know if the primary system is down? By having RDFCOM check to see if
the primary system is accessible, you avoid starting a takeover operation by mistake.
Although the check does involve the Expand level-4 timer wait (5 minutes by default), you
should not lower that timer, because doing so can have many other side effects that you do not
want.
b. For the majority of RDF users, issuing the RDF Takeover command is neither an automated
operation nor is it typically executed without high-level approval, often executive level.
Depending on the time of day or night when the outage occurs, obtaining that approval may
take much longer than the level-4 timer delay, making that delay inconsequential.
8. For the typical SQL requester-server environment, you can keep servers running on both the primary
and backup systems at all times, but you must ensure that no work is ever routed to the servers
on the backup system.
a. SQL files are opened only on demand; hence, by keeping your servers running on
your backup system at all times, you avoid the time it takes to start them when you
encounter a takeover or switchover situation. When you eventually route work to these
servers after a takeover or switchover, they may still take some time to open the backup
database, but you avoid the cost of a cold start.
b. Similarly, start Pathway servers and freeze them before they open any files; this eliminates
having to cold-start them after a takeover or switchover.
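Checking the command and control files described in item 6 by hand is error-prone when there are many of them. The following Python sketch shows one way to flag leftover references after the files have been copied to the backup system. It is an illustration only, not an RDF utility: it assumes that node names appear with a leading backslash (for example, \PRIMARY), that CPU references take the form "CPU n", and that CPUs are numbered from 0. The function name scan_command_file and the node name PRIMARY are hypothetical, and the patterns would need adjusting to the actual syntax used in your files.

```python
import re
from dataclasses import dataclass


@dataclass
class Finding:
    line_no: int
    text: str
    reason: str


def scan_command_file(lines, primary_node, backup_cpu_count):
    """Flag lines that still refer to the primary system (hypothetical helper).

    primary_node: primary node name without the backslash, e.g. "PRIMARY".
    backup_cpu_count: CPUs on the backup; assumed valid numbers are 0..count-1.
    """
    # Node names are written with a leading backslash, e.g. \PRIMARY.
    # The trailing \b keeps \PRIMARY from matching inside \PRIMARY2.
    node_re = re.compile(r"\\" + re.escape(primary_node) + r"\b", re.IGNORECASE)
    # Assumed CPU syntax: the word CPU followed by a number, e.g. "CPU 7".
    cpu_re = re.compile(r"\bCPU\s+(\d+)\b", re.IGNORECASE)

    findings = []
    for no, line in enumerate(lines, start=1):
        if node_re.search(line):
            findings.append(Finding(no, line, "references primary node"))
        for m in cpu_re.finditer(line):
            cpu = int(m.group(1))
            if cpu >= backup_cpu_count:
                findings.append(Finding(no, line, f"CPU {cpu} not on backup"))
    return findings
```

Each finding carries the line number and a reason, so the output can be reviewed before the files are used in a takeover. A clean scan does not prove a file is correct; it only shows that none of the assumed patterns matched.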
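The practice described in item 8 combines three rules: servers run on both systems, each server opens its files only on first use, and work goes to the backup's servers only after a takeover or switchover has been declared. The following Python sketch models those rules. The Server and Router classes are hypothetical illustrations, not Pathway or NonStop SQL interfaces, and the string assigned on first use merely stands in for a real database open.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server process: started early, opens files lazily."""
    name: str
    opens: int = field(default=0, init=False)
    _db: object = field(default=None, init=False)

    def handle(self, request):
        # Open the database only on the first request, mirroring the
        # on-demand file opens described for SQL servers.
        if self._db is None:
            self._db = f"db-handle-for-{self.name}"  # stand-in for a real open
            self.opens += 1
        return f"{self.name}:{request}"


@dataclass
class Router:
    """Hypothetical router: sends work only to the active site's servers."""
    primary_servers: list
    backup_servers: list
    active_site: str = "primary"  # changed only by an operator decision
    _next: int = field(default=0, init=False)

    def declare_takeover(self):
        # Invoked as part of the approved takeover or switchover procedure.
        self.active_site = "backup"
        self._next = 0

    def route(self, request):
        pool = (self.primary_servers if self.active_site == "primary"
                else self.backup_servers)
        if not pool:
            raise RuntimeError(f"no servers available on {self.active_site}")
        # Round-robin within the active site's pool only.
        server = pool[self._next % len(pool)]
        self._next += 1
        return server.handle(request)
```

Before the takeover, the backup server is running but has opened nothing. Its first routed request pays only the cost of opening the database, not the cost of a cold start.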
136 Critical Operations, Special Situations, and Error Conditions