RDF System Management Manual for J-series and H-series RVUs (RDF 1.10)


Therefore, taking an online dump before resuming business operations is important, but when

do you do it? If you wait until after the RDF takeover operation has completed, then it could

take many hours before the online dumps complete, and only then would it be safe to resume

business operations. Thus, not taking regular online dumps of your backup database can lead

to a significant length of time before you can safely resume business operations on your backup

system. If, however, you take regular online dumps of your backup database as well as take

audit dumps, then you can start business operations as soon as the RDF takeover operation

completes and you will have full TMF protection. For more details, see the discussion on “TMF

and Online Dumps on the Backup System” (page 145).

4. Most customers require a high-level decision to take over on the backup system. For the

majority of RDF users this is not an automated decision; it requires executive-level

approval.

a. Make sure your system operators have a hierarchical list of who to contact in case of the

loss of your primary system. This will save time in getting executive authorization and

initiating the takeover on the backup.

b. Determine in advance what constitutes a failure of the primary system so that the process

for escalation to the executive decision level can be started as soon as possible.

c. If the criteria for determining a failure are complicated, write them down; the last thing you

want to do during a real disaster is to try to remember everything.

5. Have a solid disaster recovery plan in place that covers all the different tasks that need to be

done in order to switch operations to the backup. Write out all of your disaster recovery plans

to avoid having to recall them from memory when your anxiety is already high as a result of

the unplanned outage.

6. If you have command and control files on your primary system that are copied to your backup

as part of your RDF setup, be sure you revise these on your backup system to reflect the

hardware and software configurations on your backup system. For example, if you have

fewer CPUs on your backup system, be sure that your command and control scripts do not

contain references to CPUs that exist on the primary but not on the backup. Be sure you have

replaced all references to the name of the primary system with the name of the backup system

in command and control files.

7. RDF provides the “!” option on the Takeover command. If specified, it eliminates the user

prompt and the check to reach the primary system, thereby eliminating the Expand

level-4 timer wait. Before you use this option, however, consider the following points:

a. How do you know if the primary system is down? By having RDFCOM check to see if

the primary system is accessible, you avoid starting a takeover operation by mistake.

While the check does involve the Expand level-4 timer wait (5 minutes by default), you

should not lower that timer because doing so can have many other unwanted side effects.

b. For the majority of RDF users, issuing the RDF Takeover command is neither an automated

operation nor one that is typically executed without high-level approval, often executive level.

Depending on the time of day or night when the outage occurs, obtaining that approval may take

much more time than the level-4 timer delay, thereby making that delay inconsequential.

8. For the typical SQL requestor-server environment, you can start servers on both primary and

backup systems at all times, but you must ensure that no work is ever routed to servers

on the backup system.

a. SQL files are only opened on demand; hence by having your servers up and running on

your backup system at all times, you can avoid the time it takes to start them when you

encounter a takeover or switchover situation. It does mean that when you eventually route

work to these servers after a takeover or switchover, it may take time to have them open

up the backup database, but you avoid the cost of a cold startup.

b. Similarly, start Pathway servers and freeze them before they open any files; this eliminates

having to cold-start them after a takeover or switchover.
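
The review of copied command and control files described in item 6 can be partly automated. The following is a minimal sketch, not an RDF or TACL facility: it scans the text of a command file for the primary system name and for CPU numbers that do not exist on the backup system. The function name, the sample file contents, the system name \PRIM, and the "CPU n" pattern are all hypothetical; it also assumes CPUs are numbered from 0. Adjust the pattern to match how your own command files refer to CPUs.

```python
import re

# Assumed pattern for a CPU reference such as "CPU 5".
# Real command and control files may express CPU references differently.
CPU_REF = re.compile(r"\bCPU\s+(\d+)", re.IGNORECASE)


def find_stale_references(text, primary_name, backup_cpu_count):
    """Return (line_number, reason) pairs for lines that need review."""
    findings = []
    name = primary_name.lower()
    for number, line in enumerate(text.splitlines(), start=1):
        # Flag any mention of the primary system name (case-insensitive).
        if name in line.lower():
            findings.append((number, f"references primary system {primary_name}"))
        # Flag CPU numbers outside the range 0 .. backup_cpu_count - 1.
        for match in CPU_REF.finditer(line):
            cpu = int(match.group(1))
            if cpu >= backup_cpu_count:
                findings.append((number, f"references CPU {cpu}, not present on backup"))
    return findings


# Hypothetical command file contents, for illustration only.
sample = "\n".join([
    r"RUN SERVER1 /CPU 1, NAME $SRV1/",
    r"RUN SERVER2 /CPU 5, NAME $SRV2/",
    r"VOLUME \PRIM.$DATA.APP",
])

for number, reason in find_stale_references(sample, r"\PRIM", backup_cpu_count=4):
    print(f"line {number}: {reason}")
```

A check like this only points an operator at lines to inspect; it does not replace a manual review of each file against the backup system's actual hardware and software configuration.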

136 Critical Operations, Special Situations, and Error Conditions