RDF System Management Manual for J-series and H-series RVUs (RDF 1.10)

12. Test Your Switchover/Takeover Procedures
You may not know whether your backup system has everything it needs to take over business
operations from your primary system until you actually perform that task. If you wait until a
disaster forces the move, you may discover that important items are missing. The best way to
be certain is therefore to perform either a takeover or a switchover operation and resume
business operations on your backup system. Do it when you can schedule downtime or during
periods of low activity. Because a lot can change over the course of a year, standard disaster
recovery practice is to perform this exercise at least once a year. The age-old adage
"practice makes perfect" certainly applies here. An annual test run can mean the difference
between a lengthy RTO and a rapid RTO. The latter is always the goal, so one test a year is a
small price to pay for the assurance that you have everything you need and that your switch
from the primary to the backup goes as smoothly as possible. In addition, practicing the
movement of business operations from your primary system to your backup promotes faster and
smoother switchover operations when you need to take down your primary system to perform
software or hardware upgrades.
NOTE: A common myth in the data replication arena is that an Active-Active environment makes
takeover and switchover testing easy. For most NonStop users, the hardest part of switching
from the primary to the backup is dealing with the communications switching. Active-Active
requires dealing with the same issues up front in order to set up the environment in the
first place, and a switchover operation involves the same issues in order to route all work
to one side or the other.
Some suggestions for how to set up your test are as follows:
a. Down CPUs 0 and 1 on your primary system to simulate an unplanned outage.
b. Execute the RDF Takeover operation.
c. Execute the various scripts you have to resume business operations on your backup system.
d. Test your applications against your backup database.
e. When you have finished your testing, clean up the backup database.
f. Either make your backup system your new primary, or switch business operations back
   to your original primary system.
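Because the purpose of the drill is a rapid RTO, it can help to time each step of the test
so that successive annual runs can be compared. The following is a minimal sketch in Python,
not part of RDF: the names run_drill and total_seconds are hypothetical, and each step is a
placeholder that you would replace with a call to your own procedure or script. It issues no
RDFCOM or TMFCOM commands.

```python
import time
from typing import Callable, List, Tuple

# A step is a display name plus a callable that performs the step.
Step = Tuple[str, Callable[[], None]]


def run_drill(steps: List[Step],
              clock: Callable[[], float] = time.monotonic) -> List[Tuple[str, float]]:
    """Run each step in order and return (step name, elapsed seconds) pairs.

    An exception raised by a step propagates to the caller, so the drill
    stops at the first step that fails.
    """
    timings = []
    for name, action in steps:
        started = clock()
        action()
        timings.append((name, clock() - started))
    return timings


def total_seconds(timings: List[Tuple[str, float]]) -> float:
    """Sum the per-step durations to get the recovery time measured in this drill."""
    return sum(elapsed for _, elapsed in timings)


if __name__ == "__main__":
    # Hypothetical placeholders: replace each lambda with your own procedure.
    drill = [
        ("RDF takeover", lambda: None),
        ("Resume business operations on backup", lambda: None),
        ("Test applications against backup database", lambda: None),
    ]
    results = run_drill(drill)
    for name, elapsed in results:
        print(f"{name}: {elapsed:.1f} s")
    print(f"Total: {total_seconds(results):.1f} s")
```

Keeping the per-step figures from each drill shows which step dominates the measured
recovery time and whether it improves from one year to the next.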
13. Suggestions for Cleaning Up Your Backup Database After a Test
a. If your database is small, you might just resynchronize it from your primary system.
b. If resynchronization is not an option, you can use the TMF Recover Files to Time facility
   by observing the following steps:
   i.    After the RDF takeover operation completes and before you start your testing,
         create an audited Enscribe file and take note of the system time.
   ii.   Perform your test, including running your applications with test data.
   iii.  Verify that all is running correctly.
   iv.   Stop your applications.
   v.    If your testing involved unaudited files, restore these to their pre-test state.
   vi.   Execute a TMF Recover Files to Time operation, specifying the timestamp that you
         noted after creating the Enscribe file.
   vii.  If you brought your primary system down after stopping your applications, you
         are ready to reinitialize RDF and restart it to run from primary to backup.
   viii. If you practiced an actual unplanned takeover while your applications were
         running, read the subsequent section on Restoring the Primary System.
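The Recover Files to Time procedure depends on doing things in order: the timestamp must be
noted before testing starts, and the applications must be stopped (and any unaudited files
restored) before the recovery is run. The following Python sketch shows one way to track
those preconditions. It is an illustration only: the class CleanupChecklist and its fields
are hypothetical, it does not invoke TMF, and the timestamp format that TMF expects is not
shown here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class CleanupChecklist:
    """Tracks the preconditions for running the recovery after a test."""
    marker_time: Optional[datetime] = None   # noted after creating the Enscribe file
    applications_stopped: bool = False
    used_unaudited_files: bool = False
    unaudited_files_restored: bool = False

    def record_marker(self, now: Optional[datetime] = None) -> datetime:
        """Store the time noted after creating the audited Enscribe file.

        Pass the time read from the backup system; datetime.now() is only a
        fallback and reflects the clock of the machine running this script.
        """
        self.marker_time = now if now is not None else datetime.now()
        return self.marker_time

    def blockers(self) -> List[str]:
        """Return the reasons the recovery should not be started yet."""
        problems = []
        if self.marker_time is None:
            problems.append("no marker timestamp recorded")
        if not self.applications_stopped:
            problems.append("applications still running")
        if self.used_unaudited_files and not self.unaudited_files_restored:
            problems.append("unaudited files not restored")
        return problems

    def ready_for_recovery(self) -> bool:
        """True only when every precondition has been met."""
        return not self.blockers()
```

An operator script might call record_marker immediately after the Enscribe file is created,
set the remaining fields as each step completes, and check ready_for_recovery before starting
the recovery with the stored marker_time.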
138 Critical Operations, Special Situations, and Error Conditions