Availability Guide for Application Design (525637-004)

Availability Through Process-Pairs and Monitors
You can find programming guidelines for active backup process pairs in the Guardian
Programmer’s Guide. For passive backup, refer to the appropriate procedure call
descriptions in the Guardian Procedure Calls Reference Manual.
When to Use Process Pairs
If neither the transaction monitor nor the transaction management facilities meet the
needs of your application, consider writing your own process pair. The effort involved
in designing a process pair varies considerably depending on what the pair must do.
The following uses for process pairs are relatively easy to design:
• A process that must always run but does not need to maintain context. Such a
process does no continual checkpointing or updating of state information in the
backup process; the backup simply takes over in an initialized state. (Some
information, however, might need to be copied from the primary process to the
backup process to put the backup in the initialized state.) Restart is fast because
the process state does not have to be reinitialized at takeover. These processes are
referred to as initialized persistent processes.
• A process whose job is to make sure that other processes are always running and
that, to do so, must itself always be available. Such a process is often referred to
as a process-monitor pair.
• A single-threaded process that performs only operations that can be safely retried
when it is not known whether the first attempt succeeded, or operations that can be
made retryable by using synchronization blocks. (Refer to Section 2, Overview of
Server and Network Fault Tolerance, for a discussion of synchronization.) Processes
that access data in Enscribe data files are a good example.
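The retry idea behind synchronization blocks can be illustrated with a small simulation. This is a minimal sketch, not the Guardian file-system mechanism: the `Server`, `Requester`, `deposit`, and `sync_table` names are hypothetical, and the model assumes that the server remembers only the sequence number and reply of the most recent request from each opener, and that the primary and backup present the same opener identity.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server that keeps a per-opener synchronization record."""
    balance: int = 0
    # opener id -> (sequence number of last request, saved reply)
    sync_table: dict = field(default_factory=dict)
    executions: int = 0  # how many times the update was actually performed

    def deposit(self, opener: str, sync_no: int, amount: int) -> int:
        last = self.sync_table.get(opener)
        if last is not None and last[0] == sync_no:
            # Duplicate of the most recent request: return the saved reply
            # instead of applying the (nonidempotent) update a second time.
            return last[1]
        self.balance += amount
        self.executions += 1
        self.sync_table[opener] = (sync_no, self.balance)
        return self.balance


@dataclass
class Requester:
    """State the primary checkpoints to its backup before each request."""
    opener: str  # assumption: shared by primary and backup in this model
    next_sync: int = 1

    def checkpoint(self) -> "Requester":
        # The backup receives a copy of the requester's current state.
        return Requester(self.opener, self.next_sync)


server = Server()
primary = Requester("app-pair-1")

# The primary checkpoints, then sends request 1; it fails before it can
# record the reply, so the outcome is unknown to the backup.
backup = primary.checkpoint()
server.deposit(primary.opener, primary.next_sync, 100)

# The backup takes over and resends request 1 with the same sequence number.
reply = server.deposit(backup.opener, backup.next_sync, 100)
print(reply, server.balance, server.executions)
```

Because the backup resends the request with the same sequence number, the server recognizes it as a duplicate and returns the saved reply, so the deposit is applied once even though it was sent twice. A request with a new sequence number is executed normally.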
Other, more complex designs are also possible. Be aware, however, that such designs
can be very difficult to program and test and often require a level of skill equivalent to
that of a systems programmer. Such designs include:
• Multithreaded processes, especially those that share resources. These designs
become complex when information about shared resources must be checkpointed to,
or otherwise updated in, the backup process.
• Subsystems that do not support synchronization blocks; in other words, subsystems
that provide no mechanism for making operations retryable. Most communications
subsystems fall into this category.
• Applications that require extensive cleanup after takeover. Such cleanup includes
determining what other threads were doing when the primary process failed, rolling
back nonretryable operations, establishing which nowait I/O operations were still
pending in other threads of the primary process, and so on.
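To make the cleanup problem concrete, the following sketch models only the first decision the backup faces after takeover: given the per-thread state that the primary last checkpointed, classify each thread's work. The `ThreadRecord` type, the three states, and the retry-or-roll-back policy are hypothetical simplifications; a real design must also work out how each operation is rolled back and how pending nowait I/O is reissued.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class State(Enum):
    IDLE = "idle"
    PENDING = "pending"  # start was checkpointed, completion was not
    DONE = "done"


@dataclass
class ThreadRecord:
    op: str          # hypothetical description of the thread's operation
    retryable: bool  # can it be safely resent if its outcome is unknown?
    state: State


def plan_takeover(records: dict[str, ThreadRecord]) -> dict[str, str]:
    """Decide what the backup must do for each thread after takeover.

    Hypothetical policy: idle or completed threads need nothing; pending
    retryable operations are resent; pending nonretryable operations must
    be rolled back because their outcome is unknown.
    """
    actions = {}
    for tid, rec in records.items():
        if rec.state is not State.PENDING:
            actions[tid] = "none"
        elif rec.retryable:
            actions[tid] = f"retry {rec.op}"
        else:
            actions[tid] = f"roll back {rec.op}"
    return actions


# Last per-thread state the backup received before the primary failed.
checkpointed = {
    "t1": ThreadRecord("read customer", retryable=True, state=State.DONE),
    "t2": ThreadRecord("update balance", retryable=True, state=State.PENDING),
    "t3": ThreadRecord("send message on comm line", retryable=False,
                       state=State.PENDING),
}

for tid, action in plan_takeover(checkpointed).items():
    print(tid, action)
```

Even this simplified model depends on every thread checkpointing both the start and the completion of each operation; in a real multithreaded primary, keeping those checkpoints consistent with shared-resource state is where most of the design and testing effort goes.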