COBOL Manual for TNS/E Programs (H06.08+, J06.03+)

The information in these topics is general. For specific details on checkpointing, see the Guardian
Programmer’s Guide. For details on the STARTBACKUP and CHECKPOINT statements, see
STARTBACKUP (page 460) and CHECKPOINT (page 302).
Process Pairs
A process is the basic executable unit known to the operating environment: the execution (in a
processor) of a program. Specifically, the term program denotes a group of instruction codes and
initialized data (an HP COBOL run unit), whereas the term process denotes the changing states of an
executing program. The same loadfile can execute concurrently any number of times, but each
execution is a separate process.
A single application process can be designed to recover from any type of hardware failure except
one: a failure of the processor in which it is executing. One way to tolerate that failure as well is to
establish the process as a process pair. A process pair consists of two executions of the same
loadfile: the primary process executes in one processor; the backup process executes in another.
Logic in the program determines whether the process executes in primary mode, performing
its task, or in backup mode, monitoring the primary process.
Figure 43 Process Pair
In this primary-plus-backup structure, the fault-tolerant facility (as directed by the primary process)
keeps the backup process informed of the executing state of the primary process. At critical points
in the processing, the primary process sends checkpoint messages to the backup process to pass
the current state of the data, the file buffers, and the files to the backup process. When the backup
process learns of the failure of its primary process (by the receipt of a process-failure or
processor-failure system message through $RECEIVE), the backup process becomes the primary
process and continues with the application’s work (possibly starting a new backup process for
itself).
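
The takeover sequence described above can be illustrated with a small conceptual model. The
following sketch is written in Python, not HP COBOL, and does not use any Guardian or HP COBOL
interface. The class names, the method names, and the records_posted field are hypothetical and
exist only for illustration; the message names are taken from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class BackupProcess:
    """Hypothetical stand-in for the backup half of a process pair."""
    state: dict = field(default_factory=dict)   # last checkpointed state
    is_primary: bool = False

    def receive_checkpoint(self, snapshot: dict) -> None:
        # Keep a private copy so later changes in the primary do not leak in.
        self.state = dict(snapshot)

    def receive_system_message(self, message: str) -> None:
        # Stands in for a failure message arriving through $RECEIVE.
        if message in ("process-failure", "processor-failure"):
            self.is_primary = True   # take over the application's work


@dataclass
class PrimaryProcess:
    """Hypothetical stand-in for the primary half of a process pair."""
    backup: BackupProcess
    state: dict = field(default_factory=dict)

    def checkpoint(self) -> None:
        # Send the current state to the backup at a critical point.
        self.backup.receive_checkpoint(self.state)


backup = BackupProcess()
primary = PrimaryProcess(backup=backup)

primary.state["records_posted"] = 10
primary.checkpoint()                      # backup now knows about 10 records
primary.state["records_posted"] = 12      # changed, but not yet checkpointed

backup.receive_system_message("processor-failure")
print(backup.is_primary, backup.state)
```

In this model the backup takes over with the state from the last checkpoint message (10 records),
not the later, uncheckpointed value held by the failed primary.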
The fault-tolerant facility provides the means of writing application programs that can recover from
a processor module failure. When the primary process executes a STARTBACKUP statement, a
fault-tolerant facility routine in the primary process directs the operating environment to start the
backup process.
When the primary process executes a CHECKPOINT statement, a fault-tolerant facility routine
transmits pertinent data to the backup process. While the primary process is operating, a
fault-tolerant facility routine in the backup process automatically monitors and accepts checkpoint
information from the primary process. If the backup process is notified of the failure of its primary
process, the fault-tolerant facility causes the backup process to begin executing at the statement
following the latest CHECKPOINT statement. (The notification to the backup process of the failure
of the primary process comes in the form of a processor-down, stop, or abend message delivered
through $RECEIVE and handled automatically by the HP COBOL fault-tolerant facility.)
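
The restart rule (the backup begins executing at the statement following the latest CHECKPOINT
statement) can be illustrated with another conceptual model in Python. This is a sketch under
stated assumptions, not HP COBOL and not the fault-tolerant facility itself: the PROGRAM list, the
run function, and the step names are hypothetical, and the model assumes that each checkpoint
captures all the state the remaining steps need.

```python
# Hypothetical program: each step is either ("work", amount) or ("checkpoint",).
PROGRAM = [
    ("work", 5),
    ("checkpoint",),
    ("work", 3),
    ("checkpoint",),
    ("work", 4),      # completed by the primary, but never checkpointed
    ("work", 1),      # the primary fails before reaching this step
]


def run(program, start, total, fail_before=None):
    """Execute steps from index `start`, beginning with running total `total`.

    Returns (total, saved), where `saved` is the latest checkpoint taken
    during this run as a (resume_index, total) pair, or None if no
    checkpoint was taken. Execution stops just before step `fail_before`
    to simulate a failure of the primary.
    """
    saved = None
    for index in range(start, len(program)):
        if index == fail_before:
            break
        step = program[index]
        if step[0] == "work":
            total += step[1]
        else:
            # Record the state and the step that follows this checkpoint.
            saved = (index + 1, total)
    return total, saved


# The primary runs from the beginning and fails just before step 5.
primary_total, saved = run(PROGRAM, 0, 0, fail_before=5)

# The backup resumes at the step following the latest checkpoint,
# starting from the state recorded by that checkpoint.
resume_index, resume_total = saved
backup_total, _ = run(PROGRAM, resume_index, resume_total)

# For comparison: the same program run without any failure.
uninterrupted_total, _ = run(PROGRAM, 0, 0)

print(saved, backup_total, uninterrupted_total)
```

In this model, the work that the primary performed after its latest checkpoint is repeated by the
backup, so the backup finishes with the same result as an uninterrupted run.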