COBOL Manual for TNS and TNS/R Programs

Fault-Tolerant Processes
HP COBOL Manual for TNS and TNS/R Programs—522555-006
This sequence of actions occurs when a process pair runs:
1. The primary process opens any files required for its execution.
2. The primary process starts its backup process in another processor module by
executing a STARTBACKUP verb.
This action also opens the files for the backup process and checkpoints the state of
the primary process to the backup process. A process pair opens files in a manner
that permits both members of the pair to have a file open while retaining the ability
to exclude other processes from accessing the file. When a disk file has been
opened by a process pair in this manner, a record or file lock held by the primary
process is also an equivalent lock held by the backup process.
3. The backup process, at the beginning of its execution, automatically begins
monitoring the primary process. The backup process does nothing else unless the
primary process fails.
4. The primary process begins executing its main processing loop. At critical points
in the loop, typically before each write to a disk file, the primary
process executes a CHECKPOINT statement to copy part of its environment and
pertinent file control information to the backup process, which marks a restart point
for the backup process. Typically, a program contains several CHECKPOINT
statements, each of which checkpoints only a portion of the primary process’s
environment.
5. If the primary process fails, the backup process begins executing at the restart
point indicated by the latest execution of a CHECKPOINT statement. The backup
process is then considered to be the primary process.
6. If the primary process failed because its processor failed (that is, the backup
process received a processor-down message), the fault-tolerant facility in the new
primary (former backup) process automatically starts a new backup process when
the failed processor has been repaired and brought back online. This new backup
process is then ready to take over if the primary process fails.
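The takeover behavior in steps 4 and 5 can be illustrated with a small simulation.
This is a conceptual sketch in Python, not HP COBOL: the names Backup, run_process,
and run_pair are hypothetical, a RuntimeError stands in for a failure of the primary
process, and a Python list stands in for the disk file. Step 6 (starting a new backup)
is omitted, and the failure is simulated immediately after a checkpoint.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Hypothetical stand-in for the backup process: it only stores checkpoints."""
    restart_point: int = 0                     # index of the next step to execute
    state: dict = field(default_factory=dict)  # last checkpointed data items

    def receive_checkpoint(self, restart_point, state):
        self.restart_point = restart_point
        self.state = dict(state)  # copy, so later changes in the primary do not leak


def run_process(steps, disk, state, start=0, backup=None, fail_at=None):
    """Run steps[start:], checkpointing before each write to `disk`."""
    for i in range(start, len(steps)):
        if backup is not None:
            backup.receive_checkpoint(i, state)  # marks step i as the restart point
        if i == fail_at:
            raise RuntimeError("simulated failure of the primary process")
        state["total"] += steps[i]
        disk.append(state["total"])              # the "write to a disk file"


def run_pair(steps, fail_at=None):
    """Run a primary with a backup; on failure the backup takes over."""
    disk = []
    backup = Backup()
    try:
        run_process(steps, disk, {"total": 0}, backup=backup, fail_at=fail_at)
    except RuntimeError:
        # Takeover: the backup resumes at the latest restart point,
        # using the state it received in the latest checkpoint.
        run_process(steps, disk, backup.state, start=backup.restart_point)
    return disk
```

With these definitions, run_pair([10, 20, 30]) and run_pair([10, 20, 30], fail_at=1)
both leave the same three running totals on the simulated disk, because the backup
resumes from the checkpoint taken just before the second write.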
Checkpointing
When the primary process executes a CHECKPOINT statement, one of its
fault-tolerant facility routines formats a message containing the information to be
checkpointed and sends it to the backup process in the form of an interprocess
message. A fault-tolerant facility routine in the backup process receives and acts upon
the message.
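The format, send, and receive sequence described above can be modeled as follows.
This is only an illustrative Python sketch under stated assumptions: the JSON message
layout, the function names, and the use of queue.Queue as the interprocess channel are
all hypothetical and do not reflect the actual message format or routines of the
fault-tolerant facility.

```python
import json
import queue


def format_checkpoint(restart_point, data_items):
    """Pack the information to be checkpointed into one message (bytes)."""
    payload = {"restart_point": restart_point, "data_items": data_items}
    return json.dumps(payload).encode("utf-8")


def send_checkpoint(channel, restart_point, data_items):
    """Primary side: format the message and send it to the backup."""
    channel.put(format_checkpoint(restart_point, data_items))


def receive_checkpoint(channel, backup_image):
    """Backup side: receive one message and act on it by updating its copy."""
    message = json.loads(channel.get().decode("utf-8"))
    backup_image["restart_point"] = message["restart_point"]
    # Merge rather than replace: each checkpoint may carry only part of the state.
    backup_image["data_items"].update(message["data_items"])
    return backup_image
```

Merging each message into the backup's existing copy mirrors the point that a single
checkpoint usually carries only a portion of the primary process's environment.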
The two types of information you must usually checkpoint are data items and sync
blocks.