COBOL Manual for TNS/E Programs (H06.08+, J06.03+)
Checkpointing
When the primary process executes a CHECKPOINT statement, one of its fault-tolerant facility
routines formats the information to be checkpointed and sends it to the backup process as an
interprocess message. A fault-tolerant facility routine in the backup process receives the
message and acts on it.
The two types of information you must usually checkpoint are data items and sync blocks.
Data Items
These are usually file record areas but can be any desired data items in the File Section, the
Working-Storage Section, or the Extended-Storage Section of the Data Division. You must checkpoint
any data items that are part of the program’s state: specifically, the disk record that is about to be
written, the terminal or tape record that was just read, and any data that is necessary to resume
processing at the site of the CHECKPOINT statement.
The reason for checkpointing data items is to give the backup process all the information it needs
to reexecute an I-O request if the primary process fails. Usually, you checkpoint a data item just
before writing the data to disk. You can also use data-item checkpointing to eliminate the need for
the backup process to reexecute an I-O request. For example, when a READ statement receives an
entry from a terminal, checkpoint the received data item immediately after the READ statement
executes. This minimizes the possibility that the operator has to reenter data.
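The read-then-checkpoint pattern described above can be modeled with a short sketch. This is a conceptual model in Python, not COBOL and not the fault-tolerant facility itself: the names Backup, primary_run, and backup_takeover are hypothetical, and the failure is assumed to occur after the checkpoint but before the disk write.

```python
class Backup:
    """Hypothetical stand-in for the backup process: keeps the last checkpointed items."""

    def __init__(self):
        self.state = {}

    def receive_checkpoint(self, items):
        # Store a copy of the checkpointed data items.
        self.state.update(dict(items))


def primary_run(entries, backup, disk, fail_after_checkpoint_of=None):
    """Read each terminal entry, checkpoint it, then write it to disk.

    Returns False if the simulated primary failure occurred, True otherwise.
    """
    for index, entry in enumerate(entries):
        record = entry  # models the READ from the terminal
        # Checkpoint immediately after the read, before the write.
        backup.receive_checkpoint({"record": record, "next_index": index + 1})
        if fail_after_checkpoint_of == index:
            return False  # primary fails before the write completes
        disk.append(record)  # models the WRITE to disk
    return True


def backup_takeover(entries, backup, disk):
    """Resume from the last checkpoint without asking the operator to reenter data."""
    # Reexecute the write using the checkpointed record.
    disk.append(backup.state["record"])
    # Continue with the entries the primary had not yet read.
    for entry in entries[backup.state["next_index"]:]:
        disk.append(entry)
```

In this model, if the primary fails after checkpointing the second of three entries, the backup writes that entry from its checkpointed copy and then continues with the third, so the disk ends up with all three records in order.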
Sync Blocks
A sync block contains control information about the current state of a disk file (such as the current
value of the file pointers).
The purpose of checkpointing the sync block is twofold:
• To ensure that a write operation is not duplicated when a backup process takes over from its
primary process
• To pass the current file pointers’ values to the file system of the backup processor
When a process executes a checkpoint of a sync block, the operating environment passes the
information in the sync block to the file system of the backup processor. Figure 45 illustrates
why duplicate operations must be prevented. In Figure 45, a primary process completes a
sequential write operation (that is, an append to the end of the file) successfully, but fails
before its next checkpoint to the backup process. On takeover, the backup process reexecutes the
operations that the primary process just completed. If the backup process's write operation were
performed as requested, it would duplicate the record at the new end-of-file location.
Figure 45 Duplication in Takeover
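The takeover scenario can also be traced in a few lines. The following is a minimal Python model, not COBOL; the function names and record values are hypothetical, and the file is modeled as a simple list with no file-system completion tracking.

```python
def append_record(disk_file, record):
    """Sequential write: always appends at the current end of file."""
    disk_file.append(record)


def run_with_takeover():
    disk_file = ["rec-1"]
    # State the backup received at the last checkpoint, taken just before the write.
    checkpointed = {"record": "rec-2"}

    # Primary: the write completes, then the primary fails
    # before it can checkpoint again.
    append_record(disk_file, "rec-2")

    # Backup: takes over at the last checkpoint and reexecutes the write.
    # The file system in this model has no record that the write already completed.
    append_record(disk_file, checkpointed["record"])
    return disk_file
```

In this model, run_with_takeover() returns ["rec-1", "rec-2", "rec-2"]: the reexecuted write lands at the new end of file, so the record appears twice.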
To prevent such duplicate write operations by the backup process, you must specify a nonzero
SYNCDEPTH parameter in the OPEN statement. This action allows the file system to record the
completion status of each input-output operation. If the backup process requests an operation
already completed by the primary process, the file system recognizes this condition. Then, instead