Fault-Tolerant Programming
FORTRAN Reference Manual—528615-001
Overview of Fault-Tolerant Programs
The following actions occur when you run a fault-tolerant program:
• The primary process opens the initial set of files required for its operation.
• The primary process starts its backup process in another processor by executing a
  START BACKUP statement. In addition to starting the backup process, START BACKUP
  sends the backup process the checkpoint information for the files that are open in
  the primary process. Process pairs open files in a way that permits both members
  of the pair to access the file. For disk files opened in this way, a record lock
  or file lock specified by the primary process is equivalent to a lock by the backup.
• The backup process, at the start of its execution, automatically begins monitoring
  the primary process. The backup proceeds no further unless a failure occurs.
• The primary process begins executing its main processing loop. At critical points in
  the loop (for example, just before write operations to disk files), the primary
  process executes CHECKPOINT statements to send program state and file control
  data to the backup process and establish takeover points for the backup. A
  takeover point is established in the backup process by the most recently executed
  CHECKPOINT statement that does not specify STACK='NO'. OPEN and CLOSE
  statements also establish takeover points in the backup unless you specify
  STACK='NO' for those statements.
A program can contain many CHECKPOINT statements. You usually code 
CHECKPOINT statements so as to ensure that logical groupings of data are 
preserved in the backup process. 
For example, you frequently execute a CHECKPOINT statement immediately
before you execute a WRITE statement. If the WRITE statement fails, or the
processor in which your primary process runs fails, all the processing up to the
point of the WRITE statement is then preserved in the backup process. If the backup
process takes over processing, the first statement it executes is the WRITE
statement, for which it has all the information it needs. Here is an example:
Primary process:
    ...
    CHECKPOINT
    WRITE(6, 100) r, s

Primary’s processor fails, backup takes over:
    CHECKPOINT            <-- Backup does NOT re-execute
    WRITE(6, 100) r, s    <-- Backup begins HERE by re-executing
                              the WRITE statement
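
The takeover sequence above can also be modeled outside FORTRAN. The following Python sketch is a conceptual simulation only: it is not NonStop FORTRAN and calls no system API. The names Backup, receive, and run_pair are hypothetical, a list stands in for the file on unit 6, and the failure is assumed to occur after the CHECKPOINT but before the WRITE completes. What a STACK='NO' checkpoint still transfers is not modeled; the sketch shows only that it does not move the takeover point.

```python
class Backup:
    """Hypothetical stand-in for the backup process: it only stores checkpoints."""

    def __init__(self):
        self.takeover_step = None   # where the backup would resume
        self.state = {}             # program state saved at that point

    def receive(self, step, state, stack=True):
        # Per the text, only a checkpoint that does not specify STACK='NO'
        # establishes a new takeover point. stack=False models STACK='NO'.
        if stack:
            self.takeover_step = step
            self.state = dict(state)


def run_pair(records, fail_at):
    """Write every record, surviving one simulated primary failure.

    fail_at is the index of the record whose WRITE the primary never
    completes, or None for a run without a failure.
    """
    output = []        # stands in for the file being written
    backup = Backup()

    # Primary process: checkpoint immediately before every write.
    for i, rec in enumerate(records):
        backup.receive(step=i, state={"next": i})   # CHECKPOINT
        if i == fail_at:
            break                                   # processor fails before the WRITE
        output.append(rec)                          # WRITE
    else:
        return output                               # loop finished, no failure

    # Backup takes over: it does not repeat the CHECKPOINT, it resumes at
    # the WRITE that followed the most recent takeover point.
    for i in range(backup.state["next"], len(records)):
        output.append(records[i])                   # re-executed WRITE, then the rest
    return output
```

In this model, run_pair(["a", "b", "c"], fail_at=1) returns ["a", "b", "c"]: the primary writes "a", checkpoints before "b", and fails; the backup then starts at the WRITE for "b", so no record is lost or skipped.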