FORTRAN Reference Manual (528615-001)

Fault-Tolerant Programming
Checkpointing Large Amounts of Data
For example, the following code shows how you might checkpoint a large array A
of 100,000 elements that is allocated in extended memory:
      DIMENSION A(100000)
10    CHECKPOINT (STACK='YES') global-data      <-- Establish a takeover pt
15    CONTINUE
      ...
      DO 20 I=1,4
20    CHECKPOINT (STACK='NO') part-of-array-A   <-- Do not establish a takeover pt
30    CHECKPOINT (STACK='YES') global-data      <-- Establish a takeover pt
Execution of the previous code proceeds as follows:
1. The CHECKPOINT statement at label 10 establishes a takeover point prior to
checkpointing the array A.
2. The CHECKPOINT statement at label 20, which is the last statement of a DO loop
and so executes four times, transmits data but does not establish a takeover
point because it specifies STACK='NO'.
3. The CHECKPOINT statement at label 30, following the DO loop, establishes a new
takeover point because it specifies STACK='YES'.
If the primary fails at any point after executing the CHECKPOINT statement labeled 10
but before executing the CHECKPOINT statement labeled 30, the backup process
takes over at the CONTINUE statement labeled 15.
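The takeover behavior described above can be traced with a small model. This is a Python sketch, not NonStop FORTRAN: the names PROGRAM and takeover_label are hypothetical, the elided statements ("...") are skipped, and the resume point "after 30" is a placeholder that assumes, by analogy with labels 10 and 15, that the backup resumes at the statement following label 30.

```python
# Model only: each entry is (label, stack_yes, resume_label).
# resume_label is where the backup would start if this statement set the
# takeover point. "after 30" is a placeholder, because the example does not
# show the statement that follows label 30.
PROGRAM = [
    ("10", True, "15"),          # CHECKPOINT (STACK='YES')
    ("15", False, None),         # CONTINUE
    ("20 (I=1)", False, None),   # CHECKPOINT (STACK='NO'), first pass
    ("20 (I=2)", False, None),
    ("20 (I=3)", False, None),
    ("20 (I=4)", False, None),
    ("30", True, "after 30"),    # CHECKPOINT (STACK='YES')
]


def takeover_label(last_completed):
    """Return where the backup resumes if the primary fails right after
    completing the statement named by last_completed."""
    takeover = None
    for label, stack_yes, resume_label in PROGRAM:
        if stack_yes:
            # Only STACK='YES' checkpoints move the takeover point.
            takeover = resume_label
        if label == last_completed:
            return takeover
    raise ValueError(f"unknown statement: {last_completed}")


for label, _, _ in PROGRAM:
    print(f"fail after {label:9} -> backup resumes at {takeover_label(label)}")
```

In this model every failure between labels 10 and 30 reports label 15, because none of the loop checkpoints moves the takeover point.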
If the CHECKPOINT statement at label 20 had instead specified STACK='YES', and a
failure occurred before all of array A was transferred to the backup process,
array A in the backup process would be inconsistent: some of its values would be
left over from a previous checkpoint of array A, and some might be from the
current, incomplete transfer of array A.
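The mixed state described above can be illustrated with a small model. This is a Python sketch rather than NonStop FORTRAN. The function name transfer_in_parts, the eight-element arrays, and the assumption that the four transfers cover equal-sized slices are illustrative choices, not details taken from the manual.

```python
def transfer_in_parts(primary, backup, parts, completed):
    """Copy the first `completed` of `parts` equal slices from primary to backup.

    Models a transfer that is interrupted after `completed` slices.
    Assumes len(primary) is divisible by `parts` (an illustrative choice).
    """
    size = len(primary) // parts
    for i in range(completed):
        start, stop = i * size, (i + 1) * size
        backup[start:stop] = primary[start:stop]
    return backup


# The backup starts with values from a previous checkpoint of the array.
old_values = ["old"] * 8
new_values = ["new"] * 8

# The primary fails after two of the four slices have been sent.
backup = transfer_in_parts(new_values, list(old_values), parts=4, completed=2)
print(backup)
# ['new', 'new', 'new', 'new', 'old', 'old', 'old', 'old']
```

The model shows only the data held by the backup. Where the backup resumes execution depends on the most recent takeover point.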
For additional information on fault-tolerant processing, see the Guardian
Programmer's Guide.