FORTRAN Reference Manual

Fault-Tolerant Programming
FORTRAN Reference Manual 528615-001
Starting a New Backup Process
The following list describes the possible actions of the new primary process (formerly
the backup process) after a takeover from the former primary process, whether the
takeover resulted from a failure or from a call to a Guardian routine that stops the
process.
• If the former primary process called STOP or PROCESS_STOP_ and the START
  BACKUP statement did not set bit 13 in its OPTION specifier, the backup process
  also stops immediately.
• If there have been more than ten takeovers by the backup process, FORTRAN
  does not start another backup process and returns from the CHECKPOINT
  statement with BACKUPSTATUS = 5000.
• If the START BACKUP statement set bit 11 (recreate a backup process
  immediately after a takeover) in its OPTION specifier, FORTRAN attempts to
  create a new backup process in the former primary’s processor. If the takeover
  was not caused by a processor failure and FORTRAN cannot start a new backup
  process in the former primary’s processor, FORTRAN terminates your process.
  Otherwise, it returns from the CHECKPOINT statement with BACKUPSTATUS = 100
  or BACKUPSTATUS = 101.
• If the START BACKUP statement did not set bit 11 in its OPTION specifier,
  FORTRAN lets the new primary process run without a backup until the next
  CHECKPOINT statement, which attempts to create a new backup process in the
  former primary’s processor. FORTRAN returns from the present CHECKPOINT
  statement with BACKUPSTATUS = 100 or BACKUPSTATUS = 101.
• If the former primary’s processor failed, FORTRAN does not attempt to create a
  new backup process; it returns from the CHECKPOINT statement with
  BACKUPSTATUS = 102.
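
The rules in the list above can be summarized as a small decision model. The following Python sketch is illustrative only and is not FORTRAN run-time code: the function name takeover_outcome, its parameters, and the Outcome tuple are hypothetical. It assumes an evaluation order (stop, takeover limit, processor failure, then bit 11) that the text does not state explicitly, and it reports both 100 and 101 wherever the text does not say which of the two values applies.

```python
from typing import NamedTuple, Tuple

MAX_TAKEOVERS = 10  # from the text: more than ten takeovers -> no new backup


class Outcome(NamedTuple):
    action: str                    # "stop", "terminate", or "return"
    backupstatus: Tuple[int, ...]  # possible BACKUPSTATUS values; empty if none
    backup: str                    # "none", "created", or "deferred"


def takeover_outcome(primary_called_stop, bit13_set, bit11_set,
                     takeover_count, primary_cpu_failed, can_start_backup):
    """Model what the new primary does after a takeover (hypothetical helper).

    The order of the checks is an assumption made for this sketch.
    """
    # Former primary called STOP or PROCESS_STOP_ and bit 13 was not set:
    # the backup stops as well.
    if primary_called_stop and not bit13_set:
        return Outcome("stop", (), "none")
    # More than ten takeovers: no further backup is started.
    if takeover_count > MAX_TAKEOVERS:
        return Outcome("return", (5000,), "none")
    # Former primary's processor failed: no attempt to create a backup.
    if primary_cpu_failed:
        return Outcome("return", (102,), "none")
    # Bit 11 set: recreate the backup immediately in the former primary's CPU.
    if bit11_set:
        if not can_start_backup:
            return Outcome("terminate", (), "none")
        return Outcome("return", (100, 101), "created")
    # Bit 11 not set: the next CHECKPOINT statement attempts the creation.
    return Outcome("return", (100, 101), "deferred")
```

For example, takeover_outcome(False, False, True, 1, False, True) models a first takeover with bit 11 set in which the new backup process starts successfully.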
When the FORTRAN run-time system cannot start a new backup process because the
former primary’s processor is down, the application program must implement one of
the following strategies:
• Run without a backup for the remainder of the program’s execution.
• Periodically execute a START BACKUP statement that specifies the failed
  processor. This could be done every time a CHECKPOINT statement returns
  BACKUPSTATUS = 1000 (backup CPU down).
• Execute a START BACKUP statement that specifies a different processor.
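
The three strategies above can be compared with a small model of the choice an application makes after each CHECKPOINT statement. This Python sketch is an illustration under stated assumptions, not FORTRAN syntax: the function next_backup_cpu, the strategy names, and the candidate_cpus list are hypothetical, and the only status value taken from the text is 1000 (backup CPU down).

```python
BACKUP_CPU_DOWN = 1000  # BACKUPSTATUS value named in the text


def next_backup_cpu(strategy, failed_cpu, candidate_cpus, backupstatus):
    """Return the processor to name in the next START BACKUP, or None.

    None means "keep running without a backup for now". All names here are
    hypothetical; the function only models the three strategies in the text.
    """
    if strategy == "run_without":
        # Strategy 1: never try to start a backup again.
        return None
    if strategy == "retry_same":
        # Strategy 2: retry the failed processor each time CHECKPOINT
        # reports that the backup CPU is down.
        return failed_cpu if backupstatus == BACKUP_CPU_DOWN else None
    if strategy == "use_other":
        # Strategy 3: pick the first candidate that is not the failed CPU.
        for cpu in candidate_cpus:
            if cpu != failed_cpu:
                return cpu
        return None
    raise ValueError("unknown strategy: " + repr(strategy))
```

With strategy "retry_same", the model names the failed processor again only when the status is 1000, which mirrors the second strategy in the list. With "use_other", the caller is assumed to supply processors that are suitable for a backup process.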