FORTRAN Reference Manual

Fault-Tolerant Programming
FORTRAN Reference Manual 528615-001
Checkpointing File Status Information
The preceding code is easier to understand if you consider the following:
• A process, A, that opens a second process, B, and sends requests to B is, by
  definition, a requester.
• A process, C, that receives requests for services from another process, B, is, by
  definition, a server.
• A process is often both a requester and a server. In the text that follows, you will
  see that the preceding code is a server (B in this example) to its requesters (A). As a
  server, the preceding code reads requests from $RECEIVE. In addition, the
  preceding code is itself a requester, B, to another server, C. As a requester, the
  preceding code opens a disk file, EMPLOYEE, and sends requests to the disk
  process, which, in this context, is a server.
Such a chain has no inherent limit. An application can have numerous
processes that each, in turn, act as both requester and server.
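
The requester and server roles described above can be modeled in a few lines. The following is a minimal sketch in Python, used here only as a language-neutral illustration and not as NonStop FORTRAN; the `Process` class and its `open` and `roles` methods are hypothetical names introduced for this example.

```python
class Process:
    """Toy model of a process that can open, and be opened by, other processes."""

    def __init__(self, name):
        self.name = name
        self.opens = []      # servers this process sends requests to
        self.opened_by = []  # requesters that send requests to this process

    def open(self, server):
        # Opening another process makes this process a requester of it.
        self.opens.append(server)
        server.opened_by.append(self)

    def roles(self):
        # A process is a requester if it opens a server, and a server
        # if some requester opens it. It can be both at once.
        roles = set()
        if self.opens:
            roles.add("requester")
        if self.opened_by:
            roles.add("server")
        return roles


# A sends requests to B, and B sends requests to C.
a, b, c = Process("A"), Process("B"), Process("C")
a.open(b)
b.open(c)
```

In this model, `b.roles()` contains both roles, which matches the position of the preceding code between its requesters and the disk process.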
Server processes can support fault tolerance in two senses:
• A server can run as a NonStop process so that the server can continue running,
  even if its primary process fails.
• A server can support NonStop requester processes, so that if the requester’s
  primary process fails, the server correctly processes duplicate requests that it
  receives from the requester’s backup process after the backup takes over.
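
One common way for a server to process duplicate requests correctly is to remember the last reply it sent to each requester and to return that saved reply, without repeating the work, when the same request arrives again. The following is a minimal sketch of that idea in Python, used only as a language-neutral illustration. The `Server` class, the `requester_id` and `seq` parameters, and the running-balance operation are all assumptions made for this example; they are not the mechanism or the names that the FORTRAN run-time library uses.

```python
class Server:
    """Remembers the last reply per requester so a retried request is not redone."""

    def __init__(self):
        self.balance = 0
        self.last = {}  # requester id -> (sequence number, saved reply)

    def handle(self, requester_id, seq, amount):
        saved = self.last.get(requester_id)
        if saved is not None and saved[0] == seq:
            # Duplicate: the requester's backup is resending a request that
            # was already processed for the failed primary.
            # Return the saved reply and leave the balance unchanged.
            return saved[1]
        # New request: do the work once and remember the reply.
        self.balance += amount
        reply = self.balance
        self.last[requester_id] = (seq, reply)
        return reply


server = Server()
first = server.handle("R", 1, 100)   # original request from the primary
retry = server.handle("R", 1, 100)   # same request resent by the backup
later = server.handle("R", 2, 50)    # a genuinely new request
```

The retried request receives the same reply as the original and the update is applied only once, which is the behavior a requester's backup depends on after a takeover.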
NonStop Server Processes
When a server’s primary process fails, its backup begins executing your program at the
FORTRAN instruction that immediately follows the last stack checkpoint. This could be
after a FORTRAN CHECKPOINT, OPEN, or CLOSE statement or after any I/O
statement that implicitly opens a unit.
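
The effect of resuming after the last checkpoint can be illustrated with a small simulation. The following is a minimal sketch in Python, used only as a language-neutral model and not as NonStop FORTRAN; the `Backup` class and the `run` and `takeover` functions are hypothetical names, and each "step" stands in for a statement or group of statements in your program.

```python
import copy


class Backup:
    """Holds the most recent checkpoint sent by the primary."""

    def __init__(self):
        self.next_step = 0
        self.state = {}

    def receive_checkpoint(self, next_step, state):
        self.next_step = next_step
        self.state = copy.deepcopy(state)


def run(steps, backup, start=0, state=None, fail_before=None):
    """Run steps[start:]. Each step is a pair (function, checkpoint_after).
    Returns the final state, or None if the simulated process fails."""
    state = {} if state is None else state
    for i in range(start, len(steps)):
        if i == fail_before:
            return None  # simulated failure of the primary
        func, checkpoint_after = steps[i]
        func(state)
        if checkpoint_after:
            backup.receive_checkpoint(i + 1, state)
    return state


def takeover(steps, backup):
    """The backup resumes at the step that follows the last checkpoint."""
    return run(steps, backup, start=backup.next_step,
               state=copy.deepcopy(backup.state))


steps = [
    (lambda s: s.update(total=s.get("total", 0) + 1), True),   # checkpointed
    (lambda s: s.update(total=s["total"] + 10), False),        # not checkpointed
    (lambda s: s.update(total=s["total"] + 100), False),
]
backup = Backup()
result = run(steps, backup, fail_before=2)  # primary fails before the third step
final = takeover(steps, backup)             # backup re-executes the second step
```

Because the second step ran after the last checkpoint, the backup executes it again. Work done between the last checkpoint and the failure is repeated, so that work must be safe to repeat.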
Managing $RECEIVE
The FORTRAN run-time library in the former backup, now the primary, recognizes that
a takeover has occurred and discards each message that it has read from $RECEIVE
but to which it has not yet replied. If the old primary failed before it wrote its reply to
$RECEIVE, the request failed, and the file system automatically redirects the request
to the new primary (assuming the server was opened with SYNCDEPTH greater than
zero). If the old primary failed after it wrote its reply to $RECEIVE, that request was
complete and nothing more need be done. (The file system redirects a requester’s
messages only a limited number of times, typically fewer than three.)
The new primary then loops back to read a request from $RECEIVE: either the same
request that the old primary was processing when it failed or a new request.
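
The takeover sequence for $RECEIVE can be modeled as follows. This is a minimal sketch in Python, used only as a language-neutral illustration and not as NonStop FORTRAN. The `FileSystem` class, the `serve` function, and the `max_redirects` value of 2 are assumptions made for this example; the real limit and the real redirection mechanism belong to the file system.

```python
from collections import deque


class FileSystem:
    """Toy stand-in for the queue of requests waiting on $RECEIVE."""

    def __init__(self, max_redirects=2):
        self.queue = deque()
        self.max_redirects = max_redirects  # assumed limit for illustration
        self.redirects = {}

    def send(self, request):
        self.queue.append(request)

    def redirect(self, request):
        # Resend an unanswered request to the new primary, up to the limit.
        count = self.redirects.get(request, 0) + 1
        if count > self.max_redirects:
            return False
        self.redirects[request] = count
        self.queue.appendleft(request)
        return True


def serve(fs, replies, fail_on=None, fail_after_reply=False):
    """Read and reply to requests until the queue is empty or a simulated
    failure occurs. Returns the unanswered request if the failure came
    before the reply; otherwise returns None."""
    while fs.queue:
        request = fs.queue.popleft()
        if request == fail_on and not fail_after_reply:
            return request  # failed before replying: the request is incomplete
        replies.append("done:" + request)
        if request == fail_on:
            return None     # failed after replying: the request is complete
    return None


fs = FileSystem()
for r in ["r1", "r2", "r3"]:
    fs.send(r)
replies = []
unreplied = serve(fs, replies, fail_on="r2")  # old primary fails while handling r2
if unreplied is not None:
    fs.redirect(unreplied)                    # request is redirected to the new primary
serve(fs, replies)                            # new primary loops back to read requests
```

The new primary keeps no record of the message that was in progress; it reads the redirected request as if it were new, so each request still receives exactly one reply.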