COBOL Manual for TNS/E Programs (H06.08+, J06.03+)

32 Fault-Tolerant Processes
A process is running in a fault-tolerant manner when no single point of failure can stop the process
or corrupt its data or the files it is manipulating. Processes are not automatically fault tolerant—they
must be designed and implemented to be fault tolerant.
How might a single point of failure affect a process? Suppose a process is operating an automated
teller machine (ATM) for a financial institution. If you, as a customer, come to the ATM and request
$20 from your account:
• You want the ATM to service your request, not terminate before completing the transaction.
• You want the ATM to record at most one debit of $20 from your account; the institution wants
  the ATM to record at least one debit of $20 from your account.
• You want the ATM to dispense at least $20 to you; the institution wants the ATM to dispense
  at most $20 to you.
If the process is not fully fault tolerant, several kinds of failure can defeat these
expectations:
• A process might be running on a processor that fails (blows a fuse, is accidentally unplugged,
  is stopped by an operator, or fails for any other reason), so the transaction does not complete.
• A process might get part way through your withdrawal transaction, deducting the $20 from
  your balance but not yet reaching the point of dispensing the cash to you. If the process is
  automatically retried, it might deduct the $20 from your account again and then dispense the
  cash to you; you would receive $20, but the balance in your account would be down by $40.
• A process might fail during the transaction after disbursing the $20 to you, but before recording
  the fact, and resume at the point of asking what you want. If you again asked for $20, the
  process could disburse another $20 (total = $40) and record only a $20 withdrawal.
• A process might disburse the $20 cash to you and fail before making a permanent record of
  the transaction.
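The double-debit failure can be illustrated with a small simulation. The following is a minimal sketch in standard COBOL, not a fault-tolerant program: the "failure" is simulated by letting the first attempt stop after the debit, and all names (RETRY-DEMO, WS-BALANCE, WS-DEBIT-DONE, and so on) are hypothetical. The source lines are assumed to be in Tandem reference format, with the comment indicator in column 1.

```cobol
 IDENTIFICATION DIVISION.
 PROGRAM-ID. RETRY-DEMO.
 DATA DIVISION.
 WORKING-STORAGE SECTION.
 01  WS-BALANCE        PIC 9(5) VALUE 100.
 01  WS-AMOUNT         PIC 9(5) VALUE 20.
 01  WS-DISPENSED      PIC 9(5) VALUE 0.
 01  WS-ATTEMPT        PIC 9    VALUE 0.
 01  WS-DEBIT-DONE     PIC X    VALUE "N".
 PROCEDURE DIVISION.
 MAIN-PARA.
* Naive retry: attempt 1 stops after the debit, attempt 2 reruns
* the whole transaction from the top.
     PERFORM NAIVE-WITHDRAWAL 2 TIMES
     DISPLAY "NAIVE   BALANCE=" WS-BALANCE
             " DISPENSED=" WS-DISPENSED
* Reset, then retry with a record of the step already completed.
     MOVE 100 TO WS-BALANCE
     MOVE 0 TO WS-DISPENSED WS-ATTEMPT
     PERFORM GUARDED-WITHDRAWAL 2 TIMES
     DISPLAY "GUARDED BALANCE=" WS-BALANCE
             " DISPENSED=" WS-DISPENSED
     STOP RUN.
 NAIVE-WITHDRAWAL.
     ADD 1 TO WS-ATTEMPT
* The debit is applied on every attempt.
     SUBTRACT WS-AMOUNT FROM WS-BALANCE
* Cash is dispensed only on the attempt that runs to the end.
     IF WS-ATTEMPT > 1
         ADD WS-AMOUNT TO WS-DISPENSED
     END-IF.
 GUARDED-WITHDRAWAL.
     ADD 1 TO WS-ATTEMPT
* The debit is skipped if an earlier attempt already applied it.
     IF WS-DEBIT-DONE = "N"
         SUBTRACT WS-AMOUNT FROM WS-BALANCE
         MOVE "Y" TO WS-DEBIT-DONE
     END-IF
     IF WS-ATTEMPT > 1
         ADD WS-AMOUNT TO WS-DISPENSED
     END-IF.
```

In the naive path the balance ends 40 lower although only 20 is dispensed; in the guarded path the balance ends 20 lower. In this sketch the WS-DEBIT-DONE flag lives in ordinary working storage, so it would be lost with the process in a real failure. Keeping such state alive across a failure is the job of the mechanisms described in the rest of this section.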
NonStop Operating System
The NonStop operating system architecture is the underlying mechanism that enables you to write
fault-tolerant processes. The full redundancy of processors, devices, controllers, and paths among
them is the basis for the NonStop operating system’s fault tolerance. But given that base, there are
still two ways a process (particularly a Pathway server process operating under TS/MP) can be
designed to be fault tolerant: by using the fault-tolerant facility or by using TMF.
When you have decided which of the two mechanisms to use, you can read more about it in
Fault-Tolerant Facility or TMF.
Introduction to the Fault-Tolerant Facility
NOTE: This topic does not apply to the OSS environment.
To use the fault-tolerant facility, you must include the NONSTOP compiler directive in your
compilation and embed one STARTBACKUP statement and one or more CHECKPOINT statements
at strategic points in your program.
At the beginning of its execution, after opening its files, your process executes a STARTBACKUP
statement to instruct the operating environment to produce a backup process in a different processor
and to open the same files in the backup process. The backup process is loaded from the same
loadfile as the original (primary) process, but the operating environment does not actually start it
running.
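The placement of these elements might look like the following skeleton. It is a sketch only: the NONSTOP directive and the STARTBACKUP and CHECKPOINT statement names come from the text above, but the operand forms shown (a processor number for STARTBACKUP, a list of data items for CHECKPOINT) are assumptions that should be checked against the descriptions of those statements. The data names and the processor number are hypothetical, and the source is assumed to be in Tandem reference format.

```cobol
?NONSTOP
 IDENTIFICATION DIVISION.
 PROGRAM-ID. FT-SKETCH.
 DATA DIVISION.
 WORKING-STORAGE SECTION.
* Hypothetical names; the processor number is an arbitrary example.
 01  WS-BACKUP-CPU     PIC 99   VALUE 1.
 01  WS-BALANCE        PIC 9(5) VALUE 100.
 01  WS-AMOUNT         PIC 9(5) VALUE 20.
 PROCEDURE DIVISION.
 MAIN-PARA.
* OPEN statements for the program's files would go here, so that
* the files are already open when the backup is created.
*
* One STARTBACKUP, executed once near the start of the run.
* Assumed operand: the processor that is to hold the backup.
     STARTBACKUP WS-BACKUP-CPU
* A CHECKPOINT at a strategic point, before the update.
* Assumed operands: data items whose values the backup needs.
     CHECKPOINT WS-AMOUNT WS-BALANCE
     SUBTRACT WS-AMOUNT FROM WS-BALANCE
* A second CHECKPOINT after the update has been made.
     CHECKPOINT WS-BALANCE
     STOP RUN.
```

The skeleton shows only where the directive and the two statements sit relative to the rest of the program. Which points count as strategic depends on which steps of the transaction must not be repeated or lost.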