Guardian Programmer's Guide

Fault-Tolerant Programming in C
Guardian Programmer’s Guide 421922-014
If the primary process or CPU fails, the backup process takes over execution from the
failed primary process and becomes the new primary process. First, it creates a new
backup process. (If a CPU failure caused the takeover, the new backup is created
either immediately in another CPU or when the failed CPU is brought back online.)
The backup process then continues application processing at a point indicated in the
state information received from the primary process.
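The takeover sequence described above can be sketched in portable C. This is a minimal illustration under stated assumptions, not Guardian code: the types `state_info` and `process`, the restart-point names, and the functions `create_backup` and `take_over` are all hypothetical, and `create_backup` merely stands in for whatever mechanism a real program uses to start its new backup process.

```c
#include <assert.h>

/* Hypothetical restart points; a real program defines its own. */
enum restart_point { RP_READ_REQUEST, RP_UPDATE_RECORD, RP_REPLY };

/* Hypothetical state information last received from the primary. */
struct state_info {
    enum restart_point resume_at;     /* where processing continues */
    long               records_done;  /* example data value */
};

enum role { ROLE_PRIMARY, ROLE_BACKUP };

struct process {
    enum role         role;
    int               has_backup;     /* nonzero once a backup exists */
    struct state_info state;          /* latest state from the primary */
};

/* Stand-in for creating a new backup process (assumption: always
   succeeds in this sketch and returns 0). */
static int create_backup(struct process *p)
{
    p->has_backup = 1;
    return 0;
}

/* Runs in the backup when it learns that the primary process or its
   CPU has failed. Returns the point at which to continue processing. */
enum restart_point take_over(struct process *p)
{
    p->role = ROLE_PRIMARY;           /* backup becomes the new primary */
    p->has_backup = 0;                /* the old pair no longer exists */
    (void)create_backup(p);           /* first, create a new backup */
    return p->state.resume_at;        /* then continue from the saved point */
}
```

The caller would typically switch on the returned restart point to re-enter the main processing loop at the matching step.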
What the Programmer Must Do
When coding a program to run as a process pair, you need to complete two kinds of
tasks: planning tasks, which should be completed before you code an active backup
program, and programming tasks, which involve the actual coding of an active backup
program.
Note that fault-tolerant programs should be designed that way from the outset.
Converting existing programs to run in a fault-tolerant manner can be very difficult,
depending on the structure of the program.
Planning Tasks
Before coding an application to run as an active backup program, do the following:
• Develop a strategy for updating state information. You will need to include
  statements in your program for providing the backup process with the information it
  needs to take over execution if the primary process fails. This state information
  accomplishes three things:
  - Tells the backup process where to take over execution.
  - Provides critical information about files currently in use by the application.
  - Provides current data values to the backup process.
  You must determine what information to provide and the points in the execution of
  the application at which the state information will be updated and at which the
  backup can take over execution. Developing an appropriate strategy is vitally
  important; errors can result if the backup does not have the correct state
  information. Guidelines for developing a strategy for updating state information are
  given later in this section under Updating State Information.
• Define a communications protocol. You need to provide for passing messages
  between the primary and backup processes. The communications protocol
  enables the primary to send state information to the backup. It enables the backup
  to monitor the primary process and CPU and to receive state information from the
  primary process. The communications protocol should use the same message
  formats as the operating system uses. HP recommends that you use the Guardian
  interprocess communication facility. Guidelines are given later in this section
  under Providing Communication Between the Primary and Backup Processes.
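The two planning items above can be sketched together in portable C. This is only an illustration under stated assumptions: every type, function, and constant below is hypothetical, the message numbers are placeholders rather than the operating system's real values, and the sketch treats any CPU-down message as referring to the primary's CPU. A real program would send and receive these messages through the Guardian interprocess communication facility.

```c
#include <assert.h>
#include <string.h>

/* Placeholder message numbers for this sketch, NOT real system values. */
enum msg_number {
    MSG_STATE_UPDATE     = 1,  /* primary sends new state information */
    MSG_PRIMARY_STOPPED  = 2,  /* primary process no longer exists */
    MSG_PRIMARY_CPU_DOWN = 3   /* CPU running the primary has failed */
};

/* Hypothetical restart points at which the backup can take over. */
enum restart_point { RP_START, RP_AFTER_READ, RP_AFTER_WRITE };

/* State information carrying the three kinds of content listed above. */
struct state_info {
    enum restart_point resume_at;      /* where to take over execution */
    char               file_name[32];  /* file currently in use */
    long               next_record;    /* position in that file */
    long               running_total;  /* current data value */
};

/* Every message starts with a number that identifies its kind. */
struct message {
    int               number;          /* one of enum msg_number */
    struct state_info state;           /* valid only for MSG_STATE_UPDATE */
};

enum backup_action { ACT_KEEP_MONITORING, ACT_TAKE_OVER, ACT_IGNORE };

/* Primary side: build the message sent at each state update point. */
struct message make_state_update(const struct state_info *current)
{
    struct message m;
    memset(&m, 0, sizeof m);
    m.number = MSG_STATE_UPDATE;
    m.state = *current;
    return m;
}

/* Backup side: save state updates and decide what to do next. */
enum backup_action backup_handle(struct state_info *saved,
                                 const struct message *m)
{
    switch (m->number) {
    case MSG_STATE_UPDATE:
        *saved = m->state;             /* replace the saved state */
        return ACT_KEEP_MONITORING;
    case MSG_PRIMARY_STOPPED:
    case MSG_PRIMARY_CPU_DOWN:
        return ACT_TAKE_OVER;          /* resume at saved->resume_at */
    default:
        return ACT_IGNORE;             /* unknown message kind */
    }
}
```

Keeping one leading message number for both state updates and failure notifications lets the backup handle everything it receives in a single dispatch routine, which is the practical benefit of matching the operating system's message layout.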