Availability Guide for Problem Management

Auditing Systems for Fault Tolerance
Configuring Your Software for Fault Tolerance
Fault tolerance requires that all programs—the operating system as well as individual
application programs—contribute to the reliability and recoverability of a process if a
failure occurs. Therefore, your software should also be audited for fault tolerance. You
can ensure that your software configuration is fault tolerant by:
• Using Pathway or the NonStop Transaction Manager/MP (TM/MP) to achieve application fault tolerance
• Testing applications for graceful recovery
• Following Tandem recommendations
• Using process pairs
• Using persistent processes
• Preventing external data communications failures
Using Pathway to Achieve Application Fault Tolerance
The Pathway transaction-processing system is a collection of processes and files
designed to facilitate the development and management of online transaction processing
(OLTP) applications. Pathway provides programs and an operating environment to help
you develop and run reliable, manageable, and cost-effective OLTP applications.
At the heart of every application in the Pathway transaction-processing environment is
the PATHMON process, which manages the server side of the application. PATHMON links
requesters to servers and monitors the server processes to ensure that they are always
running. PATHMON can be configured as a process pair so that monitoring continues even
if the primary PATHMON process fails.
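The process-pair idea behind PATHMON can be illustrated with a small simulation. The following Python sketch is a hypothetical model, not Pathway code: the `MonitorPair` class, its methods, and the `is_running` and `start` callbacks are invented for illustration, and one Python object stands in for two processes that would really run in different processors. The primary restarts any stopped server and checkpoints its state to the backup after each change, so that if the primary fails, the backup can take over with an up-to-date view of the servers it manages.

```python
import copy
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MonitorState:
    """State the monitor must not lose: managed servers and their restart counts."""
    restarts: Dict[str, int] = field(default_factory=dict)


class MonitorPair:
    """Hypothetical PATHMON-like monitor modeled as a primary/backup process pair."""

    def __init__(self, is_running: Callable[[str], bool], start: Callable[[str], None]) -> None:
        self.is_running = is_running  # hypothetical probe: is this server process alive?
        self.start = start            # hypothetical action: (re)start this server process
        self.primary = MonitorState()
        self.backup = MonitorState()

    def _checkpoint(self) -> None:
        # Copy the primary's current state to the backup after every change.
        self.backup = copy.deepcopy(self.primary)

    def add_server(self, name: str) -> None:
        self.primary.restarts[name] = 0
        self.start(name)
        self._checkpoint()

    def monitor_once(self) -> List[str]:
        """Restart every managed server that has stopped; return their names."""
        restarted = []
        for name in self.primary.restarts:
            if not self.is_running(name):
                self.start(name)
                self.primary.restarts[name] += 1
                restarted.append(name)
        if restarted:
            self._checkpoint()
        return restarted

    def takeover(self) -> None:
        # The primary has failed: the backup continues from the last checkpoint.
        # (A real pair would also start a new backup process at this point.)
        self.primary = copy.deepcopy(self.backup)


# Usage: a set of names stands in for the running server processes.
live = set()
pair = MonitorPair(is_running=lambda name: name in live, start=live.add)
pair.add_server("ORDER-SERVER")
live.discard("ORDER-SERVER")   # simulate a server process failure
print(pair.monitor_once())     # ['ORDER-SERVER']
pair.takeover()                # simulate failure of the primary monitor
print(pair.primary.restarts)   # {'ORDER-SERVER': 1}
```

In an actual NonStop process pair, checkpoint data is sent to a backup process running in another processor; here `copy.deepcopy` merely models that transfer.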
Using the NonStop Transaction Manager/MP (TM/MP) to Achieve Application Fault Tolerance
Using NonStop Transaction Manager/MP (TM/MP) software, you can build fault
tolerance into your application by grouping operations into transactions. A transaction is
defined as a transformation of the database from one consistent state to another, with
the following attributes:
• Atomicity: either all changes take effect or none take effect.
• Durability: the effect survives failures.
• Consistency: the transaction performs a correct transformation of the database.
• Isolation: the transaction is unaffected by concurrent transactions.
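The all-or-nothing attribute can be sketched in Python with a hypothetical in-memory `Database` class; it is an illustration, not a TM/MP interface. Updates are made to a private workspace and installed in a single step only if the transaction block finishes without an exception; otherwise the workspace is discarded. The sketch makes no attempt at durability or at isolation under real concurrency, which a transaction manager must also provide.

```python
from contextlib import contextmanager
from typing import Dict, Iterator


class Database:
    """Hypothetical in-memory store that applies a transaction's changes all or nothing."""

    def __init__(self, data: Dict[str, int]) -> None:
        self.data = dict(data)

    @contextmanager
    def transaction(self) -> Iterator[Dict[str, int]]:
        # Work on a private copy; self.data is untouched until commit.
        workspace = dict(self.data)
        yield workspace
        # Reached only if the with-block raised no exception: commit by
        # installing all of the changes in one step. If the block raised,
        # the exception propagates from the yield and the workspace is dropped.
        self.data = workspace


def transfer(db: Database, src: str, dst: str, amount: int) -> None:
    with db.transaction() as tx:
        tx[src] -= amount
        if tx[src] < 0:
            raise ValueError("insufficient funds")  # aborts: the debit above is discarded
        tx[dst] += amount


db = Database({"A": 100, "B": 0})
transfer(db, "A", "B", 30)
print(db.data)  # {'A': 70, 'B': 30}
try:
    transfer(db, "A", "B", 500)
except ValueError:
    pass
print(db.data)  # {'A': 70, 'B': 30}
```

The failed transfer had already debited account A in its workspace, but because the block raised before committing, that partial change never reaches `db.data`.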
Changes to the database may be written asynchronously with respect to the transaction
and, in fact, may occur much later. NonStop TM/MP uses process pairs to recover
from processor or system failures. When a transaction updates data on multiple nodes,
NonStop TM/MP uses a two-phase commit protocol to ensure consistency among all
nodes.
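The two-phase commit protocol can be outlined with a generic, simplified Python sketch. It is not NonStop TM/MP's implementation: the `Node` class, the `two_phase_commit` function, and the node names are hypothetical, and the code omits the logging, timeouts, and failure recovery that a real coordinator and its participants need. In the first phase, the coordinator asks each node to prepare and vote; in the second, it tells every node to commit only if all voted yes, and otherwise tells the prepared nodes to abort.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical participant holding one node's share of the data."""
    name: str
    data: Dict[str, int] = field(default_factory=dict)
    pending: Dict[str, int] = field(default_factory=dict)
    can_commit: bool = True  # set to False to simulate a node that votes "no"

    def prepare(self, changes: Dict[str, int]) -> bool:
        # Phase 1: stage the changes and vote. A real participant would make
        # the staged changes durable before voting "yes".
        if not self.can_commit:
            return False
        self.pending = dict(changes)
        return True

    def commit(self) -> None:
        self.data.update(self.pending)
        self.pending = {}

    def abort(self) -> None:
        self.pending = {}


def two_phase_commit(updates: Dict[str, Dict[str, int]], nodes: Dict[str, Node]) -> bool:
    """Coordinator: apply updates on every node involved, or on none of them."""
    prepared: List[Node] = []
    # Phase 1 (prepare): collect a vote from each participant.
    for node_name, changes in updates.items():
        node = nodes[node_name]
        if not node.prepare(changes):
            # One "no" vote aborts the transaction everywhere.
            for p in prepared:
                p.abort()
            return False
        prepared.append(node)
    # Phase 2 (commit): every participant voted "yes".
    for node in prepared:
        node.commit()
    return True


nodes = {"EAST": Node("EAST", {"acct1": 100}), "WEST": Node("WEST", {"acct2": 0})}
print(two_phase_commit({"EAST": {"acct1": 70}, "WEST": {"acct2": 30}}, nodes))  # True
nodes["WEST"].can_commit = False
print(two_phase_commit({"EAST": {"acct1": 40}, "WEST": {"acct2": 60}}, nodes))  # False
print(nodes["EAST"].data, nodes["WEST"].data)  # {'acct1': 70} {'acct2': 30}
```

The key property is that no node commits until every node has promised it can. In a full implementation, a participant that votes yes must keep its prepared changes across failures until it learns the outcome from the coordinator.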