Availability Guide for Application Design (525637-004)
2 Overview of Server and Network Fault Tolerance
This section provides an overview of the fault-tolerant features of the hardware and
system software environment in which the user's continuously available application
executes. After introducing the concept of fault tolerance, it discusses fault tolerance
on the server, in the network, and in the client system. For technical details about a
specific HP server, refer to the corresponding server description manual.
The HP NonStop Systems Introduction Manual supplies a more detailed introduction to
all the components of an HP NonStop server.
This section does not describe the entire environment in which the application runs.
Other HP products are usually involved, such as Transaction Management Facility
(TMF), NonStop Transaction Services/MP (NonStop TS/MP), NonStop Structured
Query Language/MP (NonStop SQL/MP), Remote Duplicate Database Facility (RDF),
or Remote Server Calls/MP (RSC/MP). The roles of these products are discussed in
later sections of this guide.
What Is Fault Tolerance?
Traditionally, fault tolerance has meant that a system or network is able to continue
normal operation if one hardware or software component fails.
Even under this limited, single-failure definition, server fault tolerance is only one
part of continuous application availability. It provides a critical base on which to build
additional tools and applications.
The server system continues to operate in the event of any single hardware malfunction,
such as a downed processor, a failed controller, a broken or disconnected cable, or a
failed power supply, or in the event of any single system software failure, such as an
aborted I/O process or a corrupted system data structure.
In the unlikely event that a second failure occurs before the first is fixed, continuous
system operation is not guaranteed (except in the case of a TNS/E triplex
configuration). You can protect your application against such an outage by using the
products and techniques described in other sections of this guide. The optional HP
products include the Remote Duplicate Database Facility (RDF); the techniques
include designing additional safeguards into your application through transaction
protection or instrumentation.
If a single failure does occur, the HP server continues to operate and the application
remains available. However, the server is no longer fault-tolerant because an additional
component failure could bring the server down. Following online repair of the failed
component, the server is once again fault-tolerant.
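
The single-failure behavior described above can be pictured as a small state model. The following Python sketch is an illustration only, not part of any HP product or interface: the names RedundantPair and Status are hypothetical, and the model assumes one duplicated component in which a second failure of the same pair, before repair, takes that component down.

```python
from enum import Enum


class Status(Enum):
    FAULT_TOLERANT = "operating, fault-tolerant"
    DEGRADED = "operating, not fault-tolerant"
    DOWN = "not operating"


class RedundantPair:
    """Hypothetical model of one duplicated component (an assumption for illustration)."""

    def __init__(self):
        self.working = 2  # both members of the pair start healthy

    def fail_one(self):
        # A component failure removes one working member, if any remain.
        if self.working > 0:
            self.working -= 1

    def repair_one(self):
        # Online repair restores one failed member, up to the full pair.
        if self.working < 2:
            self.working += 1

    @property
    def status(self):
        if self.working == 2:
            return Status.FAULT_TOLERANT
        if self.working == 1:
            return Status.DEGRADED
        return Status.DOWN


pair = RedundantPair()
history = [pair.status]      # healthy pair

pair.fail_one()              # single failure: still operating
history.append(pair.status)

pair.repair_one()            # online repair restores fault tolerance
history.append(pair.status)

pair.fail_one()
pair.fail_one()              # second failure before repair
history.append(pair.status)

for status in history:
    print(status.value)
```

In this simplified model the degraded state is the window during which the application depends on prompt repair or on the additional safeguards described in other sections of this guide.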