Availability Guide for Application Design

Overview of Server and Network Fault Tolerance
Availability Guide for Application Design 525637-004
interruption. If a disk is repaired, its contents are updated once it is reintegrated into
the server system. The update takes place concurrently with other system and
application activity, so the application is not interrupted.
Failure of a mirrored disk is an exceptionally rare occurrence. However, the HP
architecture does allow for even higher levels of fault tolerance by making it possible to
configure four disks that are mirrors of each other. Such a configuration can tolerate
the failure of three out of the four disks before the failure affects the operation of the
server system.
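The behavior described above can be sketched in a few lines. This is a toy model under stated assumptions, not an HP interface: the class MirroredVolume and its methods are hypothetical names chosen for illustration. It shows a four-way mirror that stays usable while at least one copy is up, and a repaired copy being brought up to date when it is reintegrated.

```python
class MirroredVolume:
    """Toy model of an N-way mirrored volume (hypothetical, not an HP API)."""

    def __init__(self, copies):
        # Every copy holds the same data; all copies start healthy.
        self.copies = [dict() for _ in range(copies)]
        self.up = [True] * copies

    def _healthy(self):
        # Indexes of the copies that are currently in service.
        return [i for i, ok in enumerate(self.up) if ok]

    def write(self, key, value):
        healthy = self._healthy()
        if not healthy:
            raise RuntimeError("volume unavailable: all mirrors failed")
        for i in healthy:
            self.copies[i][key] = value

    def read(self, key):
        healthy = self._healthy()
        if not healthy:
            raise RuntimeError("volume unavailable: all mirrors failed")
        return self.copies[healthy[0]][key]

    def fail(self, index):
        self.up[index] = False

    def reintegrate(self, index):
        # Bring the repaired copy up to date from a healthy mirror,
        # then put it back in service.
        healthy = self._healthy()
        if not healthy:
            raise RuntimeError("no healthy mirror to copy from")
        self.copies[index] = dict(self.copies[healthy[0]])
        self.up[index] = True


volume = MirroredVolume(copies=4)
volume.write("account", 100)
for failed_disk in (0, 1, 2):      # three of the four disks fail
    volume.fail(failed_disk)
volume.write("account", 150)        # the volume is still writable
volume.reintegrate(0)               # repaired disk catches up
```

In the sketch the catch-up copy is a single step. On a real server the update runs alongside normal activity, as the text describes.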
Switchable Disks
The availability of disk files (whether mirrored or not) can be increased by using a disk
subsystem, such as the Nomadic Disk product, that can be switched between multiple
systems. If the entire system providing primary access fails or must be taken down for
a planned outage, the disks can be switched to an alternate system, which provides
access until the primary system can be brought back up. A
switchable disk subsystem also improves the portability of files when upgrading
systems or redistributing portions of applications among systems.
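As an illustration of the switching idea only, the following sketch models which system currently provides access to a switchable disk. The names SwitchableDisk and reassign, the system names SYSA and SYSB, and the rule of preferring the primary whenever it is up are all assumptions made for the example. They do not describe the Nomadic Disk product's actual interface or procedure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SwitchableDisk:
    """Hypothetical record of a disk that two systems can reach."""
    name: str
    primary: str
    alternate: str
    owner: Optional[str] = None

    def __post_init__(self):
        # Access is provided by the primary system unless stated otherwise.
        if self.owner is None:
            self.owner = self.primary


def reassign(disk, systems_up):
    """Point the disk at a running system, preferring the primary."""
    if disk.primary in systems_up:
        disk.owner = disk.primary
    elif disk.alternate in systems_up:
        disk.owner = disk.alternate
    else:
        disk.owner = None  # neither system can provide access
    return disk.owner


disk = SwitchableDisk(name="DATA1", primary="SYSA", alternate="SYSB")
reassign(disk, systems_up={"SYSB"})          # primary down: use the alternate
reassign(disk, systems_up={"SYSA", "SYSB"})  # primary back: switch back
```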
Fault Isolation
The ability to isolate faults within a given hardware or software module is essential to
keeping HP NonStop servers running. By taking down the errant module, the HP
server prevents fault propagation from one module to another. To this end, the server
provides:
• A loosely coupled architecture, with independent copies of the operating system running in each processor
• Independent processes and hardware modules
• Fail-fast hardware and software
• Protection against invalid application operation
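The fail-fast idea in the list above can be illustrated with a minimal sketch. The names FailFastModule, ModuleHalted, and dispatch are hypothetical, and the "negative balance" check simply stands in for any internal consistency check. The point is that a module stops at the first sign of an invalid state, and its failure stays contained while other modules keep working.

```python
class ModuleHalted(Exception):
    """Raised when a module has taken itself out of service."""


class FailFastModule:
    """Hypothetical module that halts itself on an invalid state."""

    def __init__(self, name):
        self.name = name
        self.running = True
        self.balance = 0

    def apply(self, delta):
        if not self.running:
            raise ModuleHalted(f"{self.name} is down")
        new_balance = self.balance + delta
        # Fail fast: on the first sign of an invalid state, stop at once
        # rather than continue with possibly corrupt data.
        if new_balance < 0:
            self.running = False
            raise ModuleHalted(f"{self.name} halted: invalid balance {new_balance}")
        self.balance = new_balance


def dispatch(modules, requests):
    """Send each (module_name, delta) request to its module.

    A halted module affects only its own requests; the names of the
    modules that went down are returned in the order they failed.
    """
    failed = []
    for name, delta in requests:
        try:
            modules[name].apply(delta)
        except ModuleHalted:
            if name not in failed:
                failed.append(name)
    return failed


modules = {"a": FailFastModule("a"), "b": FailFastModule("b")}
down = dispatch(modules, [("a", 5), ("b", -1), ("a", 2), ("b", 10)])
```

Module b halts on its first request and stays down, while module a processes both of its requests normally.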
Loosely Coupled Architecture and Detection of Processor Failure
Independent copies of the operating system running in each processor provide fault
isolation. This loosely coupled architecture makes it possible for each processor to
operate as an individual computer that functions autonomously from other processors
in the server. The operating system manages communication between processors over
ServerNet (on S-series and NS-series servers) or the interprocessor bus (on K-series
servers) to provide continuous availability at both the system and application levels. Because each
processor is independent, a fault within a processor can be isolated and prevented
from spreading to the rest of the server or to other components on the network.
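A rough model of this arrangement, under stated assumptions: each processor keeps private state that no other processor can touch, and the only way to influence it is to send it a message. The classes Processor and Interconnect are hypothetical names for illustration. Interconnect stands in for ServerNet or the interprocessor bus and models none of their real behavior.

```python
from collections import deque


class Processor:
    """Hypothetical model of one processor with private state."""

    def __init__(self, cpu_id):
        self.cpu_id = cpu_id
        self.up = True
        self.memory = {}      # private: no other processor can touch it
        self.inbox = deque()  # messages are the only way in

    def deliver(self, sender_id, key, value):
        # A processor that is down does not accept messages.
        if not self.up:
            return False
        self.inbox.append((sender_id, key, value))
        return True

    def run(self):
        # Apply queued messages to this processor's own memory only.
        while self.up and self.inbox:
            _sender, key, value = self.inbox.popleft()
            self.memory[key] = value

    def halt(self):
        # A fault takes down this processor and nothing else.
        self.up = False
        self.inbox.clear()


class Interconnect:
    """Stand-in for the interprocessor message path."""

    def __init__(self, processors):
        self.processors = {p.cpu_id: p for p in processors}

    def send(self, sender_id, target_id, key, value):
        return self.processors[target_id].deliver(sender_id, key, value)


cpus = [Processor(i) for i in range(3)]
net = Interconnect(cpus)
net.send(0, 1, "x", 1)
cpus[2].halt()            # processor 2 fails
net.send(1, 0, "z", 3)    # processors 0 and 1 keep communicating
for cpu in cpus:
    cpu.run()
```

Because no state is shared, halting processor 2 leaves the memory of processors 0 and 1 untouched.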
The message system in each processor is responsible for keeping all other processors
in the server system informed that it is operating correctly. It carries out this function by