Availability Guide for Application Design

What Is Application Availability?
Availability Guide for Application Design 525637-004
classes and brings to your attention the need to consider outages in parts of the
network in addition to the server.
Outage Classes
Outages fall into the following classes:
Physical
Design
Operations
Environment
Reconfiguration
The first four classes are recognized throughout the computer industry.
Reconfiguration is an outage class added by HP: what other computer vendors call
scheduled downtime, HP treats as a reconfiguration fault.
When designing your application, you must consider all five classes. The degree to
which the software and hardware levels below your application are continuously available
affects the availability of the application. Hence, you need to consider possible physical
and environmental outages as well as outages caused by design faults in system software
and hardware.
Even if the system under your application offers zero minutes of downtime every year,
you still need to be aware of potential outages due to application design faults.
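The point can be illustrated with a small calculation. The following is a minimal sketch, not a method taken from this guide: it assumes the application and the layers beneath it fail independently and that every layer must be up for the application to be available, so the overall availability is the product of the layer availabilities. The function names and the availability figures are hypothetical.

```python
from math import prod

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def stack_availability(layer_availabilities):
    """Availability of a stack in which every layer must be up (series model).

    Assumes layer failures are independent, which is a simplification.
    """
    return prod(layer_availabilities)


def downtime_minutes_per_year(availability):
    """Expected minutes of downtime per year for a given availability."""
    return (1.0 - availability) * MINUTES_PER_YEAR


# Hypothetical figures: a platform with zero downtime and an application
# whose design faults make it unavailable 0.1% of the time.
platform = 1.0
application = 0.999
total = stack_availability([platform, application])
print(f"{downtime_minutes_per_year(total):.1f} minutes of downtime per year")
```

Under these assumed figures, a platform that never fails still leaves the application with several hours of downtime a year, all of it attributable to the application itself.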
The following paragraphs provide details on each outage class and point out elements
of NonStop systems that help prevent such outages.
Physical Outages
A physical outage is caused by a hardware failure. Although modern hardware
components rarely fail as a result of a hard (or deterministic) fault, soft
(or transient) faults remain more common throughout the industry.
Potential physical outages on a NonStop system include the failure of both halves of a
mirrored disk and a faulty processor power supply. Error-correcting codes, parity,
and processor duplication are only some of the techniques used in NonStop hardware
component design to keep component outages to a minimum. Redundant hardware
modules and redundant paths between components enable the operating system to
isolate defective components and continue operation. Section 2, Overview of Server
and Network Fault Tolerance, provides details.
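The effect of redundancy, such as a mirrored disk pair, can be sketched numerically. This is an illustrative model only, not a description of how NonStop hardware is rated: it assumes the redundant components fail independently and that one working copy is enough to continue operation. The function name and the availability figure are hypothetical.

```python
def redundant_availability(component_availability, copies=2):
    """Availability of `copies` redundant components where one suffices.

    Assumes independent failures; correlated failures (for example, a
    shared power supply) would reduce the benefit shown here.
    """
    unavailability = 1.0 - component_availability
    return 1.0 - unavailability ** copies


single_disk = 0.99  # hypothetical availability of one disk
mirrored = redundant_availability(single_disk, copies=2)
print(f"single: {single_disk:.4f}, mirrored pair: {mirrored:.4f}")
```

In this model an outage requires every copy to be down at the same time, which is why the failure of both halves of a mirrored disk is listed as a physical outage while the failure of one half is not.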
The most significant potential result of a hardware failure is data corruption because,
on some vendors’ systems, it can take days of downtime to recover a database.
Installations that cannot tolerate such a long recovery period under any circumstances
keep a remote duplicate database as described in Section 4, Data Protection and
Recovery.