Availability Guide for Problem Management

Introduction to Problem Management
What Is an Outage?
In general terms, an outage is a period of time during which a system cannot perform
useful work. From an end-user’s perspective, an outage is any period of time during
which an application is not available.
There are two types of outages: planned and unplanned.
Planned Outages
A planned outage is system or application downtime that is planned or scheduled.
Planned outages are used to perform system software upgrades, application changes,
and other tasks that must be done offline. The Availability Guide for Change
Management describes how to reduce or eliminate planned outages.
Unplanned Outages
An unplanned outage is a period during which the application or system is
unavailable to the end user because of an unexpected problem, such as faulty hardware,
operator error, or a disaster. This manual describes how to reduce, eliminate,
and quickly recover from unplanned outages.
Outage Classes
Tandem classifies outages according to their causes, as follows:
Physical
Design
Operations
Environmental
Reconfiguration
The first four outage classes describe unplanned outages. The reconfiguration outage
class includes all planned outages. Section 2, “Preventing Unplanned Outages,”
describes the four unplanned outage classes.
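The classification above can be captured as a small lookup, which is convenient when tagging incident records by cause. This is a minimal illustrative sketch; the names OutageClass, UNPLANNED_CLASSES, is_planned, and is_unplanned are hypothetical and do not come from any Tandem tool.

```python
from enum import Enum


class OutageClass(Enum):
    """The five outage classes, named by cause (hypothetical encoding)."""
    PHYSICAL = "physical"
    DESIGN = "design"
    OPERATIONS = "operations"
    ENVIRONMENTAL = "environmental"
    RECONFIGURATION = "reconfiguration"


# The first four classes describe unplanned outages.
UNPLANNED_CLASSES = frozenset({
    OutageClass.PHYSICAL,
    OutageClass.DESIGN,
    OutageClass.OPERATIONS,
    OutageClass.ENVIRONMENTAL,
})


def is_planned(outage_class: OutageClass) -> bool:
    """Reconfiguration is the only class that covers planned outages."""
    return outage_class is OutageClass.RECONFIGURATION


def is_unplanned(outage_class: OutageClass) -> bool:
    """True for the four cause-based unplanned outage classes."""
    return outage_class in UNPLANNED_CLASSES
```

Every class is exactly one of planned or unplanned, so the two predicates never agree for the same class.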
Measuring Outages
Tandem believes that you should measure availability from the end-user’s perspective.
For example, it is not enough simply to record that a certain hardware or software
component has failed; you must also consider the user’s ability to access affected
services, the degradation in quality of service provided, and the acceptability of the
response time to the user.
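The manual does not prescribe a formula for user-perceived availability. One way to apply the principle is to count lost user-minutes rather than component downtime. The following minimal sketch assumes a hypothetical record format: each incident has a duration, the fraction of users affected, and a degradation factor (1.0 means the service was unusable, smaller values mean degraded quality or unacceptable response time). A component failure that no user notices therefore costs nothing.

```python
from dataclasses import dataclass


@dataclass
class OutageRecord:
    """One incident as the end user experienced it (hypothetical format)."""
    duration_min: float       # how long the incident lasted, in minutes
    users_affected: float     # fraction of users affected, 0.0 to 1.0
    degradation: float = 1.0  # 1.0 = unusable; below 1.0 = degraded service


def user_availability(period_min: float, outages: list[OutageRecord]) -> float:
    """Fraction of user-minutes in the period during which service was usable.

    Lost user-minutes for an incident are duration * fraction of users
    affected * degradation; dividing their sum by the period length gives
    the fraction of all user-minutes lost.
    """
    if period_min <= 0:
        raise ValueError("period_min must be positive")
    lost = sum(o.duration_min * o.users_affected * o.degradation for o in outages)
    return max(0.0, 1.0 - lost / period_min)
```

For example, over a 30-day month (43,200 minutes), a 30-minute outage affecting all users, plus 120 minutes of 50% degradation for 20% of users, plus a disk failure masked by a mirror (`users_affected=0.0`), loses 42 user-minutes per user, for an availability of about 99.90%. The weighting scheme is an assumption; sites may prefer to treat any response time beyond a threshold as a full outage instead.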