Availability Guide for Problem Management
Introduction to Problem Management
Availability Guide for Problem Management–125509
Measuring Outages
If a transient error on a workstation makes the application unavailable to 1 user for 5
minutes, it counts as 5 user minutes of downtime. If a problem on the server makes the
application unavailable to 100 users for 15 minutes, it counts as 1500 user minutes of
downtime.
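The arithmetic above can be sketched in a few lines of Python. The function name and the (users, minutes) tuple layout are illustrative assumptions, not part of any product.

```python
def user_outage_minutes(outages):
    """Sum user minutes of downtime over (users_affected, minutes_down) pairs."""
    return sum(users * minutes for users, minutes in outages)

# The two outages described above.
workstation = (1, 5)    # 1 user down for 5 minutes
server = (100, 15)      # 100 users down for 15 minutes

print(user_outage_minutes([workstation]))          # 5
print(user_outage_minutes([server]))               # 1500
print(user_outage_minutes([workstation, server]))  # 1505
```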
The correct way to measure an outage affecting a batch program varies from one
application to another. If the batch program is considered a major user, it should be
weighted more heavily than single-transaction users. Conversely, you could argue that
the batch program should carry less weight because you can easily restart it, and its
completion time is not critical.
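Either position can be expressed as a per-class weight applied to the user minutes. The following is a minimal sketch; the class names and weight values are hypothetical, and each site would choose its own.

```python
def weighted_outage_minutes(outages, weights):
    """Weighted user minutes of downtime.

    outages: list of (user_class, users_affected, minutes_down) tuples.
    weights: mapping from user_class to a site-chosen weight; classes
             missing from the mapping default to a weight of 1.0.
    """
    return sum(weights.get(user_class, 1.0) * users * minutes
               for user_class, users, minutes in outages)

# A 15-minute server outage affecting 100 interactive users and 1 batch program.
outage = [("interactive", 100, 15), ("batch", 1, 15)]

# Hypothetical policy A: the batch program counts as much as 50 interactive users.
heavy = weighted_outage_minutes(outage, {"batch": 50.0})

# Hypothetical policy B: the batch program is easy to restart, so it counts
# as a tenth of an interactive user.
light = weighted_outage_minutes(outage, {"batch": 0.1})

print(heavy, light)
```

Under policy A the batch program adds 750 weighted minutes to the 1500 interactive user minutes; under policy B it adds only 1.5.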
Alternative Ways to Measure Downtime
Of course, you might choose to measure downtime in ways other than user outage
minutes, depending on your specific business needs. For example, a site might be
obligated to pay penalties for each transaction that is not processed while the application
is down. Such a site might supplement its measure of downtime as follows:
To measure the number of transactions that would have been processed during an
outage, the site keeps a record of the number of transactions it normally processes by
minute and by day of the week. If an outage occurs, for example, at 10:00 a.m. on
Tuesday and lasts for 15 minutes, the site can calculate the average number of
transactions that would normally be processed during that period. The site then pays
its customer a penalty based on that number.
Using this method leads to significantly different outage costs depending on the time of
day and the day of the week. An hour-long outage at 2:00 a.m. on Monday might carry a
negligible penalty when compared with a 15-minute outage at 5:00 p.m. on a Friday.
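The transaction-based estimate can be sketched as follows. This is a minimal illustration under stated assumptions: the baseline layout (keyed by weekday and minute of the day), the sample transaction rate, and the penalty per transaction are all hypothetical, not taken from any real site.

```python
from datetime import datetime, timedelta

def estimated_lost_transactions(baseline, start, duration_minutes):
    """Estimate the transactions that would normally have been processed.

    baseline maps (weekday, minute_of_day) to the average number of
    transactions normally processed in that minute; weekday 0 is Monday.
    Minutes with no recorded baseline contribute zero.
    """
    total = 0.0
    for offset in range(duration_minutes):
        t = start + timedelta(minutes=offset)
        key = (t.weekday(), t.hour * 60 + t.minute)
        total += baseline.get(key, 0.0)
    return total

# Hypothetical baseline: 40 transactions per minute on Tuesdays, 10:00 to 10:59.
baseline = {(1, 10 * 60 + m): 40.0 for m in range(60)}

# A 15-minute outage starting at 10:00 a.m. on a Tuesday (2024-01-02).
start = datetime(2024, 1, 2, 10, 0)
lost = estimated_lost_transactions(baseline, start, 15)

PENALTY_PER_TRANSACTION = 0.25  # hypothetical contract rate
penalty = lost * PENALTY_PER_TRANSACTION
print(lost, penalty)  # 600.0 150.0
```

Because the baseline is keyed by weekday and minute, the same function returns a much smaller estimate for an outage in a quiet period, which is what produces the time-dependent costs described above.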
The High Cost of Downtime
We have come to expect the continuous operation of many basic services such as water,
power, and telephone. Computers have opened the door to additional 24x7x365 services.
Consumers now expect to access cash from automated teller machines (ATMs) and to
purchase goods with credit cards 24 hours a day. For MIS departments, every employee
in the enterprise is a customer. To keep productivity high, employees also demand
24x7x365 access to enterprise databases, communications systems (such as electronic
mail), and reliable client/server applications.
Offering such services around the clock requires computer and network systems that are
available all of the time. The cost of downtime, for even a few minutes, can be dramatic
in terms of lost revenue, lost consumer confidence, and lost productivity. Examples to
consider include:
• When an airline’s reservation system went down, thousands of travel agents had to
book flights manually. Estimated revenue impact from lost reservations (or
reservations made with other airlines) amounted to $36,000 per minute.