Availability Guide for Application Design

What Is Application Availability?
Availability Guide for Application Design 525637-004
it does break, it takes no time at all to fix it. The computer industry traditionally uses a
percentage to represent this value.
For example, suppose that over a period of 10,000 minutes, an application has one
outage that takes 100 minutes to repair:
Uptime = 9,900 minutes
Repair time = 100 minutes
Availability = 9,900/(9,900 + 100), or 99%.
If, over the same period, the application instead had 100 outages, each taking an
average of one minute to repair:
System uptime (the mean time for which the system is up between outages) =
9,900/100 = 99 minutes
Repair time = 1 minute
Availability = 99/(99 + 1), or 99%
So, in terms of availability, one 100-minute outage is the equivalent of 100 one-minute
outages.
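The two calculations above can be reproduced with a short Python sketch. The function name and its parameters are illustrative assumptions, not part of any HP tool; the formula is the one used in the example, uptime divided by the sum of uptime and repair time.

```python
def availability(mean_uptime_minutes: float, mean_repair_minutes: float) -> float:
    """Return availability as a fraction: uptime / (uptime + repair time)."""
    return mean_uptime_minutes / (mean_uptime_minutes + mean_repair_minutes)


# One outage of 100 minutes in a 10,000-minute period.
single_long_outage = availability(9_900, 100)

# 100 one-minute outages in the same period:
# mean uptime between outages is 9,900 / 100 = 99 minutes.
many_short_outages = availability(9_900 / 100, 1)

print(f"One 100-minute outage:   {single_long_outage:.0%}")
print(f"100 one-minute outages:  {many_short_outages:.0%}")
```

Both cases come out at 99 percent, which is why the percentage alone does not distinguish one long outage from many short ones.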
Although availability is traditionally expressed as a percentage as shown above, HP
prefers to talk about the number of minutes for which an application is unavailable.
Measuring Downtime in Minutes
A couple of decades ago, it was meaningful to talk about a computer system being
available 75 percent of the time. Today, however, reliability standards have increased
to the point at which you might need to talk about a computer system being available
99.9 percent of the time and compare it with another system that is available
99.99 percent of the time. Both availability rates seem good, but it helps to think of
them in a different way.
Consider the same two computer systems in terms of the number of outage minutes.
Over a year of 525,600 minutes, one system is unavailable for roughly 500 minutes
(about 526) and the other for roughly 50 minutes (about 53). These values are much
more meaningful in a world in which the costs of application downtime are usually
measured in cost per minute.
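As a rough illustration, the conversion from an availability percentage to outage minutes per year might be sketched in Python as follows. The function and constant names are hypothetical, and the sketch assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; assumes a non-leap year


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Convert an availability percentage into expected outage minutes per year."""
    return (100 - availability_percent) / 100 * MINUTES_PER_YEAR


three_nines = downtime_minutes_per_year(99.9)
four_nines = downtime_minutes_per_year(99.99)

print(f"99.9%  available: about {three_nines:.0f} outage minutes per year")
print(f"99.99% available: about {four_nines:.0f} outage minutes per year")
```

The results, about 526 and 53 minutes, correspond to the rounded figures of 500 and 50 minutes used in the text.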
In addition, measuring downtime in minutes makes it easier to understand the benefits
of automated problem resolution. For example, suppose one of your service-level
objectives is to keep downtime to less than 50 minutes per year. If it takes, on average,
5 minutes to manually correct an outage, then your application can tolerate 10 outages
per year, or an average of about one outage every 5 weeks. If a fully automated
solution typically resolves a problem 20 times faster than a manual solution of the
same problem (about 15 seconds instead of 5 minutes), then you can tolerate up to
200 outages each year using fully automated solutions, or about one outage every
1.5 to 2 days, and achieve the same goal.
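The outage-budget arithmetic above can be checked with a minimal Python sketch. The variable and function names are assumptions made for illustration, and the factor of 20 is the typical speedup quoted in the text, not a measured value.

```python
def tolerable_outages(downtime_budget_minutes: float, minutes_per_repair: float) -> float:
    """Number of outages per year that fit within the downtime budget."""
    return downtime_budget_minutes / minutes_per_repair


downtime_budget = 50         # service-level objective: minutes of downtime per year
manual_repair = 5            # average minutes to correct an outage manually
automation_speedup = 20      # typical speedup quoted in the text (assumption)
automated_repair = manual_repair / automation_speedup  # 0.25 minutes, or 15 seconds

manual_outages = tolerable_outages(downtime_budget, manual_repair)
automated_outages = tolerable_outages(downtime_budget, automated_repair)

weeks_between_manual = 52 / manual_outages
days_between_automated = 365 / automated_outages

print(f"Manual:    {manual_outages:.0f} outages/year, one every {weeks_between_manual:.1f} weeks")
print(f"Automated: {automated_outages:.0f} outages/year, one every {days_between_automated:.1f} days")
```

With these inputs the sketch gives 10 outages per year for manual repair and 200 for automated repair, matching the figures in the text.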
Measuring Downtime in a Client/Server Application
For client/server applications, it is useful to take the measurement of downtime a step
further and express it as the number of user outage minutes. A failure in the client part