Availability Guide for Application Design

What Is Application Availability?

Availability Guide for Application Design—525637-004

it does break, it takes no time at all to fix it. The computer industry traditionally uses a

percentage to represent this value.

For example, suppose that over a period of 10,000 minutes, an application has one

outage that takes 100 minutes to repair:

Uptime = 9,900 minutes

Repair time = 100 minutes

Availability = 9,900/(9,900 + 100), or 99%.

If, over the same period, the application had 100 outages taking an average of one

minute to repair:

System uptime (the average time for which the system is up between outages) = 9,900/100 = 99 minutes

Repair time = 1 minute

Availability = 99/(99 + 1), or 99%

So, in terms of availability, a single 100-minute outage is equivalent to 100 one-minute outages.
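The two calculations above can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `availability` and the variable names are assumptions made for this example and do not come from the guide.

```python
def availability(uptime_minutes, repair_minutes):
    """Return availability as a fraction: uptime / (uptime + repair time)."""
    return uptime_minutes / (uptime_minutes + repair_minutes)


# Case 1: one 100-minute outage in a 10,000-minute period.
single_outage = availability(9_900, 100)

# Case 2: 100 one-minute outages in the same period.
# Average uptime between outages is 9,900 / 100 = 99 minutes.
mean_uptime = 9_900 / 100
many_outages = availability(mean_uptime, 1)

print(f"One long outage:   {single_outage:.0%}")
print(f"Many short outages: {many_outages:.0%}")
```

Both cases evaluate to 99 percent, which is why the percentage alone cannot distinguish one long outage from many short ones.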

Although availability is traditionally expressed as a percentage as shown above, HP

prefers to talk about the number of minutes for which an application is unavailable.

Measuring Downtime in Minutes

A couple of decades ago, it was meaningful to talk about a computer system being

available 75 percent of the time. Today, however, reliability standards have increased

to the point at which you might need to talk about a computer system being available

99.9 percent of the time and compare it with another system that is available

99.99 percent of the time. Both availability rates seem good, but the difference

between them is clearer when you express it a different way.

Consider the same two computer systems in terms of the number of outage minutes.

One system is unavailable roughly 500 minutes a year (0.1 percent of the 525,600 minutes in a year is about 526), and the other roughly 50 minutes a year. These

values are much more meaningful in a world in which the costs of application downtime

are usually measured in cost per minute.
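As a rough illustration of the conversion from an availability percentage to outage minutes, the following Python sketch assumes a 365-day year. The names `MINUTES_PER_YEAR` and `downtime_minutes_per_year` are hypothetical and chosen for this example only.

```python
# Assumption: a 365-day (non-leap) year, which is 525,600 minutes.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent):
    """Convert an availability percentage into outage minutes per year."""
    unavailable_percent = 100 - availability_percent
    return MINUTES_PER_YEAR * unavailable_percent / 100


for percent in (99.9, 99.99):
    minutes = downtime_minutes_per_year(percent)
    print(f"{percent}% available -> about {minutes:.0f} outage minutes per year")
```

Under this assumption the results are about 526 and 53 minutes, which the text rounds to 500 and 50.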

In addition, measuring downtime in minutes makes it easier to understand the benefits

of automated problem resolution. For example, suppose one of your service-level

objectives is to keep downtime to less than 50 minutes per year. If it takes, on average,

5 minutes to manually correct an outage, then your application can tolerate 10 outages

per year, or an average of about one outage every 5 weeks. A fully automated

solution can typically resolve a problem 20 times faster than a manual solution

of the same problem (here, about 15 seconds instead of 5 minutes), so you can

tolerate up to 200 outages each year using fully automated solutions, or about

one outage every 1.5 to 2 days, and still achieve the same goal.
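The outage-budget arithmetic in this example can be checked with a short Python sketch. The function and variable names are assumptions introduced for illustration. The 50-minute objective, the 5-minute manual repair time, and the factor of 20 are the figures from the text.

```python
def tolerable_outages(downtime_budget_minutes, minutes_per_repair):
    """Number of outages per year that fit within the downtime budget."""
    return downtime_budget_minutes / minutes_per_repair


budget = 50                    # service-level objective: minutes of downtime per year
manual_repair = 5              # average minutes to correct an outage manually
automation_speedup = 20        # automated resolution is typically 20 times faster
automated_repair = manual_repair / automation_speedup  # 0.25 minutes (15 seconds)

manual_outages = tolerable_outages(budget, manual_repair)
automated_outages = tolerable_outages(budget, automated_repair)

weeks_between_manual = 52 / manual_outages
days_between_automated = 365 / automated_outages

print(f"Manual:    {manual_outages:.0f} outages per year, "
      f"one every {weeks_between_manual:.1f} weeks")
print(f"Automated: {automated_outages:.0f} outages per year, "
      f"one every {days_between_automated:.1f} days")
```

The sketch gives 10 outages per year (one about every 5 weeks) for manual repair and 200 outages per year (one about every 1.8 days) for automated repair, consistent with the figures above.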

Measuring Downtime in a Client/Server Application

For client/server applications, it is useful to take the measurement of downtime a step

further and express it as the number of user outage minutes. A failure in the client part