Availability Guide for Application Design
What Is Application Availability?
Availability Guide for Application Design—525637-004
of the application might affect only one user, but to that user the application is down. A
failure in part of the network could affect several users. A failure in the server, however,
could affect hundreds of users. It is therefore important that an outage in the server be
weighted more heavily than an outage in the client.
When downtime is expressed in user outage minutes, a one-minute outage in the client
equals one minute of downtime. A one-minute outage in the server, however, equals one
minute multiplied by the number of users accessing the server.
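The arithmetic can be sketched in Python as follows. This is a minimal illustration, not part of any product: the `Outage` record, its fields, and the sample user counts are hypothetical, and the sketch assumes that each outage is logged with its duration and the number of users it affected.

```python
from dataclasses import dataclass


@dataclass
class Outage:
    """One outage event (hypothetical record layout, for illustration only)."""
    component: str        # e.g. "client", "network", "server"
    minutes: float        # how long the outage lasted
    users_affected: int   # users who could not use the application


def user_outage_minutes(outages):
    """Total downtime in user outage minutes: duration x users affected, summed."""
    return sum(o.minutes * o.users_affected for o in outages)


# Hypothetical one-minute outages in three parts of the system.
outages = [
    Outage("client", 1, 1),     # a single client: 1 user outage minute
    Outage("network", 1, 12),   # a network segment serving 12 users
    Outage("server", 1, 300),   # the server, with 300 users connected
]
print(user_outage_minutes(outages))  # 313
```

The server outage dominates the total even though all three outages lasted the same length of time.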
Of course, not all user outage minutes are equal. You might need to refine the model
for measuring outage minutes to suit your business needs. For example, a seed
retailing company might have a different sales channel for backyard gardeners than for
commercial market gardeners and farmers. Clearly, a one-minute outage on a line that
typically carries orders for a few packets of border plant seeds should be weighted less
than a one-minute outage on a line that often carries orders for planting several
thousand acres of corn.
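A minimal sketch of such a refined model, assuming the site assigns each sales channel a numeric weight. The channel names and the weight of 25 for commercial orders are hypothetical values chosen only to illustrate the calculation; a real site would derive its weights from its own order data.

```python
# Hypothetical weights per sales channel; not values recommended by this guide.
CHANNEL_WEIGHTS = {
    "home_garden": 1.0,   # orders for a few packets of seeds
    "commercial": 25.0,   # orders for planting large acreages
}


def weighted_outage_minutes(minutes, users_by_channel, weights=CHANNEL_WEIGHTS):
    """Weighted user outage minutes for one outage of the given duration.

    users_by_channel maps a channel name to the number of affected users
    on that channel; each user's minutes are scaled by the channel weight.
    """
    return minutes * sum(weights[ch] * n for ch, n in users_by_channel.items())


# A one-minute outage on each kind of line.
print(weighted_outage_minutes(1, {"home_garden": 1}))  # 1.0
print(weighted_outage_minutes(1, {"commercial": 1}))   # 25.0
```

Weights could equally be attached to individual users or transaction types; keying them by channel simply mirrors the seed-retailer example.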
The correct way to measure an outage affecting a batch program varies from one
application to another. The batch program could be considered a major user and should
therefore be weighted more heavily than single-transaction users. Conversely, you could
argue that the batch program should be weighted more lightly because you can easily
restart it, and it does not matter how long it takes or when it finishes.
Alternative Ways to Measure Downtime
Of course, many users might choose to measure downtime in ways other than user
outage minutes, depending on their specific business needs. For example, a site might
be obligated to pay penalties for each transaction that does not get processed while
the application is down. Such a site might supplement its measure of downtime as
follows.
To measure the number of transactions that would have been processed during an
outage, the site keeps a record of the number of transactions it normally processes by
minute and by day of the week. If an outage occurs, for example, at 10 a.m. on a
Tuesday and lasts for 15 minutes, the site can calculate the average number
of transactions that would normally be processed during that period. Subsequently, the
site pays a corresponding penalty to its customer.
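The penalty calculation might look like the following sketch. It assumes a hypothetical baseline table that maps each (day of week, minute of day) pair to the average number of transactions historically processed in that minute, and a flat, hypothetical penalty per unprocessed transaction; neither the data layout nor the function names come from this guide.

```python
from datetime import datetime, timedelta


def expected_transactions(baseline, start, minutes):
    """Sum the historical per-minute averages over the outage window.

    baseline (hypothetical layout) maps (weekday, minute_of_day) to the average
    number of transactions in that minute; weekday follows datetime.weekday(),
    so Monday == 0. Minutes missing from the baseline count as zero.
    """
    total = 0.0
    t = start
    for _ in range(minutes):
        key = (t.weekday(), t.hour * 60 + t.minute)
        total += baseline.get(key, 0.0)
        t += timedelta(minutes=1)   # also handles windows that cross midnight
    return total


def outage_penalty(baseline, start, minutes, penalty_per_txn):
    """Penalty owed for the transactions that would normally have been processed."""
    return expected_transactions(baseline, start, minutes) * penalty_per_txn


# Hypothetical baseline: 40 transactions per minute from 10:00 to 10:14 on Tuesdays.
baseline = {(1, 10 * 60 + m): 40.0 for m in range(15)}   # weekday 1 == Tuesday
start = datetime(2024, 1, 2, 10, 0)                       # Tuesday, 2 January 2024
print(outage_penalty(baseline, start, 15, penalty_per_txn=0.50))  # 300.0
```

Keying the baseline by minute of the week keeps each lookup simple; a site with strong seasonal patterns might add the month or date as a further key.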
With this method, outage costs can differ significantly depending on the time of day
and the day of the week. An hour-long outage at 2 a.m. on a Monday might carry a
negligible penalty compared with a 15-minute outage at 5 p.m. on a Friday.
What Causes Outages?
Before attempting to design an available application, it is important to understand the
potential causes of outages. Although the specific causes of outages are many, they can
be grouped into meaningful categories. This subsection introduces the five outage