Availability Guide for Problem Management

Monitoring Objects

Availability Guide for Problem Management–125509


When an object goes into an odd state, you need sufficient information to bring the
object back into an up state. This is preventive recovery, because the object is still
providing services; if the situation is not corrected, however, a more serious problem
can occur. For example, if an application transaction log file is over 75 percent full and
this is considered to be an odd state for this object, the common corrective action is to
create a new log file or modify the attributes of the existing file. If this condition is
not detected, the log file could become full, and the application might then have to be
stopped to correct the problem.

Detecting that an object is in an odd state may require threshold alarm detection for
the object. Either an application or a monitoring subsystem must take responsibility for
tracking the odd state.
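
The threshold check described above can be sketched as follows. This is a minimal illustration, not the guide's own mechanism: the names LogFileStatus and classify_state are hypothetical, and the 75 percent limit is taken from the log file example rather than from any product default.

```python
from dataclasses import dataclass

# Assumed limit, taken from the log file example in the text.
ODD_THRESHOLD_PERCENT = 75.0


@dataclass
class LogFileStatus:
    """Hypothetical snapshot of a transaction log file."""
    name: str
    used_bytes: int
    capacity_bytes: int

    @property
    def percent_full(self) -> float:
        return 100.0 * self.used_bytes / self.capacity_bytes


def classify_state(status: LogFileStatus,
                   threshold: float = ODD_THRESHOLD_PERCENT) -> str:
    """Return "odd" when the file is over the threshold, otherwise "up"."""
    return "odd" if status.percent_full > threshold else "up"


if __name__ == "__main__":
    log = LogFileStatus(name="txlog", used_bytes=80, capacity_bytes=100)
    print(log.name, classify_state(log))
```

A monitoring subsystem would run a check of this kind periodically and raise an alarm when the result changes from "up" to "odd", so that preventive recovery can start while the object is still providing services.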

When determining what objects to monitor, there are two types of events that are
important:

• The first type of event tells when an object changes state and requires reactive
  recovery.

• The second type of event tells when an object exceeds a threshold, which may also
  cause a state change, and requires preventive recovery.
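
The relationship between the two event types and the kind of recovery each one calls for can be written down as a small lookup. The enum and function names below are hypothetical and serve only to illustrate the distinction drawn in the list above.

```python
from enum import Enum


class EventType(Enum):
    STATE_CHANGE = "state_change"              # object changed state
    THRESHOLD_EXCEEDED = "threshold_exceeded"  # object exceeded a threshold


class Recovery(Enum):
    REACTIVE = "reactive"
    PREVENTIVE = "preventive"


# Mapping taken directly from the two event types described in the text.
_RECOVERY_BY_EVENT = {
    EventType.STATE_CHANGE: Recovery.REACTIVE,
    EventType.THRESHOLD_EXCEEDED: Recovery.PREVENTIVE,
}


def recovery_for(event_type: EventType) -> Recovery:
    """Return the kind of recovery an event of this type requires."""
    return _RECOVERY_BY_EVENT[event_type]
```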

Performance Monitoring

In addition to monitoring the states of critical objects in your system environment,
another way to ensure increased availability is to monitor performance using the
following measurements:

• End-user response time measurement

• Throughput measurement

Measuring end-user response time is important because system availability should be
assessed from the end user's perspective. For example, it is not enough to simply record
that a certain hardware or software component has gone down; you must also take into
consideration the user's ability to access the service, the quality of the service
provided, and whether the response time is acceptable to the user.
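
One way to measure response time from the caller's side is to time a request as the user would experience it and compare the result with an agreed limit. The sketch below assumes a hypothetical request callable and an illustrative limit of 2 seconds; neither comes from the guide.

```python
import time
from typing import Any, Callable, Tuple


def measure_response_time(request: Callable[..., Any],
                          *args: Any) -> Tuple[Any, float]:
    """Call request(*args) and return its result with the elapsed seconds."""
    start = time.perf_counter()
    result = request(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed


def is_acceptable(elapsed_seconds: float, limit_seconds: float = 2.0) -> bool:
    """True when the response time is within the (assumed) acceptable limit."""
    return elapsed_seconds <= limit_seconds


if __name__ == "__main__":
    # Stand-in for a real user transaction.
    _, seconds = measure_response_time(lambda n: sum(range(n)), 100_000)
    print(f"{seconds:.4f} s, acceptable: {is_acceptable(seconds)}")
```

Timing the whole request, rather than checking whether individual components are up, reflects the user's view of whether the service is usable.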

Throughput is measured as the number of transactions that the system can process in a
particular span of time. It is usually expressed as transactions per second. As throughput
increases, the cost of each transaction generally falls, because fixed system costs are
spread over more transactions.
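
The arithmetic can be illustrated with a short calculation. It rests on a simplifying assumption that is not stated in the guide: the system has a fixed running cost per second, so the cost of one transaction is that cost divided by the throughput. The function names are hypothetical.

```python
def throughput_tps(transactions: int, elapsed_seconds: float) -> float:
    """Transactions per second over a measurement interval."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return transactions / elapsed_seconds


def cost_per_transaction(cost_per_second: float, tps: float) -> float:
    """Cost of one transaction, assuming a fixed system cost per second."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return cost_per_second / tps


if __name__ == "__main__":
    tps = throughput_tps(transactions=1200, elapsed_seconds=60.0)
    print(tps, cost_per_transaction(1.0, tps), cost_per_transaction(1.0, 2 * tps))
```

Under this assumption, doubling the throughput halves the cost of each transaction; real systems with variable costs will deviate from this simple model.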