Availability Guide for Application Design
Instrumenting an Application for Availability
Availability Guide for Application Design—525637-004
8-10
Instrumenting for Failure Resolution and Recovery
Instrumentation can help resolve a failure and recover the application by providing a
command interface that alters the status of objects by starting, stopping,
suspending, or activating parts of the application. For some classes of failures,
automated operations can react immediately upon detecting a failure by commanding
the application to perform appropriate tasks. For example, on receipt of an event
message indicating that a critical process has stopped, an automated operator can
command the application management interface to restart the failed process.
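The restart scenario above can be sketched as follows. This is a minimal illustration in Python, not an actual management API: the Event format, the ManagementInterface class, its start() command, and the object names are all hypothetical stand-ins for whatever event messages and command interface your application defines.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """An event message reporting a state change (hypothetical format)."""
    object_name: str
    new_state: str


@dataclass
class ManagementInterface:
    """Hypothetical stand-in for the application's command interface."""
    states: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def start(self, object_name: str) -> None:
        # A real interface would send a command message to the application.
        self.states[object_name] = "RUNNING"
        self.log.append(("START", object_name))


class AutomatedOperator:
    """Reacts to event messages by issuing commands without human help."""

    def __init__(self, interface: ManagementInterface, critical_names):
        self.interface = interface
        self.critical_names = set(critical_names)

    def on_event(self, event: Event) -> bool:
        """Restart a stopped critical process; return True if a command was sent."""
        if event.new_state == "STOPPED" and event.object_name in self.critical_names:
            self.interface.start(event.object_name)
            return True
        return False


interface = ManagementInterface()
operator = AutomatedOperator(interface, ["SERVER-1"])
operator.on_event(Event("SERVER-1", "STOPPED"))   # critical: restart commanded
operator.on_event(Event("LOGGER", "STOPPED"))     # not critical: left to a human
```

Only failures that are safe to handle mechanically belong in such a rule; the sketch leaves every other event for a human operator.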
A Framework for Planning and Developing Your Instrumentation
Central to instrumenting an application is the ability to monitor the state changes of
objects in your application. As indicated under How Does Instrumentation Improve
Availability? on page 8-8, you need to generate and capture events both to prevent
failures and to detect them. You can do this by monitoring the state changes in objects.
Planning your instrumentation should therefore include the following steps:
1. Establish the set of critical objects that need instrumentation.
2. Determine the set of states for each object and the conditions that cause state
changes.
3. Design event messages to report those changes.
4. Define the command and control messages that automated and human operators
will use in response to significant events and to query the status of objects.
5. Define the criteria that indicate the health of the application.
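The five steps can be seen together in one small sketch. The Python below is illustrative only: the state names, the transition table, the event dictionary layout, and the health rule (every critical object is RUNNING) are assumptions chosen for the example, not states or criteria prescribed by this guide.

```python
from dataclasses import dataclass, field

# Step 2: the states of an object and the changes allowed between them
# (assumed values for illustration).
TRANSITIONS = {
    "STOPPED": {"STARTING"},
    "STARTING": {"RUNNING", "STOPPED"},
    "RUNNING": {"SUSPENDED", "STOPPED"},
    "SUSPENDED": {"RUNNING", "STOPPED"},
}


@dataclass
class MonitoredObject:
    name: str
    critical: bool          # Step 1: is this object critical to availability?
    state: str = "STOPPED"


@dataclass
class Instrumentation:
    objects: dict = field(default_factory=dict)
    events: list = field(default_factory=list)

    def register(self, obj: MonitoredObject) -> None:
        self.objects[obj.name] = obj

    def change_state(self, name: str, new_state: str) -> None:
        obj = self.objects[name]
        if new_state not in TRANSITIONS[obj.state]:
            raise ValueError(f"{name}: {obj.state} -> {new_state} is not allowed")
        old_state, obj.state = obj.state, new_state
        # Step 3: report the change as an event message.
        self.events.append({"object": name, "from": old_state, "to": new_state})

    def status(self, name: str) -> str:
        # Step 4: a query command that operators can use.
        return self.objects[name].state

    def healthy(self) -> bool:
        # Step 5: an assumed health criterion for the application.
        return all(o.state == "RUNNING" for o in self.objects.values() if o.critical)
```

Keeping the transition table as data makes it easy to review the state model with operations staff before any event messages are designed.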
Which Objects Need Monitoring?
To establish the set of objects that need monitoring, you should first take an inventory
of all objects in your application: disks, processors, processes, TMF transactions,
workstations, and so on. Having listed your objects, you can then establish how they
relate to one another and the extent to which each of these objects is critical to the
availability of the application.
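One way to record such an inventory is as a table of objects and the objects each one depends on. The Python sketch below assumes, for illustration, that an object is critical if a critical part of the application depends on it directly or indirectly; the object names and the dependency data are hypothetical.

```python
def critical_objects(depends_on, directly_critical):
    """Return the directly critical objects plus everything they rely on."""
    result = set()
    stack = list(directly_critical)
    while stack:
        name = stack.pop()
        if name in result:
            continue
        result.add(name)
        # Follow the dependencies of this object as well.
        stack.extend(depends_on.get(name, ()))
    return result


# Hypothetical inventory: each object maps to the objects it depends on.
INVENTORY = {
    "ORDER-SERVER": ["DISK-A", "CPU-0"],
    "REPORT-SERVER": ["DISK-B", "CPU-1"],
    "DISK-A": [],
    "DISK-B": [],
    "CPU-0": [],
    "CPU-1": [],
}

# Assume only order entry must stay available.
needs_monitoring = critical_objects(INVENTORY, ["ORDER-SERVER"])
```

In this example the result contains ORDER-SERVER, DISK-A, and CPU-0, so those are the objects to instrument first.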
To establish how objects in an application relate to each other, consider the object
diagram for a Pathway application shown in Figure 8-3 on page 8-11. Here, the objects
must satisfy the following constraints:
• A PATHMON system can have only one PATHMON process pair.
• A PATHMON process manages one or more terminal control processes (TCPs)
and one or more server classes.
• A TCP manages one or more terminals.
• A terminal can execute one or more programs.