Availability Guide for Application Design (525637-004)

Instrumenting an Application for Availability
A Framework for Planning and Developing Your Instrumentation
Define Valid Object States
For each object that is critical to the availability of your application, you need to
establish its valid states and the conditions that cause state changes. Valid states for
all objects fall into four general state categories: up, down, unknown, and odd. Each
category is defined at the end of this section.

For each object identified, establish its valid states, its state transitions, the
conditions that can cause each transition, and the desired action. An object might
have a predefined set of actions for each state change. To provide appropriate
instrumentation, you must determine which of these transitions require an event
message.

For most critical objects, you typically need to generate an event message when the
object moves into a down state from an up state or an odd state, or into an odd state
from an up state. Establishing the valid states for an object and the conditions that
cause state changes is a technique sometimes referred to as dynamic modeling.
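
The transition rule described above can be sketched in Python as a small dynamic model. This is only an illustration under stated assumptions: the names State, Transition, needs_event, and SERVER_MODEL, along with the listed conditions and actions, are hypothetical and do not come from any product interface.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    UP = "up"
    DOWN = "down"
    UNKNOWN = "unknown"
    ODD = "odd"


@dataclass(frozen=True)
class Transition:
    old: State
    new: State
    condition: str  # what causes the state change
    action: str     # desired response to the state change


# Transitions that typically warrant an event message for a critical object:
# up -> down, odd -> down, and up -> odd.
EVENT_TRANSITIONS = {
    (State.UP, State.DOWN),
    (State.ODD, State.DOWN),
    (State.UP, State.ODD),
}


def needs_event(t: Transition) -> bool:
    """Return True if this transition should generate an event message."""
    return (t.old, t.new) in EVENT_TRANSITIONS


# Hypothetical dynamic model for one critical object (a server process).
SERVER_MODEL = [
    Transition(State.DOWN, State.UP, "operator starts process", "none"),
    Transition(State.UP, State.ODD, "queue depth over threshold", "add server copies"),
    Transition(State.ODD, State.UP, "queue depth back to normal", "none"),
    Transition(State.ODD, State.DOWN, "process abends", "restart process"),
    Transition(State.UP, State.DOWN, "process abends", "restart process"),
]

for t in SERVER_MODEL:
    if needs_event(t):
        print(f"EVENT {t.old.value} -> {t.new.value}: {t.condition}; action: {t.action}")
```

Run as written, the loop prints only the three transitions in the model that need an event message. Keeping the rule in a single set makes it easy to review which transitions are instrumented and to add object-specific exceptions.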

Up        An object is in an up state when it is started. The object meets all
          of its operational objectives and can be used to provide services.
          Examples are a normally executing server process or an active
          transaction.

Down      An object is in a down state when it is stopped. The object is known
          to the system but cannot provide any useful services for the
          application.
          A critical object that transitions to a down state implies the need
          for reactive recovery.
          Examples include a stopped server process or a frozen server class.

Unknown   An object is in an unknown state when it is not defined. As far as
          the application is concerned, the object does not exist.
          An example is a terminal that has not been configured in a Pathway
          application.

Odd       An object is in an odd state when it is not in any of the other
          states. While still able to function, it might have crossed some
          threshold value, indicating that it might not be available much
          longer unless corrective action is taken. Alternatively, it might be
          exhibiting partial function by operating with degraded performance
          or with some of its functions available but others not available.
          A critical object in the odd state might imply the need for
          preventive recovery to keep the object available.
          An example is an automated teller that is low on cash and will have
          to shut down if corrective action is not taken.
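
As a rough illustration of how the four categories might be assigned in instrumentation code, the following Python sketch classifies the automated teller from the example above. The function classify_teller, its parameters, and the LOW_CASH_THRESHOLD value are assumptions made for this sketch, not part of any real interface.

```python
from enum import Enum


class State(Enum):
    UP = "up"
    DOWN = "down"
    UNKNOWN = "unknown"
    ODD = "odd"


LOW_CASH_THRESHOLD = 5_000  # hypothetical threshold, in currency units


def classify_teller(configured: bool, running: bool, cash_on_hand: int) -> State:
    """Map a teller's observed condition to one of the four state categories."""
    if not configured:
        return State.UNKNOWN  # not defined to the application
    if not running:
        return State.DOWN     # known to the system, but cannot provide service
    if cash_on_hand < LOW_CASH_THRESHOLD:
        return State.ODD      # still functioning, but needs corrective action
    return State.UP           # meets all of its operational objectives


print(classify_teller(configured=True, running=True, cash_on_hand=1_200).value)
```

The checks run from least to most capable, so a teller is reported as odd only when it is both configured and running. A threshold breach is then the signal for preventive recovery, before the teller has to shut down.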