Availability Guide for Application Design

What Is Application Availability?
Availability Guide for Application Design, 525637-004
Pathway server. The HP server and its software, however, are designed so that
availability features are retained even in a mixed application environment.
Provide Instrumentation
The costs of application downtime were discussed earlier in this section.
The burden of keeping those costs low is often carried by the operations staff. The
operations staff must do what they can to prevent the application from going down and,
if it does go down, to get the application back online as quickly as possible.
Application designers and developers can help the operations staff considerably by
providing appropriate instrumentation in the application. The application should
provide:
• Information to the operator about changes in status of application objects
• A command-and-response interface that the operator can use to control the
  application in an appropriate way
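The two requirements above can be sketched in a few lines. The following is a minimal, illustrative sketch only: the names Instrumentation, ManagedObject, and handle_command, the status values, and the command verbs are all hypothetical, and the code does not use or represent the EMS or SPI interfaces.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical listener signature: (object name, old status, new status).
StatusListener = Callable[[str, str, str], None]


@dataclass
class ManagedObject:
    """An application object whose status the operator cares about."""
    name: str
    status: str = "STOPPED"


@dataclass
class Instrumentation:
    """Reports status changes and answers simple operator commands."""
    objects: Dict[str, ManagedObject] = field(default_factory=dict)
    listeners: List[StatusListener] = field(default_factory=list)

    def add(self, name: str) -> None:
        self.objects[name] = ManagedObject(name)

    def set_status(self, name: str, new_status: str) -> None:
        obj = self.objects[name]
        old = obj.status
        if old != new_status:
            obj.status = new_status
            # Tell every registered listener about the change.
            for notify in self.listeners:
                notify(name, old, new_status)

    def handle_command(self, line: str) -> str:
        """Command-and-response interface: '<VERB> <object>' in, text out."""
        parts = line.split()
        if len(parts) != 2:
            return "ERROR usage: <STATUS|START|STOP> <object>"
        verb, name = parts[0].upper(), parts[1]
        if name not in self.objects:
            return f"ERROR unknown object {name}"
        if verb == "STATUS":
            return f"OK {name} {self.objects[name].status}"
        if verb == "START":
            self.set_status(name, "STARTED")
            return f"OK {name} STARTED"
        if verb == "STOP":
            self.set_status(name, "STOPPED")
            return f"OK {name} STOPPED"
        return f"ERROR unknown command {verb}"
```

In this sketch a listener is notified only when the status actually changes, so a repeated START command does not produce a duplicate notification. A real application would route such notifications to its event facility rather than to an in-process callback.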
By using Event Management Service (EMS) messages and the Subsystem
Programmatic Interface (SPI), your application can be a part of the Distributed
Systems Management (DSM) subsystem. The result is that you can take advantage of
message collection and filtering and have your messages processed by DSM tools and
applications. Operators and programs that perform automated operations tasks
thereby quickly receive the information they need.
The EMS and DSM facilities are unique to HP systems. However, by using the open
Simple Network Management Protocol (SNMP), your application can forward EMS
messages to any open network management facility for display and operator action.
Your application can also be directly instrumented using the PEER Networks’
Subagent Toolkit for management through local or remote SNMP manager
applications; refer to the SNMP Subagent Programmers Guide for details.
By using the Measure product, you can collect statistical measurements of the way
your application's resources are used.
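As an illustration of the kind of statistic involved, the sketch below records how long requests wait on a queue. It is a generic, assumed example: TimedQueue and its fields are hypothetical names, and nothing here is the Measure product's interface.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class TimedQueue:
    """A request queue that records how long each request waited."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so the timing can be tested
        self._items: Deque[Tuple[float, object]] = deque()
        self.dequeued = 0      # number of requests removed so far
        self.total_wait = 0.0  # sum of all wait times, in clock units
        self.max_wait = 0.0    # longest single wait seen

    def put(self, request: object) -> None:
        # Stamp the request with its arrival time.
        self._items.append((self._clock(), request))

    def get(self) -> object:
        enqueued_at, request = self._items.popleft()
        wait = self._clock() - enqueued_at
        self.dequeued += 1
        self.total_wait += wait
        self.max_wait = max(self.max_wait, wait)
        return request

    def mean_wait(self) -> float:
        return self.total_wait / self.dequeued if self.dequeued else 0.0
```

The counters are updated on the dequeue path, so they reflect only completed waits; a request still sitting on the queue is not counted until it is removed.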
With the appropriate instrumentation, operations staff can enhance the availability of
any application:
• By using the command interface to query the status of critical objects. Preventive
  action can then be scheduled to make sure that the object status does not become
  critical.
• By responding to alarms indicating that a resource has crossed a critical threshold;
  for example, a disk file has become 95 percent full. An immediate proactive
  response usually avoids an outage. Measure counters can help by monitoring
  application resources; for example, by reporting when the time that messages
  spend on a request queue passes a given threshold.
• By responding in the shortest possible time to an indication that a critical object
  has gone offline; for example, a communication line is down. An immediate
  reactive response keeps downtime to a minimum.
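The threshold alarm described above can be sketched as follows. This is a minimal illustration under stated assumptions: ThresholdMonitor, the resource name, and the alarm text format are hypothetical, and the 95 percent figure is taken from the disk file example. The monitor raises an alarm once when a value crosses the threshold, then re-arms when the value falls back below it.

```python
from typing import Optional


class ThresholdMonitor:
    """Raises one alarm each time a value crosses its threshold."""

    def __init__(self, resource: str, threshold: float) -> None:
        self.resource = resource
        self.threshold = threshold
        self._above = False  # True while the value stays at or above threshold

    def observe(self, value: float) -> Optional[str]:
        is_above = value >= self.threshold
        crossed = is_above and not self._above
        self._above = is_above
        if crossed:
            return (f"ALARM {self.resource} at {value:g} "
                    f"(threshold {self.threshold:g})")
        return None  # no crossing, so no alarm


# Usage: a disk file that fills past 95 percent.
monitor = ThresholdMonitor("disk-file-pct-full", 95.0)
for reading in (80.0, 95.0, 97.5):
    alarm = monitor.observe(reading)
    if alarm is not None:
        print(alarm)
```

Reporting only the crossing, rather than every reading above the threshold, keeps the operator from being flooded with repeated alarms for the same condition.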