Availability Guide for Application Design
Instrumenting an Application for Availability
Availability Guide for Application Design—525637-004
Availability Requirements of DSM Management Applications
• Supporting automated operations by issuing commands in response to event messages
• Supporting a human operator interface through the display of filtered messages and
  a command interface
• Gathering performance statistics on system resources and application resources
  by using the Measure product
Using the functions listed above, commercial or user-written management applications
must work with the SPI interfaces in the business application to address a range of
availability issues. These issues include:
• Providing a query/status infrastructure
• Reporting on resource availability
• Handling online recovery
Providing a Query/Status Infrastructure
The management application needs to support operator interface commands that
enable operators to periodically query the business application, and the HP subsystems
on which it depends, to establish the status of their objects. This
infrastructure lets operations staff anticipate problems by observing
when application response time is deteriorating, specific processors are running close
to capacity, or a disk is filling up.
Using this kind of information, the operations staff can take proactive steps to
avoid downtime, such as performing load balancing operations, starting additional server
processes, or making capacity planning decisions regarding additional purchases of
equipment.
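The kind of periodic status check described above can be sketched as follows. This is a minimal illustration only, not part of any HP interface: the metric names, the warning levels, and the check_status function are all hypothetical, and a real management application would obtain its samples through the subsystem's own query commands rather than from a dictionary.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Threshold:
    warn: float  # level at which operations staff should be alerted
    unit: str    # unit used when formatting the warning text


# Hypothetical warning levels; real values depend on the installation.
THRESHOLDS: Dict[str, Threshold] = {
    "response_time": Threshold(warn=2.0, unit="s"),
    "cpu_busy": Threshold(warn=85.0, unit="%"),
    "disk_full": Threshold(warn=90.0, unit="%"),
}


def check_status(samples: Dict[str, float]) -> List[str]:
    """Return one warning line per metric at or above its warning level."""
    warnings: List[str] = []
    for metric, value in sorted(samples.items()):
        limit = THRESHOLDS.get(metric)
        # Metrics without a configured threshold are ignored.
        if limit is not None and value >= limit.warn:
            warnings.append(
                f"{metric}: {value}{limit.unit} >= {limit.warn}{limit.unit}"
            )
    return warnings


if __name__ == "__main__":
    # One polling cycle with made-up sample values.
    for line in check_status(
        {"response_time": 0.8, "cpu_busy": 91.5, "disk_full": 40.0}
    ):
        print(line)
```

Running such a check at a fixed interval, and keeping the results, would give operators the trend information they need for load balancing and capacity planning decisions.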
Reporting on Resource Availability
Your management application needs to read event messages from the consumer
distributor so that operations staff can respond when objects cross critical thresholds
and when other alarms are raised. A proactive response by an automated or human operator
can prevent your business application from going offline.
When a disk containing a log file, for example, is approaching its threshold, an
automated operator should be able to command the business application to use an
alternate disk for logging. In other situations, where an automated response is not
possible, the management application must highlight the problem to operations staff for
urgent attention.
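The choice between an automated response and escalation to operations staff can be sketched as a simple dispatch over incoming events. Everything named here is an assumption made for illustration: the Event fields, the event kinds, the SWITCH-LOG command text, and the respond function are hypothetical and do not correspond to actual EMS tokens or SPI commands.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    kind: str     # hypothetical event type, for example "DISK_THRESHOLD"
    subject: str  # object the event refers to, for example a disk name


@dataclass
class Outcome:
    commands: List[str] = field(default_factory=list)     # automated responses
    escalations: List[str] = field(default_factory=list)  # for operations staff


def respond(events: List[Event], alternate_disks: Dict[str, str]) -> Outcome:
    """Issue a command where an automated response exists; otherwise escalate."""
    outcome = Outcome()
    for event in events:
        alternate = alternate_disks.get(event.subject)
        if event.kind == "DISK_THRESHOLD" and alternate is not None:
            # Hypothetical command asking the application to log elsewhere.
            outcome.commands.append(f"SWITCH-LOG {event.subject} {alternate}")
        else:
            # No automated response is known, so highlight the problem.
            outcome.escalations.append(
                f"ATTENTION: {event.kind} on {event.subject}"
            )
    return outcome


if __name__ == "__main__":
    result = respond(
        [Event("DISK_THRESHOLD", "$LOG1"), Event("CPU_DOWN", "CPU 3")],
        {"$LOG1": "$LOG2"},
    )
    print(result.commands)
    print(result.escalations)
```

In this sketch a disk threshold event with no configured alternate disk is also escalated, so that an event is never silently dropped.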
Handling Online Recovery
Even after implementing rigorous query/status operations and responding to
object thresholds, applications can still fail. Some possible reasons are: