Availability Guide for Application Design

Instrumenting an Application for Availability
Availability Guide for Application Design - 525637-004
The Subsystem Programmatic Interface (SPI)
EMS Messages
EMS event messages are a special category of SPI messages that convey information
about events or significant occurrences in the subsystem environment. Occurrences
and conditions reported by event messages include:
- Changes in the subsystem environment
- Errors encountered during continuous operation. (This does not include errors
  encountered during an interaction with a user or application, which are usually
  reported directly to that user or application.)
- Conditions that might lead to a problem if not corrected
- Conditions that require operator intervention
- Significant losses of function or resources
- Conditions that cause a process to terminate
These messages are in a format that EMS can filter, collect, and distribute, as
appropriate, to DSM applications for automated or human response.
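The filter, collect, and distribute flow described above can be pictured with a minimal sketch. This is an illustration only, not the EMS or SPI programming interface: the EventMessage record, its source and text fields, the EventCollector class, and the names APP-A and APP-B are all hypothetical, and a real event message is a token-based SPI buffer rather than a Python object.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class EventMessage:
    # Hypothetical stand-in; a real EMS event message is an SPI token buffer.
    source: str  # assumed field: who reported the event
    text: str    # assumed field: description of the occurrence


EventFilter = Callable[[EventMessage], bool]
Handler = Callable[[EventMessage], None]


class EventCollector:
    """Toy model of the collect / filter / distribute roles (not the EMS API)."""

    def __init__(self) -> None:
        self.collected: List[EventMessage] = []
        self._consumers: List[Tuple[EventFilter, Handler]] = []

    def register(self, event_filter: EventFilter, handler: Handler) -> None:
        # A consumer supplies a filter and a handler for the events that pass it.
        self._consumers.append((event_filter, handler))

    def report(self, event: EventMessage) -> int:
        # Collect every reported event, then distribute it only to the
        # consumers whose filter accepts it. Returns the delivery count.
        self.collected.append(event)
        delivered = 0
        for event_filter, handler in self._consumers:
            if event_filter(event):
                handler(event)
                delivered += 1
        return delivered


collector = EventCollector()
received: List[EventMessage] = []

# This consumer is interested only in events reported by "APP-A".
collector.register(lambda e: e.source == "APP-A", received.append)

collector.report(EventMessage("APP-A", "process state changed"))
collector.report(EventMessage("APP-B", "operator intervention required"))
```

In this sketch both events are collected, but only the event from APP-A reaches the registered consumer. Keeping the filter separate from the handler reflects the idea that the same collected stream can serve several consumers with different interests.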
Components of Event Messages
A typical event message generated by an application contains tokens for most of the
following:
- An identification of the subsystem or application that owns the message
  This information is needed by EMS for filtering. Using this information, operators
  interested in activity from a specific subsystem or application, or from a group
  of them, can receive event messages from only those subsystems or applications.
  For example, an operator might need to monitor the state changes of a requester
  program that manages automated tellers.
- An indication of the criticality of the event
  Because EMS event messages report on a wide range of occurrences and
  conditions, they are divided into three classes: information events, action events,
  and critical events.
  - An information event reports changes in the status of a process or device and
    requires no further action by an operator or management application.
  - An action event reports a condition that the subsystem or application
    cannot resolve without operator intervention; for example, a tape needs to be
    mounted. Note that an action event does not necessarily imply a problem. It
    does imply that a human or automated operator must intervene on behalf of
    the application.
  - A critical event reports a potentially critical situation when the consequences of
    the event might be severe. The subsystem or application identifies potentially