Availability Guide for Problem Management
Monitoring Event Messages
Availability Guide for Problem Management–125509
Step 1: Instrumenting Applications to Generate EMS Event Messages
Deciding Which Events to Report
The first task is to determine what events your application can detect and which of
those it should report to EMS. Generate messages for critical or action events only. Once
you have decided what events to report, decide which of those events should be
designated critical events and which should be designated action events.
In general, your application should produce an event message to report an occurrence
that might affect how the system or network is managed or maintained. Be selective,
however, to avoid overwhelming management applications and operations staff with
messages that are of little help to them.
Reporting Critical Events
Critical events are those you so designate because they may indicate significant loss of
or damage to your application’s environment. Critical events may include:
• Potential or actual loss of data
• Loss of a major subsystem function
• Loss of fault tolerance, a redundant resource, or a failure-recovery function
• Loss of subsystem integrity (an unrecoverable internal error)
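As an illustration only, the designation of critical events can be sketched as a fixed set that the application consults before reporting. The names below (Condition, CRITICAL, is_critical) are hypothetical and are not part of EMS; the sketch assumes the application already knows which condition it has detected.

```python
from enum import Enum, auto


class Condition(Enum):
    """Hypothetical conditions an application might detect (not EMS names)."""
    DATA_LOSS = auto()             # potential or actual loss of data
    MAJOR_FUNCTION_LOSS = auto()   # loss of a major subsystem function
    FAULT_TOLERANCE_LOSS = auto()  # loss of fault tolerance, redundancy, or recovery
    INTEGRITY_LOSS = auto()        # unrecoverable internal error
    ROUTINE_STATUS = auto()        # illustrative non-critical condition


# The application designer designates which conditions count as critical.
CRITICAL = frozenset({
    Condition.DATA_LOSS,
    Condition.MAJOR_FUNCTION_LOSS,
    Condition.FAULT_TOLERANCE_LOSS,
    Condition.INTEGRITY_LOSS,
})


def is_critical(condition: Condition) -> bool:
    """Return True if the condition has been designated critical."""
    return condition in CRITICAL
```

Keeping the designation in one place makes it easier to review which events an application treats as critical.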
Reporting Action Events
Action events arise when an application determines that a problem cannot be resolved
without operator intervention, for example when a tape must be mounted.
Because the operator might overlook a displayed event message or, rarely, an event
message might be lost, your application should reissue any action-attention event that
has not been remedied within an appropriate period of time.
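The reissue behavior described above can be sketched as a small tracker that remembers outstanding action events and sends them again once a chosen interval has passed. This is a minimal sketch under stated assumptions: ActionTracker, emit, and reissue_after are hypothetical names, emit stands in for whatever routine the application uses to send an event, and the interval is chosen by the application.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PendingAction:
    text: str            # operator-facing text of the action event
    last_issued: float   # time (in seconds) the event was last sent


@dataclass
class ActionTracker:
    emit: Callable[[str, str], None]  # hypothetical sender: (action_id, text)
    reissue_after: float              # seconds to wait before sending again
    pending: Dict[str, PendingAction] = field(default_factory=dict)

    def raise_action(self, action_id: str, text: str, now: float) -> None:
        """Send the event and remember it until it is resolved."""
        self.pending[action_id] = PendingAction(text, now)
        self.emit(action_id, text)

    def resolve(self, action_id: str) -> None:
        """Forget an action once the operator has remedied the problem."""
        self.pending.pop(action_id, None)

    def tick(self, now: float) -> List[str]:
        """Reissue every pending action that has waited long enough."""
        reissued = []
        for action_id, action in self.pending.items():
            if now - action.last_issued >= self.reissue_after:
                self.emit(action_id, action.text)
                action.last_issued = now
                reissued.append(action_id)
        return reissued


# Usage: times are passed in explicitly so the behavior is easy to follow.
sent = []
tracker = ActionTracker(emit=lambda i, t: sent.append((i, t)), reissue_after=300.0)
tracker.raise_action("tape-1", "Mount tape", now=0.0)
tracker.tick(now=100.0)    # too early, nothing is reissued
tracker.tick(now=300.0)    # "tape-1" is sent a second time
tracker.resolve("tape-1")  # operator mounted the tape
tracker.tick(now=900.0)    # nothing is pending any more
```

Passing the current time as an argument, rather than reading a clock inside the tracker, is a design choice made here to keep the sketch deterministic.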
Deciding What Information to Include in Event Messages
Event messages are based on tokens and are built in a standard way by the EMS routines
EMSINIT, EMSADDTOKENS, and so on.
The EMS routines require that many common event-message components be passed as
parameters, ensuring that all event messages have at least the basic facts. However, there
is still a lot of room for variation in event messages. When deciding how your
application will report events, there are a few basic principles to keep in mind:
• Any event message has two possible audiences: management applications and
  human operators. The message must convey meaningful information to both. The
  tokenized information should contain a complete description of the event for
  applications, and the text should summarize the event for operators.
• Event messages should be self-contained, each fully describing a particular
  event.
• Event messages are almost always filtered before reaching an interested party, so
  you should ensure that the information is presented in a way that allows many
  filtering options.
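The principles above can be illustrated with a simplified model of an event message that carries both tokens and text. This is not the EMS buffer format and does not use the EMS routines; Event, make_event, select, and all token names below are hypothetical, chosen only to show how separate, well-named tokens leave many filtering options open.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Union

TokenValue = Union[str, int, bool]


@dataclass(frozen=True)
class Event:
    tokens: Dict[str, TokenValue]  # complete description, for applications
    text: str                      # short summary, for operators


def make_event(subsystem: str, event_number: int, critical: bool,
               subject: str, text: str, **extra: TokenValue) -> Event:
    """Build a self-contained event; the token names are illustrative only."""
    tokens: Dict[str, TokenValue] = {
        "subsystem": subsystem,
        "event_number": event_number,
        "critical": critical,
        "subject": subject,
    }
    tokens.update(extra)  # additional facts become separate, filterable tokens
    return Event(tokens, text)


def select(events: Iterable[Event],
           predicate: Callable[[Dict[str, TokenValue]], bool]) -> List[Event]:
    """Keep the events whose tokens satisfy the predicate."""
    return [e for e in events if predicate(e.tokens)]


# Usage: the same events can be filtered on any token.
events = [
    make_event("APP", 1, True, "disk-1", "Disk disk-1 failed", cpu=2),
    make_event("APP", 2, False, "job-7", "Job job-7 completed"),
]
critical_only = select(events, lambda t: bool(t["critical"]))
on_cpu_2 = select(events, lambda t: t.get("cpu") == 2)
```

Because each fact is its own token rather than being embedded only in the text, a filter can match on it without parsing the operator-facing summary.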