Availability Guide for Problem Management (125509)

Monitoring Event Messages
Step 2—Filtering System Event Messages
EMS allows you to filter event messages to reduce the number of messages and
highlight messages that require operator attention or intervention.
What Are EMS Filters and How Are They Used?
The EMS distributor processes that are configured or started on the system read the
event log file. Each distributor uses a filter to determine whether to pass an event
message to its destination (or destinations): it compares each message to the filter and
selects only the messages that satisfy it.
For example, you can design a forwarding distributor that forwards only critical event
messages, a printing distributor that prints only action event messages, and a consumer
distributor that returns only Pathway event messages. EMS filters allow you to reduce
event message traffic within a system and over a network by placing part of the program
logic as close to the source of event messages as possible. Rejected event messages are
skipped; selected event messages proceed to their destination.
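The selection behavior described above can be sketched in a few lines of Python. This is a conceptual model only, not EMS code: the Event fields, the distribute function, and the sample events are hypothetical names chosen for illustration, and real EMS filters are written in the EMS filter language rather than Python.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


# Hypothetical event record. Real EMS events carry many more fields;
# these names are assumptions made for this sketch.
@dataclass(frozen=True)
class Event:
    subsystem: str   # subsystem that issued the event, e.g. "PATHWAY"
    critical: bool   # True for critical events
    action: bool     # True if operator action is required
    text: str


Filter = Callable[[Event], bool]


def distribute(log: Iterable[Event], event_filter: Filter) -> List[Event]:
    """Return the events the filter selects; rejected events are skipped."""
    return [event for event in log if event_filter(event)]


# One filter per distributor, mirroring the three examples in the text.
def forward_critical(event: Event) -> bool:
    return event.critical


def print_action(event: Event) -> bool:
    return event.action


def consume_pathway(event: Event) -> bool:
    return event.subsystem == "PATHWAY"


# Made-up sample log, for illustration only.
log = [
    Event("PATHWAY", False, False, "server started"),
    Event("DISK", True, True, "path down"),
    Event("PATHWAY", True, False, "server class frozen"),
]

forwarded = distribute(log, forward_critical)
printed = distribute(log, print_action)
consumed = distribute(log, consume_pathway)
```

Each distributor reads the same log but passes on only the subset its own filter selects, which is how filtering reduces message traffic at each destination.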
Filter Language and Compiler
To make a filter for a forwarding, printing, or consumer distributor, you can create an
edit file containing the filter-language constructs that express your selection criteria. You
then use the filter-language compiler (EMF) to generate an object file suitable for
loading into the distributor. Once the filter specification is loaded, the distributor uses it to
decide whether to pass each event message to its destination. For more information
about using the EMS filter language and compiler, refer to the EMS Manual.
Step 3—Writing Operations and Recovery Procedures
After analyzing and filtering your system event messages, you need to document the
recovery steps in an operations runbook. A runbook is a compilation, either in hard
copy or online, of the procedures required to keep your system environment up and
running. It should include all of the routine tasks, daily and otherwise, that your
operations staff perform, such as monitoring the status of system hardware and software;
performing routine disk, tape, and spooler operations; and identifying and resolving
system problems. Your operations runbook should specify which events are critical and
should define the operational policy and recovery steps for each critical event.
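For an online runbook, one possible way to record this information is a small structured entry per event. The sketch below is an assumption about how such a record might look, not a format prescribed by this guide: the RunbookEntry fields, the event names, and the recovery steps are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


# Hypothetical runbook record; the field names are illustrative only.
@dataclass
class RunbookEntry:
    event: str                      # name of the event as it appears to operators
    critical: bool                  # whether the event is classed as critical
    policy: str                     # operational policy for this event
    recovery_steps: List[str] = field(default_factory=list)


def missing_recovery_steps(runbook: Dict[str, RunbookEntry]) -> List[str]:
    """List critical events that have no documented recovery steps."""
    return sorted(
        name
        for name, entry in runbook.items()
        if entry.critical and not entry.recovery_steps
    )


# Made-up sample entries, for illustration only.
runbook = {
    "DISK_PATH_DOWN": RunbookEntry(
        event="DISK_PATH_DOWN",
        critical=True,
        policy="Respond immediately",
        recovery_steps=["Check the status of the disk paths", "Notify support"],
    ),
    "SPOOLER_FULL": RunbookEntry(
        event="SPOOLER_FULL",
        critical=True,
        policy="Respond within the shift",
    ),
    "SERVER_STARTED": RunbookEntry(
        event="SERVER_STARTED",
        critical=False,
        policy="No action required",
    ),
}

gaps = missing_recovery_steps(runbook)
```

A check like missing_recovery_steps makes it easy to confirm that every event marked critical has both a policy and at least one recovery step before the runbook is put into use.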
Table 4-1 lists some of the various tasks that should be documented in your operations
runbook. Refer to the Introduction to NonStop Operations Management for a more
complete list of routine (daily, weekly, monthly) tasks.
Step 4—Automating Operations and Recovery Procedures
Where possible, automate the response to events so that the operator does not have to be
involved in recovery procedures. Section 8, “Automating Operations and Recovery
Procedures,” provides more detailed information on this topic.
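As a rough illustration of the idea, an automated response can be modeled as a lookup from event name to recovery routine, with any event that has no routine passed to the operator. The sketch below is hypothetical throughout: the event names, the restart_server routine, and the respond function are assumptions for illustration and do not represent any particular automation product.

```python
from typing import Callable, Dict, List

Handler = Callable[[str], str]


def restart_server(event_text: str) -> str:
    """Hypothetical recovery routine; a real one would issue the restart."""
    return f"restarted server after: {event_text}"


# Events that have an automated response. Anything absent from this
# table still needs an operator.
AUTOMATED: Dict[str, Handler] = {
    "SERVER_DOWN": restart_server,
}


def respond(event_name: str, event_text: str, operator_queue: List[str]) -> str:
    """Run the automated response if one exists; otherwise escalate."""
    handler = AUTOMATED.get(event_name)
    if handler is None:
        operator_queue.append(event_name)
        return "escalated to operator"
    return handler(event_text)


operator_queue: List[str] = []
first = respond("SERVER_DOWN", "server class frozen", operator_queue)
second = respond("DISK_PATH_DOWN", "path down", operator_queue)
```

Keeping the escalation path explicit means that events without an automated response are still brought to the operator's attention rather than silently dropped.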