Availability Guide for Problem Management

Monitoring Event Messages

Availability Guide for Problem Management–125509

Step 2—Filtering System Event Messages

EMS lets you filter event messages, both to reduce the number of messages and to highlight those that require operator attention or intervention.

What Are EMS Filters and How Are They Used?

The EMS distributor processes that are configured or started on the system read the event log file. Each distributor compares every event message against its filter to decide whether to pass the message to its destination (or destinations), so only the messages that the distributor wants are selected.

For example, you can design a forwarding distributor that forwards only critical event messages, a printing distributor that prints only action event messages, and a consumer distributor that returns only Pathway event messages. EMS filters allow you to reduce event message traffic within a system and over a network by placing part of the program logic as close to the source of event messages as possible. Rejected event messages are skipped; selected event messages proceed to their destination.
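The selection behavior described above can be pictured with a short Python sketch. This is only a conceptual model, not EMS code: the Event fields, the Distributor class, and the sample messages are all hypothetical names chosen for illustration, and a real EMS filter is written in the EMS filter language rather than Python.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Event:
    """Hypothetical stand-in for an event message (not the real EMS layout)."""
    subsystem: str
    critical: bool
    action_needed: bool
    text: str


# A filter is modeled here as a predicate: True selects, False rejects.
Filter = Callable[[Event], bool]


@dataclass
class Distributor:
    """Hypothetical distributor that keeps only the events its filter selects."""
    name: str
    event_filter: Filter
    delivered: List[Event] = field(default_factory=list)

    def offer(self, event: Event) -> None:
        # Selected events proceed to the destination; rejected ones are skipped.
        if self.event_filter(event):
            self.delivered.append(event)


def distribute(log: List[Event], distributors: List[Distributor]) -> None:
    """Let every distributor read every event in the log."""
    for event in log:
        for distributor in distributors:
            distributor.offer(event)


# Illustrative log contents (invented for this sketch).
log = [
    Event("PATHWAY", False, False, "server class started"),
    Event("DISK", True, True, "disk path down"),
    Event("PATHWAY", True, False, "server abended"),
    Event("SPOOLER", False, True, "printer out of paper"),
    Event("PATHWAY", False, False, "terminal connected"),
]

forwarding = Distributor("forwarding", lambda e: e.critical)
printing = Distributor("printing", lambda e: e.action_needed)
consumer = Distributor("consumer", lambda e: e.subsystem == "PATHWAY")

distribute(log, [forwarding, printing, consumer])
counts = {d.name: len(d.delivered) for d in (forwarding, printing, consumer)}
print(counts)
```

Each distributor sees the whole log but delivers only its own subset, which is the traffic reduction that the text attributes to filtering close to the source.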

Filter Language and Compiler

To create a filter for a forwarding, printing, or consumer distributor, you create an edit file containing the filter-language constructs that express your selection criteria. You then use the filter-language compiler (EMF) to generate an object file suitable for loading into the distributor. Once the filter is loaded, the distributor uses it to decide whether to pass each event message to its destination. For more information about using the EMS filter language and compiler, refer to the EMS Manual.

Step 3—Writing Operations and Recovery Procedures

After analyzing and filtering your system event messages, you need to document the recovery steps in an operations runbook. A runbook is a compilation, either in hard copy or online, of the procedures required to keep your system environment up and running. It should include all of the routine tasks, daily and otherwise, that your operations staff performs, such as monitoring the status of system hardware and software; performing routine disk, tape, and spooler operations; and identifying and resolving system problems. Your operations runbook should specify which events are critical and should define the operational policy and recovery steps for each critical event.
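For an online runbook, one way to keep the critical-event requirement checkable is to store each entry as structured data. The following Python sketch assumes a hypothetical RunbookEntry record and invented event identifiers; it is an illustration of the idea, not a format defined by this guide.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RunbookEntry:
    """Hypothetical runbook record for one event."""
    event_id: str
    critical: bool
    policy: str                      # operational policy for the event
    recovery_steps: Tuple[str, ...]  # ordered recovery procedure


# Illustrative entries; the event IDs and steps are invented for this sketch.
RUNBOOK: Dict[str, RunbookEntry] = {
    "DISK-PATH-DOWN": RunbookEntry(
        "DISK-PATH-DOWN",
        critical=True,
        policy="Respond immediately and notify the shift supervisor.",
        recovery_steps=("Check the path status.", "Bring the path back up."),
    ),
    "SPOOLER-FULL": RunbookEntry(
        "SPOOLER-FULL", critical=True, policy="", recovery_steps=()
    ),
    "OPERATOR-LOGON": RunbookEntry(
        "OPERATOR-LOGON", critical=False, policy="", recovery_steps=()
    ),
}


def missing_procedures(runbook: Dict[str, RunbookEntry]) -> List[str]:
    """Return IDs of critical events that lack a policy or recovery steps."""
    return sorted(
        event_id
        for event_id, entry in runbook.items()
        if entry.critical and (not entry.policy or not entry.recovery_steps)
    )


gaps = missing_procedures(RUNBOOK)
print(gaps)
```

A check like this flags critical events that are still undocumented, while noncritical events are allowed to have no recovery procedure.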

Table 4-1 lists some of the tasks that should be documented in your operations runbook. Refer to the Introduction to NonStop Operations Management for a more complete list of routine (daily, weekly, monthly) tasks.

Step 4—Automating Operations and Recovery Procedures

Where possible, automate the response to events so that the operator does not have to be involved in recovery procedures. Section 8, “Automating Operations and Recovery Procedures,” provides more detailed information on this topic.
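As a rough illustration of the idea, the Python sketch below routes each event to a registered recovery routine and falls back to the operator only when no routine exists. The event identifiers, the automates decorator, and the handler body are hypothetical placeholders; Section 8 describes the actual facilities.

```python
from typing import Callable, Dict, List, Tuple

# A handler takes a detail string (for example, a process name) and
# returns a short description of what it did.
Handler = Callable[[str], str]
HANDLERS: Dict[str, Handler] = {}


def automates(event_id: str) -> Callable[[Handler], Handler]:
    """Register a function as the automatic response to one event ID."""
    def register(fn: Handler) -> Handler:
        HANDLERS[event_id] = fn
        return fn
    return register


@automates("PROCESS-ABENDED")
def restart_process(detail: str) -> str:
    # Placeholder only: a real routine would issue the restart here.
    return f"restarted {detail}"


def respond(event_id: str, detail: str,
            operator_queue: List[Tuple[str, str]]) -> str:
    """Run the automatic response if one exists; otherwise escalate."""
    handler = HANDLERS.get(event_id)
    if handler is None:
        operator_queue.append((event_id, detail))
        return "escalated to operator"
    return handler(detail)


queue: List[Tuple[str, str]] = []
r1 = respond("PROCESS-ABENDED", "$SRV1", queue)
r2 = respond("DISK-PATH-DOWN", "$DATA1", queue)
print(r1, "|", r2, "|", queue)
```

Events without a registered routine still reach the operator, so automation can be introduced one event at a time.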