Availability Guide for Problem Management
Problem Management Tools
Availability Guide for Problem Management–125509
Event Management Service (EMS)
Tandem’s primary tool for event collection is the Event Management Service (EMS),
which is a set of processes that collects event messages from Tandem subsystems
(including NonStop operating system processes) and user-written subsystems. EMS then
selectively distributes those event messages to various destinations, such as a local
operator console or a management application running on a remote system. EMS event
messages can be used to perform the following tasks:
• Monitoring a running network or system
• Managing operator tasks
• Analyzing problems
• Detecting potential problems in advance
• Automating problem detection and recovery
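The collect-then-selectively-distribute flow described at the start of this section can be pictured with a small Python model. This is a conceptual sketch only, not the EMS programming interface. The names `EventMessage`, `Distributor`, `add_route`, and `collect`, along with the sample subsystem names and event text, are hypothetical. Each route pairs a filter (a predicate on the event) with a destination, such as a stand-in for an operator console or a remote management application.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class EventMessage:
    # Hypothetical, simplified event record; real EMS events are SPI messages.
    subsystem: str
    event_number: int
    text: str


Filter = Callable[[EventMessage], bool]
Destination = Callable[[EventMessage], None]


class Distributor:
    """Collects every event, then forwards it to each destination whose filter accepts it."""

    def __init__(self) -> None:
        self.log: List[EventMessage] = []
        self.routes: List[Tuple[Filter, Destination]] = []

    def add_route(self, accepts: Filter, destination: Destination) -> None:
        self.routes.append((accepts, destination))

    def collect(self, event: EventMessage) -> int:
        # Keep a record of every collected event, then distribute selectively.
        self.log.append(event)
        delivered = 0
        for accepts, destination in self.routes:
            if accepts(event):
                destination(event)
                delivered += 1
        return delivered  # number of destinations that received the event


# Usage with hypothetical destinations: a local "console" and a "remote" application.
console: List[str] = []
remote: List[EventMessage] = []
ems = Distributor()
ems.add_route(lambda e: e.subsystem == "DISK", lambda e: console.append(e.text))
ems.add_route(lambda e: True, remote.append)
first = ems.collect(EventMessage("DISK", 3, "volume $DATA1 down"))
second = ems.collect(EventMessage("USERAPP", 1001, "batch job started"))
```

Keeping routing decisions in filters, rather than in the subsystems that report events, mirrors the division of labor the text describes: subsystems only report, and distribution decides who sees what.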
What Is an EMS Event Message?
EMS event messages are a special category of Subsystem Programmatic Interface (SPI)
messages that convey information about events or significant occurrences in the
subsystem environment. Occurrences and conditions reported by event messages
include:
• Changes in the subsystem environment
• Errors encountered during continuous operation (This does not include errors encountered during an interaction with a user or application; such errors are usually reported directly to the user or application.)
• Conditions that might lead to a problem if not corrected
• Conditions that require operator intervention
• Significant losses of function or resources
• Conditions that cause a process to terminate
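The distinction drawn in the list above, between errors during continuous operation (reported as events) and errors during an interaction (usually returned to the requester), can be expressed as a small decision function. This Python sketch is hypothetical: `report_error`, the `interactive` flag, and the list standing in for the event stream are illustrative names, not part of EMS.

```python
from typing import List, Optional


def report_error(message: str, interactive: bool, event_stream: List[str]) -> Optional[str]:
    """Route an error according to how it was encountered.

    Errors hit while serving a user or application request are returned to
    that requester. Errors hit during continuous operation are emitted as
    event messages (here, appended to a hypothetical event stream).
    """
    if interactive:
        return message  # usually reported directly to the user or application
    event_stream.append(message)  # reported as an event message instead
    return None


# Usage with hypothetical error text.
events: List[str] = []
reply = report_error("file not found", interactive=True, event_stream=events)
report_error("path to controller lost", interactive=False, event_stream=events)
```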
Event Classes
Because EMS event messages report on a wide range of occurrences and conditions,
they are divided into three classes: information events, action events, and critical events.
• Information events report changes in the status of a process or device that require no further action by an operator or system management application.
• Action events report a condition that the subsystem cannot resolve without operator intervention.
• Critical events report potentially critical situations, in which the consequences of the event might be severe. The subsystem identifies such situations and lets you (with the help of any programmatic tools you select) make the final determination.
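One way a management application might act on the three classes is sketched below in Python. The enum, the `handle` function, the list-based log and queues, and the site rule are all hypothetical. The point the sketch illustrates is the division of responsibility for critical events: the subsystem only marks an event as potentially critical, and a site-selected predicate makes the final determination.

```python
from enum import Enum
from typing import Callable, List


class EventClass(Enum):
    INFORMATION = "information"  # status change; no further action needed
    ACTION = "action"            # subsystem needs operator intervention
    CRITICAL = "critical"        # consequences might be severe


def handle(event_class: EventClass, text: str,
           is_critical: Callable[[str], bool],
           log: List[str], operator_queue: List[str], alerts: List[str]) -> None:
    # Every event is recorded, whatever its class.
    log.append(text)
    if event_class is EventClass.ACTION:
        operator_queue.append(text)
    elif event_class is EventClass.CRITICAL and is_critical(text):
        # The subsystem only flags a potentially critical situation;
        # the site-selected predicate makes the final determination.
        alerts.append(text)


def site_policy(text: str) -> bool:
    # Hypothetical site rule: treat processor failures as truly critical.
    return "processor" in text


# Usage with hypothetical event text.
log: List[str] = []
operator_queue: List[str] = []
alerts: List[str] = []
handle(EventClass.INFORMATION, "process started", site_policy, log, operator_queue, alerts)
handle(EventClass.ACTION, "tape mount required", site_policy, log, operator_queue, alerts)
handle(EventClass.CRITICAL, "processor 2 halted", site_policy, log, operator_queue, alerts)
handle(EventClass.CRITICAL, "disk mirror degraded", site_policy, log, operator_queue, alerts)
```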