EMS Manual
Standard Events
EMS Manual—426909-005
Reactive Problem Management Functions
To find the event with the actual cause of the problem, use the problem management
application to search the EMS log for an Object Unavailable event whose subject is
the underlying object specified in this event. Repeat this search, each time using the
underlying object named in the event just found, until the search finds an Object
Unavailable event that contains the actual cause of the failure.
Because related events are generated close together in time, examine only events
whose generation times are close to that of this event, generally within a few minutes.
You can use filters to help the EMS Distributor perform the search.
Event generation time and object names in the Object Unavailable event help locate
the event that contains the actual cause of the problem. This is possible only if all
subsystems report the unavailability of their objects using the Object Unavailable
event, follow the recommended conventions for naming their objects, and do not use
"unknown" as the change reason.
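The search described above amounts to following a chain of underlying objects back to the event that reports the actual cause. The following Python sketch illustrates that loop under several assumptions that are not part of EMS. Events are modeled as a simplified, hypothetical `ObjectUnavailableEvent` record rather than real EMS event buffers. An event that names no underlying object is taken to be the one that reports the actual cause. "Close in time" is interpreted as a hypothetical five-minute window. A real problem management application would retrieve events through the EMS Distributor and its filters instead of scanning an in-memory list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ObjectUnavailableEvent:
    """Hypothetical, simplified view of an Object Unavailable event."""
    subject: str                      # object reported as unavailable
    underlying_object: Optional[str]  # object blamed for it, if any
    change_reason: str
    generated: datetime               # event generation time


def find_root_cause(
    start: ObjectUnavailableEvent,
    log: List[ObjectUnavailableEvent],
    window: timedelta = timedelta(minutes=5),  # assumed "few minutes"
) -> ObjectUnavailableEvent:
    """Follow underlying objects back to the event reporting the cause."""
    current = start
    seen = {current.subject}  # stop if the chain loops back on itself
    while current.underlying_object is not None:
        # Events about the underlying object, generated close in time.
        candidates = [
            e for e in log
            if e.subject == current.underlying_object
            and abs(e.generated - current.generated) <= window
        ]
        if not candidates:
            break  # the chain is broken; the best we have is `current`
        # Prefer the candidate generated closest to the current event.
        nxt = min(candidates,
                  key=lambda e: abs(e.generated - current.generated))
        if nxt.subject in seen:
            break
        seen.add(nxt.subject)
        current = nxt
    return current
```

Choosing the candidate closest in time is one reasonable tie-breaker when several events name the same object. The `seen` set stops the loop if a misconfigured subsystem produces a cycle. The sketch also shows why the naming and change-reason conventions matter. If a subsystem names its objects inconsistently, the subject comparison fails and the chain breaks early.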
Problem Rediscovery
One aspect of problem tracking and control is determining whether an Object Unavailable
event reports a new problem or the recurrence of a previously known problem. To help
with this task, the Object Unavailable event contains a field called the symptom string,
which uniquely identifies where in the subsystem or application code the fault
manifested itself. This lets management applications quickly isolate the fault to a
specific piece of software in the system, differentiate among problems reported by a
subsystem, and determine whether a similar problem was already reported. If it was,
the operator does not waste effort rediagnosing the problem and can quickly bypass or
correct it.
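A management application could implement the new-versus-recurrence check as a lookup keyed by symptom string. This minimal Python sketch assumes that an exact string match means "same problem" and that the history of reported problems is kept in a plain dictionary. Both are illustrative assumptions. `classify_problem` is a hypothetical name, not an EMS interface.

```python
from typing import Dict, Tuple


def classify_problem(symptom: str, problem_id: str,
                     known: Dict[str, str]) -> Tuple[str, str]:
    """Classify a report as a new problem or a recurrence of a known one.

    `known` maps each symptom string to the ID of the first report that
    carried it; a previously unseen symptom string is recorded as a side
    effect so that later reports with the same string match it.
    """
    original = known.get(symptom)
    if original is not None:
        # Same fault location seen before: reuse the earlier diagnosis.
        return ("recurrence", original)
    known[symptom] = problem_id
    return ("new", problem_id)
```

Real installations may need a looser notion of "similar" than exact equality. For example, they might ignore the version component so that a fault that persists across software updates is still recognized.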
The symptom string, in ASCII form, should contain:
The release version update (RVU) of the subsystem, including any version information
(ID, date, and so on) that uniquely identifies a piece of software in the system.
The subsystem module name, which is unique within the subsystem; for example,
the name of the procedure where the fault occurred.
An identifier within the module; for example, a code statement label that
indicates where in the module the fault occurred.
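The following Python sketch assembles the three components listed above into a single symptom string and checks that the result is printable ASCII. The manual does not prescribe a layout for the string. The `|` separator, the function name `build_symptom_string`, and the sample component values are all hypothetical.

```python
SEPARATOR = "|"  # hypothetical delimiter between components


def build_symptom_string(version: str, module: str, location: str) -> str:
    """Combine version, module name, and in-module identifier.

    version  -- version information identifying the software
    module   -- module name, unique within the subsystem (e.g. a procedure)
    location -- identifier within the module (e.g. a statement label)
    """
    parts = (version, module, location)
    for part in parts:
        if not part:
            raise ValueError("all three components are required")
        # The symptom string is carried in ASCII form.
        if not (part.isascii() and part.isprintable()):
            raise ValueError(f"component is not printable ASCII: {part!r}")
        # A component containing the delimiter would make the string ambiguous.
        if SEPARATOR in part:
            raise ValueError(f"component contains the separator: {part!r}")
    return SEPARATOR.join(parts)


# Example with made-up component values.
print(build_symptom_string("V1.2 2024-01-15", "OPEN_FILE", "L0420"))
```

With these sample values the call prints `V1.2 2024-01-15|OPEN_FILE|L0420`. Rejecting components that contain the separator keeps the string unambiguous, so a management application can split it back into its three parts.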
If the fault manifests itself as inconsistent data, a logic error, or any other error that a
subsystem or application can detect before it aborts its service, the subsystem should
construct the symptom string and report it in the Object Unavailable event.
If the fault manifests itself as a CPU or system freeze or as a NonStop Kernel trap, the
subsystem cannot construct and report the symptom string described above. In this
case, the operating system constructs the fault code (a halt code for a CPU or system
freeze, or a trap code for a NonStop Kernel trap) and records the code location of the
process when the fault occurred. This information helps identify the problem.
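When no subsystem symptom string is available, a management application still needs something to compare against earlier reports. The following Python sketch shows one way to fall back from the subsystem's symptom string to the operating system's fault code and code location. The `FaultIdentity` class, its field names, the key format, and the sample values in the note that follows are all hypothetical. They do not correspond to actual EMS tokens or to real halt or trap codes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FaultIdentity:
    """Hypothetical record of whatever identifies a fault occurrence."""
    symptom_string: Optional[str] = None  # built by the subsystem, if it could
    fault_kind: Optional[str] = None      # "halt" (CPU/system freeze) or "trap"
    fault_code: Optional[int] = None      # halt code or trap code from the OS
    code_location: Optional[str] = None   # process code location at the fault

    def key(self) -> str:
        """Return a string usable for matching against earlier reports."""
        # Prefer the subsystem's own symptom string when it exists.
        if self.symptom_string:
            return "SYM:" + self.symptom_string
        # Otherwise fall back to the OS-supplied fault code and location.
        if (self.fault_kind in ("halt", "trap")
                and self.fault_code is not None
                and self.code_location):
            return f"{self.fault_kind.upper()}:{self.fault_code}@{self.code_location}"
        raise ValueError("no symptom string or OS fault information")
```

For example, `FaultIdentity(fault_kind="trap", fault_code=3, code_location="P-1234").key()` returns `"TRAP:3@P-1234"`. The prefix on each key keeps the two kinds of identifier distinct, so a subsystem symptom string is never accidentally matched against an operating system fault code.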