EMS Manual
Standard Events
EMS Manual—426909-005
enters this state. Report this same event every time the operator tries but fails to
bring the object back into service.
When an object needs an operator to change its state
An object is in an intervention-needed state when it needs the operator to change
its state. For example, a Pathway terminal needs the intervention of an operator if it
is to be taken out of the suspended state, so the suspended state of a Pathway
terminal is of interest to an operator. Report the Object Other State Change event
when the object enters this state. If the object enters a state that is better described
as unavailable, report the Object Unavailable event instead.
When an object persists in a state longer than expected
An object is in a persistent state when it stays in a state longer than expected and
the operator should take notice. For example, if SNAX/XF takes longer than
expected to activate a Physical Unit (PU) or an application takes longer than
expected to connect to the network, report an Object Other State Change event for
the operator to take notice.
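The reporting rules above can be summarized as a small decision function. This is a minimal sketch, not part of EMS: the `StateChange` class, its fields, and `select_event` are hypothetical names introduced for illustration, and the event names are plain strings taken from the text rather than real EMS event identifiers.

```python
from dataclasses import dataclass
from typing import Optional

# Event names as they appear in the text; these are plain labels,
# not real EMS token or event-number definitions.
OBJECT_UNAVAILABLE = "Object Unavailable"
OBJECT_OTHER_STATE_CHANGE = "Object Other State Change"


@dataclass(frozen=True)
class StateChange:
    """Hypothetical description of the state an object has just entered."""
    unavailable: bool = False         # state is best described as unavailable
    needs_operator: bool = False      # operator must act to change the state
    persisted_too_long: bool = False  # state has lasted longer than expected


def select_event(change: StateChange) -> Optional[str]:
    """Return the standard event to report, or None if nothing is reported."""
    # "Unavailable" takes precedence over the more general event.
    if change.unavailable:
        return OBJECT_UNAVAILABLE
    # Intervention-needed and persistent states share the same event.
    if change.needs_operator or change.persisted_too_long:
        return OBJECT_OTHER_STATE_CHANGE
    # Anything else is treated as a state that is not reported.
    return None
```

For example, a suspended Pathway terminal would be described as `StateChange(needs_operator=True)`, for which the sketch selects the Object Other State Change event.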
The difference between persistent and transient states is subtle. In general, a state
is persistent if it lasts long enough for a management application to take action, and
transient if it does not. Do not report transient states that require no operator
intervention: such events do not help problem management, and their potential
volume is very high.
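One way to keep transient states out of the event stream is to delay reporting until a state has outlasted its expected duration. The sketch below illustrates that idea under stated assumptions: `PersistenceTracker`, its methods, the object and state names, and the single threshold are all hypothetical, and reporting each persistent state only once is a design choice of this example rather than a requirement stated in the manual.

```python
from typing import Dict, List, Set, Tuple


class PersistenceTracker:
    """Hypothetical helper that flags states lasting longer than expected."""

    def __init__(self, expected_seconds: float) -> None:
        self.expected_seconds = expected_seconds
        # object name -> (state name, time the state was entered)
        self._entered: Dict[str, Tuple[str, float]] = {}
        # objects whose current state has already been flagged
        self._reported: Set[str] = set()

    def enter(self, obj: str, state: str, now: float) -> None:
        """Record that obj entered state at time now (in seconds)."""
        self._entered[obj] = (state, now)
        self._reported.discard(obj)

    def leave(self, obj: str) -> None:
        """Record that obj left its state; a transient state ends here unreported."""
        self._entered.pop(obj, None)
        self._reported.discard(obj)

    def poll(self, now: float) -> List[Tuple[str, str]]:
        """Return (object, state) pairs that have just become persistent."""
        due = []
        for obj, (state, entered_at) in self._entered.items():
            overdue = now - entered_at > self.expected_seconds
            if overdue and obj not in self._reported:
                self._reported.add(obj)
                due.append((obj, state))
        return due
```

A caller would invoke `enter` and `leave` as objects change state, call `poll` periodically, and report an Object Other State Change event for each pair that `poll` returns.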
There are no restrictions on the design of the internal states of subsystems and
applications. Subsystem and application developers are responsible for defining the
internal states of their objects, but the state changes they report should conform to
the requirements defined here.
Reactive Problem Management Functions
A problem is any incident that results in the loss of a system resource. Reactive
problem management deals with a problem that has already occurred. It covers
everything from problem detection through final resolution, including tracking and
control:
1. Problem detection and isolation. An operator should be able to detect that a
system resource is no longer available and to isolate the problem to the failed
hardware, firmware, or software component.
Hardware failures deal with hardware objects such as a CPU, memory, a controller,
a peripheral device, a fan, or a power supply.
Firmware failures deal with software that is downloaded to devices such as a
disk controller or tape controller.
Software failures deal with the loss of service of any software object, such as a
process, a network connection, a subdevice, a protocol layer, or any function
provided by a subsystem or application.