EMS Manual

Standard Events

EMS Manual—426909-005

enters this state. Report this same event every time the operator tries but fails to bring the object back into service.



When an object needs an operator to change its state

An object is in an intervention-needed state when it needs the operator to change its state. For example, a Pathway terminal can be taken out of the suspended state only by an operator, so the suspended state of a Pathway terminal is of interest to an operator. Report the Object Other State Change event when the object enters an intervention-needed state. If the state the object enters is better described as unavailable, report the Object Unavailable event instead.



When an object persists in a state longer than expected

An object is in a persistent state when it stays in a state longer than expected and the operator should take notice. For example, if SNAX/XF takes longer than expected to activate a Physical Unit (PU), or an application takes longer than expected to connect to the network, report an Object Other State Change event so that the operator takes notice.
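
The reporting rules described above can be read as a small lookup from an object's situation to a standard event. The following is only an illustrative sketch in Python, not EMS code: the `Condition` enum and the `standard_event_for` function are hypothetical names introduced here, and the event names are returned as plain strings rather than built as real EMS event messages.

```python
from enum import Enum
from typing import Optional


class Condition(Enum):
    """Hypothetical labels for the situations discussed in the text."""
    UNAVAILABLE = "unavailable"
    INTERVENTION_NEEDED = "intervention-needed"
    PERSISTENT = "persistent"
    NORMAL = "normal"


def standard_event_for(condition: Condition) -> Optional[str]:
    """Return the name of the standard event to report, or None.

    Assumption: the caller has already decided which condition best
    describes the object's new state.
    """
    # A state better described as unavailable takes precedence.
    if condition is Condition.UNAVAILABLE:
        return "Object Unavailable"
    # Intervention-needed and persistent states share one event.
    if condition in (Condition.INTERVENTION_NEEDED, Condition.PERSISTENT):
        return "Object Other State Change"
    # Nothing to report for any other condition in this sketch.
    return None
```

For instance, a suspended Pathway terminal would map to `Condition.INTERVENTION_NEEDED`, and the sketch would select the Object Other State Change event for it.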

There is a subtle difference between persistent and transient states. In general, a state is persistent if it remains long enough for management applications to take action, and transient if it does not remain long enough for a management application to intervene. Do not report transient object states that do not require operator intervention: such events do not facilitate problem management, and their potential volume is very high.
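
One way to picture the suppression rule is as a filter applied before any event is generated. The Python sketch below is an assumption-laden illustration, not part of EMS: `ObservedState`, `is_reportable`, and the `reaction_time_s` threshold are hypothetical, and the manual does not give a numeric value for how long a management application needs in order to act.

```python
from dataclasses import dataclass


@dataclass
class ObservedState:
    """Hypothetical snapshot of an object's current state."""
    name: str
    seconds_in_state: float
    needs_operator: bool


def is_reportable(state: ObservedState, reaction_time_s: float) -> bool:
    """Apply the rule: drop transient states that need no operator.

    reaction_time_s is an assumed threshold: the time a management
    application needs before it can act on a state change.
    """
    # A state is treated as transient if it ends before anyone could act.
    transient = state.seconds_in_state < reaction_time_s
    # Suppress only states that are both transient and need no operator.
    return state.needs_operator or not transient
```

With an assumed threshold of 5 seconds, `is_reportable(ObservedState("activating", 0.2, False), 5.0)` is false, while the same short-lived state with `needs_operator=True` remains a candidate for reporting. The filter only decides what to suppress; whether a remaining state actually warrants an event still depends on the other rules in this section.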

There are no restrictions on the design of the internal states of subsystems and applications. Subsystem and application developers are responsible for defining the internal states of their objects, but the state changes they report should conform to the requirements defined here.

Reactive Problem Management Functions

A problem is any incident that results in the loss of a system resource. Reactive problem management applies when the problem has already occurred. It covers everything from problem detection through final resolution, including tracking and control:

1. Problem detection and isolation. An operator should be able to determine that a system resource is no longer available and to isolate the problem to the failed hardware, firmware, or software component.

Hardware failures involve hardware objects such as a CPU, memory, a controller, a peripheral device, a fan, or a power supply.

Firmware failures involve software that is downloaded to devices such as a disk controller or tape controller.

Software failures involve the loss of service of any software object, such as a process, a network connection, a subdevice, a protocol layer, or any function provided by a subsystem or application.