Introduction to NonStop Operations Management


Operations Management and Continuous Improvement

Introduction to NonStop Operations Management–125507


Implementing an Operations-Management Improvement Program

  • Provide a high-level view of the system that operators can easily interpret. OMF can represent many thousands of objects and their states on one screen. With a quick look at this screen, operators get an immediate impression of the health of the system they have to manage.

The improvement team implemented OMF in stages, beginning with processors, followed by disks, processes, spooler objects, and finally TMF. This staged rollout let operators gain experience with one or two object types at a time.

For more information about object monitoring, refer to the Availability Guide for Problem Management.

• Action 4: Implement automation. Completing the preceding actions allowed operators to display significant events and to detect conditions before they became critical. The improvement team was then ready to implement an automated operator product. To accomplish this, the improvement team:

  • Used the default rule set to perform problem recovery for the Pathway, Expand, and SNAX subsystems.
  • Wrote customized recovery rules for their specific installation.
  • Used OMF to develop and optimize new rules for the objects it monitored.
  • Configured the automated operator to generate an event each time it executed a recovery rule. These events told operators when a problem had occurred and what the outcome of the recovery was.

For more information about implementing automation, refer to the Availability Guide for Problem Management.

• Action 5: Implement process statistics. After making such significant changes, the improvement team wanted to measure the results; specifically, they wanted to review and optimize the automated recovery rules. To do so, the team used EMS Analyzer (EMSA) to track how efficiently the automation was working. They made the following observations:

  • Manual recoveries increased in December after the operations console was installed. Because messages were now more visible, operators could detect and fix problems that had previously gone unnoticed.
  • After the automated operator was installed, automated recoveries began to replace manual recoveries.
  • During the first few months after the automated operator was installed, it recovered between 50 and 80 incidents per week without operator intervention. After the team used OMF to develop and optimize new rules, automated recoveries rose to 300 per week.

Figure 13-3 compares the number of problem events recovered manually with the number recovered by the automated operator during the improvement program.
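Actions 4 and 5 describe two connected ideas. First, an automated operator matches problem events to recovery rules and generates an event recording each rule it runs. Second, those records let the team compare automated and manual recoveries week by week, as Figure 13-3 does.

The following Python sketch is a toy model of that flow, built on several assumptions. Everything in it is hypothetical and is not the interface of any HP automated operator product, of EMS, or of EMSA:

* The `Event`, `AutomatedOperator`, and `weekly_recoveries` names.
* The subsystem and problem-type labels.
* Representing a rule as a Python function that returns success or failure.

The sketch also assumes that an event with no matching rule, or whose rule fails, ends up as a manual recovery.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    week: int       # hypothetical: week in which the problem event occurred
    subsystem: str  # hypothetical labels, e.g. "PATHWAY", "EXPAND", "SNAX"
    kind: str       # hypothetical problem-type label


# Hypothetical rule shape: performs a recovery, returns True on success.
Rule = Callable[[Event], bool]


class AutomatedOperator:
    """Toy automated operator: runs the matching recovery rule for an event
    and records an outcome entry for every rule it executes."""

    def __init__(self, rules: Dict[Tuple[str, str], Rule]):
        self.rules = rules
        # Stands in for the events generated each time a rule executes.
        self.outcome_log: List[str] = []

    def handle(self, event: Event) -> Optional[bool]:
        rule = self.rules.get((event.subsystem, event.kind))
        if rule is None:
            return None  # no rule: left for an operator to recover manually
        ok = rule(event)
        status = "succeeded" if ok else "failed"
        self.outcome_log.append(
            f"week {event.week}: rule {event.subsystem}/{event.kind} {status}"
        )
        return ok


def weekly_recoveries(events: List[Event], operator: AutomatedOperator) -> Dict[int, Counter]:
    """Count automated vs. manual recoveries per week (assumption: a missing
    or failed rule means the problem was recovered manually)."""
    counts: Dict[int, Counter] = {}
    for ev in events:
        result = operator.handle(ev)
        bucket = counts.setdefault(ev.week, Counter())
        bucket["automated" if result else "manual"] += 1
    return counts


def restart_server_class(event: Event) -> bool:
    return True  # placeholder for a real recovery action


def restart_line(event: Event) -> bool:
    return True  # placeholder for a real recovery action


# Usage with made-up events: one problem has no rule and stays manual.
op = AutomatedOperator({
    ("PATHWAY", "server-stopped"): restart_server_class,
    ("EXPAND", "line-down"): restart_line,
})
events = [
    Event(1, "PATHWAY", "server-stopped"),
    Event(1, "DISK", "volume-down"),
    Event(2, "EXPAND", "line-down"),
]
counts = weekly_recoveries(events, op)
```

Adding a rule for a problem type moves its future occurrences from the "manual" count to the "automated" count. This mirrors how the report above describes automated recoveries replacing manual ones as rules were developed.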