Introduction to NonStop Operations Management


Operations Management and Continuous Improvement

Introduction to NonStop Operations Management–125507


Problem Scenario

The complexity of NAC’s systems was growing rapidly. Managers in the MIS department had to ensure that each of the 10,000 objects was installed and configured correctly and ran efficiently. The business applications and the system generated more than 15 events (status, warning, and problem messages) per minute. However, most problems were reported by end users over the phone. Even the most experienced operators had difficulty detecting, recognizing, and recovering from problems in this complex environment.

In addition, because business services were now available almost continuously, the operations group no longer had periods of downtime in which to perform maintenance and installation tasks.
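To put the event rate cited in this scenario in perspective, the short sketch below scales "more than 15 events per minute" to hourly and daily totals. The function name and the assumption of round-the-clock operation are illustrative only; the results are lower bounds, because the text gives only a minimum rate.

```python
# Lower bound on event volume implied by "more than 15 events per minute".
EVENTS_PER_MINUTE = 15  # figure from the scenario; the actual rate was higher


def event_volume(events_per_minute: int) -> dict[str, int]:
    """Scale a per-minute event rate to hourly and daily totals."""
    per_hour = events_per_minute * 60
    per_day = per_hour * 24  # assumption: the system runs around the clock
    return {"per_hour": per_hour, "per_day": per_day}


volume = event_volume(EVENTS_PER_MINUTE)
print(volume)  # {'per_hour': 900, 'per_day': 21600}
```

At a minimum of 900 messages per hour, it is easy to see why operators reading hard-copy output could not keep up.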

Implementing an Operations-Management Improvement Program

As the quality of end-user services decreased, the MIS managers recognized that it would take a serious effort to cope with these new challenges. The MIS managers decided to initiate an operations-management improvement program, assigning a team of two senior support analysts to the project.

The following paragraphs describe the improvement team’s step-by-step implementation of the improvement program.

Step 1—Assessing the Environment

The improvement team decided to assess their operations management processes by measuring outages, observing the working environment, and analyzing the effectiveness of their existing tools and processes. Based on their assessment, they concluded that their operations management processes were at maturity level 1. The following paragraphs summarize the improvement team’s assessments.

• Application outages were too frequent. The improvement team required help-desk operators to log each outage: the time of occurrence, the end-user name, the business services affected, and the time to repair (outage duration). After analyzing the logs, the improvement team determined that during peak hours of the day, the help desk received 20 to 25 phone calls per hour. Each outage took between 5 and 20 minutes to resolve.

• In most cases, operators did not detect problems. Generally, end users phoned in to report problems.

• Sometimes operators learned of a critical situation only when scores of messages started printing on hard-copy consoles.

• There were so many messages that the operators could not sift through them and take effective action. All application and system messages were directed to hard-copy consoles configured as the HOMETERM device.

• All problem recovery was performed manually. The hard-copy console arrangement provided inadequate support for problem detection and analysis. Because operators had trouble correlating the information on many pages of listings, they couldn’t see what was going on in the system and couldn’t control it.
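The outage log described in the first assessment item lends itself to a simple analysis. The following is a minimal sketch, not the team's actual tooling: the record layout, field names, function names, and sample entries are all assumptions made for illustration. It counts help-desk calls per clock hour and summarizes time to repair, which are the two measurements the team drew from its logs.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class OutageRecord:
    """One help-desk log entry; field names are illustrative assumptions."""
    reported_at: datetime   # time of occurrence
    end_user: str           # end-user name
    services: list[str]     # business services affected
    repair_minutes: float   # time to repair (outage duration)


def calls_per_hour(log: list[OutageRecord]) -> Counter:
    """Count logged calls in each clock hour."""
    return Counter(
        r.reported_at.replace(minute=0, second=0, microsecond=0) for r in log
    )


def summarize(log: list[OutageRecord]) -> dict:
    """Report the busiest hour and the spread of repair times."""
    peak_hour, peak_calls = calls_per_hour(log).most_common(1)[0]
    durations = [r.repair_minutes for r in log]
    return {
        "peak_hour": peak_hour,
        "peak_calls": peak_calls,
        "min_repair": min(durations),
        "max_repair": max(durations),
        "mean_repair": mean(durations),
    }


# Made-up sample entries, not taken from the NAC logs.
sample_log = [
    OutageRecord(datetime(1995, 3, 6, 10, 5), "user_a", ["order entry"], 5.0),
    OutageRecord(datetime(1995, 3, 6, 10, 40), "user_b", ["billing"], 20.0),
    OutageRecord(datetime(1995, 3, 6, 11, 15), "user_c", ["inventory"], 12.0),
]

summary = summarize(sample_log)
print(summary["peak_hour"], summary["peak_calls"])
print(summary["min_repair"], summary["max_repair"], summary["mean_repair"])
```

Grouping by clock hour keeps the sketch simple. A real analysis might instead use a sliding window, or group by business service to see which applications cause the most calls.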