Introduction to NonStop Operations Management

Check Lists

Introduction to NonStop Operations Management–125507

Problem Management

1. Maintain a well-trained operations and support staff.

2. Establish problem prevention strategies. Your staff should:

• Monitor the hardware and software
• Monitor system and application message logs
• Automate operations and recovery procedures as much as possible
• Ensure that the system’s fault-tolerant features are fully used and maintained
• Design your system to take advantage of quick startup and shutdown techniques
• Ensure the availability of super-group (255,n) capabilities to solve certain problems
• Be prepared and trained for environmental problems and disasters
• Maintain up-to-date and well-tested recovery procedures

3. Establish problem detection procedures. Your staff should:

• Monitor the hardware and software
• Monitor system and application software message logs
• Automate system-monitoring tasks and use monitoring check lists
• Monitor TSM incident reports
• Act on information received from users reporting problems

4. Establish procedures for reporting problems:

• Develop a standard problem report form.
• Create and maintain a system outage log.
• Designate people responsible for logging problems.
• Consider establishing a help desk.
• Train staff and users in problem reporting procedures.
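As an illustration of the report form and outage log described above, the following is a minimal Python sketch. It is not part of any NonStop tool: the class names, field names, and the example system name are all hypothetical, and a real form would carry whatever fields your site's procedures require.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ProblemReport:
    """One entry on a standard problem report form (field names are hypothetical)."""
    reported_by: str
    system_name: str
    summary: str
    opened_at: datetime
    caused_outage: bool = False
    resolved_at: Optional[datetime] = None

    def duration(self) -> Optional[timedelta]:
        """Time from report to resolution, or None while the problem is open."""
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.opened_at


@dataclass
class OutageLog:
    """Keeps every report and can list the ones that caused an outage."""
    reports: List[ProblemReport] = field(default_factory=list)

    def record(self, report: ProblemReport) -> None:
        self.reports.append(report)

    def outages(self) -> List[ProblemReport]:
        return [r for r in self.reports if r.caused_outage]

    def total_outage_time(self) -> timedelta:
        """Sum of durations for outage reports that have been resolved."""
        total = timedelta()
        for report in self.outages():
            elapsed = report.duration()
            if elapsed is not None:
                total += elapsed
        return total
```

Recording every problem in one structure, and flagging the ones that caused an outage, lets the same data serve as both the problem log and the outage log.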

5. Establish problem-solving techniques for identifying the cause of a problem and developing a solution. Using a problem-solving worksheet can help operators systematically list the facts about a problem, list possible causes, identify the cause, and develop a solution.

6. Establish problem escalation procedures. Your staff should:

• Know who should work on easy-to-fix problems and who should work on complex problems, and determine the percentage of problems that should be resolved by each level of support.
• Know how long to work on a problem before escalating the problem to the next level of support.
• Know whom to contact for help with system-related and application-related problems.
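
The escalation points above can be made concrete with a small sketch. The Python below assumes three support levels and per-level time limits; the level names and limits are hypothetical placeholders, not recommendations, and should be replaced with your site's own policy.

```python
from datetime import datetime, timedelta

# Hypothetical support levels and time limits; replace with your site's policy.
ESCALATION_LIMITS = [
    ("operator", timedelta(minutes=15)),
    ("system support", timedelta(hours=1)),
    ("vendor support", None),  # last level: no further escalation
]


def current_level(opened_at: datetime, now: datetime) -> str:
    """Return the support level that should own a problem opened at opened_at."""
    elapsed = now - opened_at
    for level, limit in ESCALATION_LIMITS:
        if limit is None or elapsed < limit:
            return level
        elapsed -= limit  # time spent at this level is used up; move to the next
    return ESCALATION_LIMITS[-1][0]


def resolution_share(resolved_levels):
    """Percentage of resolved problems that were closed at each support level."""
    total = len(resolved_levels)
    if total == 0:
        return {}
    counts = {}
    for level in resolved_levels:
        counts[level] = counts.get(level, 0) + 1
    return {level: 100.0 * n / total for level, n in counts.items()}
```

Comparing the output of `resolution_share` against the target percentages for each level shows whether problems are being escalated too early or too late.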