Introduction to NonStop Operations Management

Problem Management

Introduction to NonStop Operations Management–125507


Step 1—Detecting and Isolating the Problem

Operators cannot respond to a problem until they know that it exists. Some of the same techniques used to predict and prevent problems also help determine whether a problem exists:

• Monitoring hardware and software.
• Monitoring system and application software message logs.
• Using Tandem Service Management (TSM) tools, including the TSM EMS Event Viewer. TSM uses expert systems technology to detect, analyze, diagnose, and archive hardware problems as they occur, often detecting failures before they affect system performance.
• Automating monitoring tasks and recovery procedures.
• Receiving reports from one or more users that a problem exists.

To ensure that problems are detected as quickly as possible, establish procedures for monitoring the system and logs, and for receiving information from users. For guidelines to help you develop monitoring procedures, refer to The Availability Guide for Problem Management.
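As an illustration of automating one monitoring task, the following Python sketch scans message-log lines and returns the ones that need operator attention. It is a minimal sketch under stated assumptions: the line layout, the severity names, and the names LogEvent and find_problem_events are hypothetical and do not represent the EMS event format or any TSM interface.

```python
import re
from typing import Iterable, List, NamedTuple

# Hypothetical line layout (an assumption, not the EMS event format):
#   2024-05-01 10:15:00 ERROR disk: write failure on volume A
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<severity>[A-Z]+) "
    r"(?P<source>[^:]+): "
    r"(?P<text>.*)$"
)


class LogEvent(NamedTuple):
    timestamp: str
    severity: str
    source: str
    text: str


def find_problem_events(
    lines: Iterable[str],
    alert_severities: frozenset = frozenset({"ERROR", "CRITICAL"}),
) -> List[LogEvent]:
    """Return the log events whose severity calls for operator attention."""
    events = []
    for line in lines:
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        if match["severity"] in alert_severities:
            events.append(LogEvent(**match.groupdict()))
    return events
```

A scheduled job could pass each new batch of log lines to find_problem_events and forward any returned events to the operator on duty.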

Step 2—Gathering the Facts and Reporting the Problem

After a problem is detected, it is usually reported. Consider establishing procedures for reporting problems. Established procedures help you track:

• Each problem that occurs
• How the problem was resolved
• Who resolved the problem and when
• Recurring problems
• How long it took to resolve the problem
• Whether a problem can be prevented or recovery procedures for that problem can be automated

If all problems are logged, your staff can generate weekly or monthly summaries that allow you to evaluate system and staff performance and focus on problem areas.
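The tracking items listed above map naturally onto a problem-log record, and a periodic summary can be computed from a list of such records. The following Python sketch shows one possible shape. The names ProblemRecord and summarize, the field names, and the use of a category field to spot recurring problems are all illustrative assumptions, not part of any NonStop or TSM tool.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, Optional


@dataclass
class ProblemRecord:
    """One logged problem; every field name here is hypothetical."""
    problem_id: str
    description: str
    detected_at: datetime
    category: str = "uncategorized"         # used to spot recurring problems
    resolved_at: Optional[datetime] = None  # None while the problem is open
    resolved_by: Optional[str] = None
    resolution: str = ""

    def time_to_resolve(self) -> Optional[timedelta]:
        if self.resolved_at is None:
            return None
        return self.resolved_at - self.detected_at


def summarize(records: Iterable[ProblemRecord],
              period_start: datetime,
              period_end: datetime) -> dict:
    """Summarize problems detected in [period_start, period_end)."""
    in_period = [r for r in records
                 if period_start <= r.detected_at < period_end]
    durations = [r.time_to_resolve() for r in in_period
                 if r.resolved_at is not None]
    mean = sum(durations, timedelta()) / len(durations) if durations else None
    counts = Counter(r.category for r in in_period)
    return {
        "total": len(in_period),
        "open": len(in_period) - len(durations),
        "mean_time_to_resolve": mean,
        # categories seen more than once in the period
        "recurring_categories": sorted(c for c, n in counts.items() if n > 1),
    }
```

Calling summarize with a one-week or one-month window yields the kind of weekly or monthly summary described above. Categories that keep appearing under recurring_categories are natural candidates for prevention or automated recovery.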