Availability Guide for Problem Management
Recovering From Unplanned Outages
Availability Guide for Problem Management–125509
Step 5—Reviewing the Problem
When a problem is resolved, the solution can be recorded and the problem report can be
closed. Reviewing problems and solutions with a focus on prevention can help the
operations staff prevent the same problems from recurring. Consider holding regular
review meetings with the staff to:
• Review resolved and unresolved problems and classify problems into groups
• Ensure that progress is made to close open issues and unresolved problems
• Learn how problems were resolved and determine if problems could have been
  resolved more quickly
• Improve problem reporting and escalation procedures as necessary
When reviewing problems (both resolved and unresolved), it is important to differentiate
between the effect of the problem and its actual root cause. For example,
if a processor keeps going down, the outage is the effect; you need to determine why the
processor continues to fail and correct that underlying cause.
Asking the Right Questions
When reviewing a problem and its resolution, asking the following questions can help
stimulate thoughts and ideas for problem prevention:
• Why did this problem occur? What was the root cause? What were the contributing
  factors?
• How serious was the problem?
• What is the likelihood that it will occur again?
• Can the cause (or causes) be eliminated completely?
• Can something be done to reduce the likelihood that such a problem will recur?
• Can automation tools be used to detect or to respond (or both) to preliminary
  symptoms automatically?
• Can anything be done now to minimize the damage if such a problem recurs?
• Can the speed of the problem resolution process be improved in any way?
Detecting Trends
Using information gathered from the problem review, you can detect trends and generate
reports that provide the following problem-recovery statistics:
• Number and types of problems encountered
• Number of problems resolved
• Number of problems unresolved
• Amount of time problems remained unresolved
• Number of problems that were escalated
• Levels of support that were required to solve problems
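The statistics listed above can be derived from a simple problem log. The following Python sketch shows one possible way to do so. It is an illustration only: the ProblemReport record, its field names, and the summarize function are hypothetical and are not part of any product described in this guide. The sketch also assumes that elapsed time is reported as a mean number of hours and that unresolved problems are measured up to the time of the report.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ProblemReport:
    # All field names are hypothetical; adapt them to your own problem log.
    problem_type: str
    opened: datetime
    closed: Optional[datetime]  # None while the problem is unresolved
    escalated: bool
    support_level: int  # highest level of support involved


def summarize(reports, now):
    """Return the problem-recovery statistics for a list of reports."""
    resolved = [r for r in reports if r.closed is not None]
    unresolved = [r for r in reports if r.closed is None]
    # Hours each problem stayed open; unresolved ones are measured up to `now`.
    hours_open = [
        ((r.closed or now) - r.opened).total_seconds() / 3600 for r in reports
    ]
    return {
        "total": len(reports),
        "by_type": dict(Counter(r.problem_type for r in reports)),
        "resolved": len(resolved),
        "unresolved": len(unresolved),
        "mean_hours_open": sum(hours_open) / len(hours_open) if hours_open else 0.0,
        "escalated": sum(r.escalated for r in reports),
        # Support levels are counted only for problems that were actually solved.
        "by_support_level": dict(Counter(r.support_level for r in resolved)),
    }


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = [
        ProblemReport("processor", datetime(2024, 1, 1, 8), datetime(2024, 1, 1, 12), False, 1),
        ProblemReport("disk", datetime(2024, 1, 2, 8), datetime(2024, 1, 2, 10), True, 2),
        ProblemReport("processor", datetime(2024, 1, 3, 8), None, True, 3),
    ]
    for name, value in summarize(sample, now=datetime(2024, 1, 3, 14)).items():
        print(f"{name}: {value}")
```

Running the same summary at each review meeting and comparing the results over time is one way to make trends visible, such as a problem type whose count keeps rising.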