Availability Guide for Problem Management–125509
3 Recovering From Unplanned Outages
Overview
Even the best planning and prevention cannot eliminate all unplanned outages. When
unplanned outages do occur, a methodical approach can help you pinpoint the cause
quickly, which saves you time and money.
This section describes how to get your system or application back online quickly after
an unplanned outage by using efficient problem-resolution techniques. It covers:
• Steps of systematic problem solving
• Procedures for detecting problems in a timely manner
• Approaches to analyzing problems and developing solutions
• Steps for conducting a problem review
Systematic Problem Solving
Systematic problem solving is a way of organizing and structuring the task of solving
problems that, if not solved quickly, might result in unplanned outages. If you define
in advance which questions to ask and how to organize and analyze problem
information, your problem solving will be much more effective.
Problem-Solving Steps
Systematic problem solving can be broken down into the following steps:
1. Detecting and isolating the problem
2. Gathering facts and reporting the problem
3. Identifying the cause and then developing and implementing a solution
4. Escalating the problem, if necessary
5. Reviewing the problem, focusing on prevention
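The five steps above can be tracked as a simple ordered checklist. The following Python sketch is illustrative only and is not part of any Guardian tool: the names Step and ProblemRecord are hypothetical, and the sketch assumes that escalation (step 4) is the only step that may be skipped, as the words "if necessary" suggest.

```python
from dataclasses import dataclass, field
from enum import Enum


class Step(Enum):
    """The five problem-solving steps, in the order listed above."""
    DETECT_AND_ISOLATE = 1
    GATHER_AND_REPORT = 2
    RESOLVE = 3    # identify the cause, then develop and implement a solution
    ESCALATE = 4   # assumed optional ("if necessary")
    REVIEW = 5


@dataclass
class ProblemRecord:
    """Hypothetical record that tracks one problem through the steps."""
    summary: str
    completed: list = field(default_factory=list)

    def complete(self, step: Step) -> None:
        # Every earlier step except the optional escalation must be done first.
        required = [s for s in Step
                    if s.value < step.value and s is not Step.ESCALATE]
        missing = [s for s in required if s not in self.completed]
        if missing:
            raise ValueError(
                f"complete {missing[0].name} before {step.name}")
        if step not in self.completed:
            self.completed.append(step)

    def is_closed(self) -> bool:
        # A problem is treated as closed once it has been reviewed.
        return Step.REVIEW in self.completed
```

For example, a record can move from detection through review without escalation, but an attempt to mark RESOLVE complete on a new record raises ValueError because the problem has not yet been detected and reported.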
Note. This section does not provide recovery procedures for major component or system
failures. For information about how to recover from these failures, refer to the Guardian
System Operations Guide and to the system and subsystem recovery manuals for processor,
disk, power, system, application subsystem, and communications failures.