Introduction to NonStop Operations Management
Problem Management
Introduction to NonStop Operations Management–125507
Management Responsibilities
Managing the problem environment is most effective when problem-reporting and 
problem-escalation policies and procedures are developed and enforced, and the staff is 
trained in outage prevention and recovery.
Establishing Policies and Procedures
Experience has shown that organizations lacking problem-reporting and problem-escalation 
procedures have higher error rates, operate less efficiently, take longer to recover, 
and have a greater percentage of dissatisfied users.
Established problem-reporting and problem-escalation policies and procedures help you:
• Ensure that all identified problems are reported, recorded, assigned a priority, and 
  resolved
• Track how quickly problems are resolved in order to determine if procedures need to 
  be improved and if service-level agreements are being met
• Identify recurring problems in order to eliminate the problems or to help the staff 
  resolve the problems more quickly
• Ensure that applications are designed to help your staff resolve problems when they 
  occur
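As a minimal sketch of what such a policy might track, the following Python fragment records each reported problem with a priority, measures resolution time against a service-level target, and lists recurring problems. The field names, the priority scale, and the target hours are illustrative assumptions, not part of any NonStop product or tool.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical service-level targets per priority (1 = most urgent).
SLA_TARGETS = {
    1: timedelta(hours=4),
    2: timedelta(hours=24),
    3: timedelta(hours=72),
}


@dataclass
class Problem:
    summary: str                         # short description, used to spot recurrences
    priority: int                        # assigned when the problem is recorded
    reported: datetime
    resolved: Optional[datetime] = None  # None while the problem is still open

    def resolution_time(self) -> Optional[timedelta]:
        """Return the time taken to resolve, or None while the problem is open."""
        if self.resolved is None:
            return None
        return self.resolved - self.reported

    def met_sla(self) -> Optional[bool]:
        """Return whether resolution met the assumed target; None if still open."""
        elapsed = self.resolution_time()
        if elapsed is None:
            return None
        return elapsed <= SLA_TARGETS[self.priority]


def recurring(problems, threshold=2):
    """Return the summaries that were reported at least `threshold` times."""
    counts = Counter(p.summary for p in problems)
    return sorted(s for s, n in counts.items() if n >= threshold)


# Made-up sample log for illustration only.
start = datetime(2024, 1, 1, 8, 0)
log = [
    Problem("disk path down", 1, start, start + timedelta(hours=2)),
    Problem("disk path down", 1, start + timedelta(days=3),
            start + timedelta(days=3, hours=6)),
    Problem("print queue stalled", 3, start + timedelta(days=5)),
]
```

Calling `met_sla()` on each entry shows which resolutions fell outside the assumed targets, and `recurring(log)` lists the problems that may be worth eliminating at the source.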
Table 6-1. Unplanned Outage Classes

Outage Class     Description
Physical         Physical faults or failures in the hardware.
                 Examples include system disk failure, network router failure,
                 nonfault-tolerant hardware configurations (such as unmirrored
                 disk drives), and nonfault-tolerant application configurations.
Design           Design errors, such as design bugs and design failures in
                 hardware or software.
                 An example is an application change that introduces unexpected
                 problems and makes the application unusable.
Operations       Errors caused by operations personnel through accident,
                 inexperience, or malice.
                 Examples include deleting data, installing software
                 incorrectly, procedural problems (or lack of procedures), lack
                 of operator training, and basic operations and maintenance
                 tasks that are not done or are done incorrectly.
Environmental    Failures in power, cooling, or network connections; natural
                 disasters (earthquake, flood); terrorism; and accidents.
                 Examples include air-conditioning system failure, power
                 failures (such as dead batteries or no backup generator), and
                 a computer in a basement destroyed by flood.
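
One way to put these classes to use is to tag each recorded outage with its class and tally the results, so that the most frequent sources of outages stand out. The sketch below is illustrative only: the `OutageClass` enum and the sample outage list are assumptions, not data from any real system.

```python
from collections import Counter
from enum import Enum


class OutageClass(Enum):
    """The four unplanned outage classes from Table 6-1."""
    PHYSICAL = "Physical"
    DESIGN = "Design"
    OPERATIONS = "Operations"
    ENVIRONMENTAL = "Environmental"


# Hypothetical outage records: (description, class).
outages = [
    ("system disk failure", OutageClass.PHYSICAL),
    ("software installed incorrectly", OutageClass.OPERATIONS),
    ("data deleted by mistake", OutageClass.OPERATIONS),
    ("air-conditioning system failure", OutageClass.ENVIRONMENTAL),
]


def tally(records):
    """Count outages per class, reporting zero for classes with no outages."""
    counts = Counter(cls for _, cls in records)
    return {cls.value: counts.get(cls, 0) for cls in OutageClass}


summary = tally(outages)
```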










