NonStop S-Series Planning and Configuration Guide (G06.29+)

Planning for System Availability and Support
HP NonStop S-Series Planning and Configuration Guide 523303-021
Preventing Unplanned Outages
In addition to minimizing the number and duration of planned outages, preventing
unplanned outages is an important component of minimizing outage minutes.
Causes of Unplanned Outages
Studies have identified four common causes of unplanned outages, listed here from most
to least frequent:
1. Operations management errors
2. Hardware configuration that is not fault-tolerant
3. Application design that is not fault-tolerant
4. Environmental problems such as AC power failures
Some of the strategies used to minimize planned outage minutes are also useful in
preventing unplanned outages. An example is automating startup and shutdown of
applications and system resources by creating startup and shutdown command files.
Using command files reduces the opportunity for operator errors that can cause an
unplanned outage.
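The benefit of startup and shutdown command files comes from fixing the order of operations in advance, so operators do not have to remember it under pressure. The following is a minimal sketch in Python, not a NonStop command file. It assumes each resource can be started and stopped by a callable that reports success. `Resource`, `run_startup`, `run_shutdown`, and the resource names are hypothetical. Startup follows a fixed sequence and halts at the first failure. Shutdown stops only what actually started, in reverse order, so dependent applications stop before the resources they rely on.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Resource:
    """Hypothetical stand-in for a system resource or application."""
    name: str
    start: Callable[[], bool]  # returns True if the resource started
    stop: Callable[[], bool]   # returns True if the resource stopped


def run_startup(resources: List[Resource], log: List[str]) -> List[Resource]:
    """Start resources in the fixed order listed, halting at the first failure.

    Returns the resources that did start, so they can be shut down cleanly.
    """
    started: List[Resource] = []
    for res in resources:
        if res.start():
            log.append(f"STARTED {res.name}")
            started.append(res)
        else:
            log.append(f"FAILED {res.name}; halting startup")
            break
    return started


def run_shutdown(started: List[Resource], log: List[str]) -> None:
    """Stop resources in reverse start order so dependents stop first."""
    for res in reversed(started):
        status = "STOPPED" if res.stop() else "STOP FAILED"
        log.append(f"{status} {res.name}")


if __name__ == "__main__":
    always_ok = lambda: True
    sequence = [
        Resource("network", always_ok, always_ok),
        Resource("database", always_ok, always_ok),
        Resource("application", always_ok, always_ok),
    ]
    log: List[str] = []
    run_shutdown(run_startup(sequence, log), log)
    print("\n".join(log))
```

Run as a script, it starts network, database, and application in that order, then stops them in the reverse order. The sequence is defined once and reused, so it never depends on an operator's memory.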
Other ways of preventing unplanned outages include developing strategies for
managing problems that occur in your operations environment.
Problem Management
Prevent problems from becoming unplanned outages by:
• Predicting potential problems before they occur.
• Preparing for problems that might occur. Three important strategies are:
  ° Preparing for environmental problems and disasters
  ° Documenting your operations-management procedures
  ° Documenting your problem-detection, escalation, and recovery procedures
• Managing the system and applications to ensure that:
  ° Operators are quickly notified of error conditions, state changes, and exceeded
    thresholds before these escalate into unplanned outages.
  ° Messages are logged and provide a chronological list of events to aid in
    problem diagnosis and resolution.
  ° A single source of information exists for both system and application events.
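The last three goals, threshold notification, a chronological log, and a single source for system and application events, fit together. The following is a minimal sketch in Python. It is not a NonStop event-management facility. It assumes every source stamps its events from one shared clock and delivers them in time order. `Event`, `merge_events`, `threshold_alerts`, and the subject names such as `cpu-busy` are hypothetical. The system and application streams are merged into one chronological log, and any event whose value exceeds a configured limit produces an operator alert.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Event:
    """Hypothetical event record from either a system or an application source."""
    timestamp: float  # assumes all sources share one clock
    source: str       # e.g. "system" or "application"
    subject: str      # hypothetical name of what was measured
    value: float      # measured value for the subject


def merge_events(*streams: Iterable[Event]) -> List[Event]:
    """Merge per-source streams, each already in time order, into one log."""
    return list(heapq.merge(*streams, key=lambda ev: ev.timestamp))


def threshold_alerts(events: Iterable[Event],
                     thresholds: Dict[str, float]) -> List[str]:
    """Return an operator alert for each event whose value exceeds its limit."""
    alerts: List[str] = []
    for ev in events:
        limit = thresholds.get(ev.subject)
        if limit is not None and ev.value > limit:
            alerts.append(f"ALERT t={ev.timestamp} {ev.source}: "
                          f"{ev.subject}={ev.value} exceeds {limit}")
    return alerts


if __name__ == "__main__":
    system = [Event(1.0, "system", "cpu-busy", 50.0),
              Event(3.0, "system", "cpu-busy", 95.0)]
    application = [Event(2.0, "application", "queue-depth", 10.0),
                   Event(4.0, "application", "queue-depth", 500.0)]
    log = merge_events(system, application)
    for line in threshold_alerts(log, {"cpu-busy": 90.0, "queue-depth": 100.0}):
        print(line)
```

`heapq.merge` combines the streams in time order without re-sorting the whole log. This works only because each input stream is assumed to be sorted already. Checking thresholds against the merged log rather than against each source separately keeps alerts in the same chronological order as the events that caused them.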