Availability Guide for Problem Management

Availability Guide for Problem Management 125509
About This Manual
The Availability Guide for Problem Management explains how to maximize system and
application availability by preventing problems from becoming unplanned outages. This
manual:
• Defines problem management and explains how it relates to the operations
  management (OM) framework and online management
• Describes the causes of unplanned outages and explains how to predict, prevent, and
  prepare for them
• Shows how to quickly bring a system or application back online after an unplanned
  outage by implementing efficient problem-resolution techniques
• Describes how to predict, prevent, and detect problems by effectively managing
  system and application messages and by monitoring important objects
• Describes how to predict, prevent, detect, and quickly recover from problems by
  automating operations and recovery procedures
• Lists and describes the tools provided by Tandem to detect, analyze, and recover
  from problems, and to administer the problem environment
Who Should Read This Manual?
Anyone—or any group—responsible for managing Tandem systems should read this
manual. The following table identifies some typical readers and the kinds of
information they can find in this manual.
This manual assumes that the reader has worked with NonStop systems before and is
familiar with operations management.
These kinds of readers…             Look for this information about…

Operations management               Choosing Tandem products for problem management
                                    Understanding how various products fit together

Operations and support personnel    Diagnosing and solving (or escalating) problems
                                    Monitoring systems
                                    Logging problems
                                    Measuring and analyzing system performance
                                    Automating operations and recovery procedures
                                    Setting policies for problem escalation and disaster recovery
                                    Ensuring that all personnel are adequately trained