Availability Guide for Problem Management–125509
Contents
New and Changed Information iii
About This Manual xiii
Notation Conventions xvii
1. Introduction to Problem Management
Overview 1-1
What Is an Outage? 1-2
What Is Problem Management? 1-5
What Are the Goals of Problem Management? 1-5
Reducing or Eliminating Problems 1-5
Recovering Quickly From Problems That Do Occur 1-6
Focusing on Problems That Can Cause Unplanned Outages 1-6
Tandem’s Commitment to Problem Management Solutions 1-7
Tandem’s Support Process Is Changing 1-7
Reporting Problems 1-8
2. Preventing Unplanned Outages
Overview 2-1
What Is an Unplanned Outage? 2-1
Common Causes of Unplanned Outages 2-2
Operations Management Errors 2-2
Nonfault-Tolerant Hardware Configuration 2-2
Nonfault-Tolerant Application Design 2-2
Environmental Problems 2-2
Preventing Problems From Becoming Outages 2-3
Why Is Problem Prevention Important? 2-3
Goals and Strategies 2-3
Requirements for Successful Problem Prevention 2-6
Detailed, Step-by-Step Procedures 2-6
Well-Trained Staff 2-7
Tandem Education 2-7
Well-Designed Applications 2-7
System Configuration Documentation 2-8
Availability of Super-Group Capabilities 2-9
Disaster-Recovery Planning 2-9
Automated Recovery Procedures 2-9
Where to Find More Information 2-9