Availability Guide for Problem Management
Recovering From Unplanned Outages
Availability Guide for Problem Management–125509
• PEEK/CPUx/ (run for each processor) provides statistical information about memory,
system tables, and other resources, including configured parameters and the
high-water marks of system resources since the counts were last reset.
• PATHWAY STATS commands can be issued for the TCP, TERM, and SERVER elements of an
application, depending on the Pathway applications running on the system. These
commands provide statistical information about those Pathway elements.
Other Tandem products available to help you collect dynamic system data following an
outage include:
• Measure captures data during the normal and peak periods of activity of the
application. Which data you capture depends on the application being run and the
outage being analyzed.
• Guardian Performance Analyzer (GPA) is run against previously captured Measure
data. It provides a series of commands and recommendations to improve
performance. It also identifies system resource shortages, such as insufficient
processor memory or disk space.
Static information that should be collected and analyzed includes the following:
• The currently installed versions of your Tandem software
• System startup and shutdown files, as well as application startup and shutdown files
• Application configuration and control files
Developing and Implementing a Solution
Once you know the cause of the problem, determine the best way to resolve it.
Deciding how and when to resolve the problem is critical. An immediate fix that
seems obvious may introduce subsequent problems, but waiting may delay recovery
from the outage. Evaluate all of the issues surrounding the problem and its
resolution. The best solution is one that considers:
• Cost: Is this the least expensive solution?
• Speed: Is this the quickest way to solve the problem?
• Safety: Will this solution adversely affect other components of the system?
• Reliability: Will this solution eliminate the problem? Will this solution cause other
problems? Will this solution fix the cause or just the symptom?
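One way to apply the four criteria above is to rate each candidate solution against them and compare the totals. The following Python sketch is illustrative only and is not part of any Tandem product: the Candidate class, the 1-to-5 rating scale, the weights, and the two example solutions are all assumptions made for this example. The guide itself does not rank the criteria, so adjust the weights to suit your site.

```python
from dataclasses import dataclass

# Hypothetical weights: the guide does not rank the four criteria.
# Here safety and reliability are assumed to count double.
WEIGHTS = {"cost": 1.0, "speed": 1.0, "safety": 2.0, "reliability": 2.0}


@dataclass
class Candidate:
    """A proposed solution, rated 1 (poor) to 5 (good) on each criterion."""
    name: str
    cost: int          # 5 = least expensive
    speed: int         # 5 = quickest to put in place
    safety: int        # 5 = least risk to other system components
    reliability: int   # 5 = fixes the cause, not just the symptom


def score(candidate, weights=WEIGHTS):
    """Return the weighted sum of the candidate's four ratings."""
    return sum(weights[key] * getattr(candidate, key) for key in weights)


def rank(candidates, weights=WEIGHTS):
    """Sort candidates from highest to lowest weighted score."""
    return sorted(candidates, key=lambda c: score(c, weights), reverse=True)


# Two made-up candidates for illustration.
candidates = [
    Candidate("restart the failed process now",
              cost=5, speed=5, safety=2, reliability=2),
    Candidate("apply the fix at the next maintenance window",
              cost=3, speed=2, safety=5, reliability=5),
]

best = rank(candidates)[0]
print(best.name)
```

With these assumed ratings and weights, the slower but safer option ranks first. The score is only an aid for comparing options; it does not replace evaluating the issues surrounding the problem and its resolution.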