Availability Guide for Problem Management
Preventing Unplanned Outages
Availability Guide for Problem Management–125509
Goals and Strategies
Operations management documentation should include descriptions of processes
and routine tasks, and it should indicate who is responsible for these processes and
tasks.
• Documenting your problem-detection, escalation, and recovery procedures.
Define procedures for monitoring system hardware and software, system and
application message logs, and user requests. For example:
• Operators should completely check the system at the beginning and middle of
each shift.
• Operators should check all log files frequently.
Train your operators and analysts to interpret error messages and to refer to
documentation or software that describes error messages, such as the Operator
Messages Manual.
Publish your procedures to ensure that users know whom to contact and what
information to provide when a problem occurs.
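One way to make "what information to provide" concrete is a fixed problem-report
template that users fill in before contacting operations. The Python sketch below
is illustrative only: the ProblemReport class, its field names, and the
missing_fields helper are hypothetical and are not taken from this guide. Each
site would substitute the fields its own published procedure requires.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ProblemReport:
    """Hypothetical report template; every field name is an assumption."""
    reporter: str      # who noticed the problem
    contact: str       # whom the published procedure says to notify
    observed_at: str   # when the problem was observed
    system: str        # affected system or application
    error_text: str    # error message exactly as displayed
    description: str   # what the user was doing at the time


def missing_fields(report: ProblemReport) -> List[str]:
    """Return the names of fields left blank, in declaration order."""
    return [f.name for f in fields(report)
            if not getattr(report, f.name).strip()]
```

A report could be rejected or sent back to the user whenever missing_fields
returns a non-empty list, so that analysts always receive the same minimum set of
details.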
The Introduction to NonStop Operations Management provides task lists related to
monitoring hardware and software.
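The example monitoring rules given earlier (a complete check at the beginning and
middle of each shift, and frequent log checks) can be sketched in Python. This is
a minimal illustration, not a NonStop tool: the function names, the keyword list,
and the eight-hour shift in the usage note are all assumptions chosen for the
example.

```python
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# Assumed keywords; a real site would derive these from its own
# message documentation rather than from this list.
ALERT_KEYWORDS = ("ERROR", "FAIL", "DOWN")


def shift_check_times(shift_start: datetime,
                      shift_length: timedelta) -> List[datetime]:
    """Return the two complete-check times: shift start and mid-shift."""
    return [shift_start, shift_start + shift_length / 2]


def flag_log_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, text) for each line containing an alert keyword.

    Matching is a case-insensitive substring test, so it can over-match
    (for example, "download" contains "DOWN").
    """
    flagged = []
    for number, line in enumerate(lines, start=1):
        upper = line.upper()
        if any(keyword in upper for keyword in ALERT_KEYWORDS):
            flagged.append((number, line.rstrip("\n")))
    return flagged
```

For a shift that starts at 08:00 and lasts eight hours, shift_check_times returns
08:00 and 12:00. flag_log_lines can be run over a log file opened in text mode to
produce a short list for an operator to review; it does not replace training
operators to interpret the messages themselves.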