Introduction to NonStop Operations Management

Check Lists
Introduction to NonStop Operations Management 125507
Problem Management
1. Maintain a well-trained operations and support staff.
2. Establish problem prevention strategies. Your staff should:
   • Monitor the hardware and software
   • Monitor system and application message logs
   • Automate operations and recovery procedures as much as possible
   • Ensure that the system’s fault-tolerant features are fully used and maintained
   • Design your system to take advantage of quick startup and shutdown techniques
   • Ensure the availability of super-group (255,n) capabilities to solve certain problems
   • Be prepared and trained for environmental problems and disasters
   • Maintain up-to-date and well-tested recovery procedures
3. Establish problem detection procedures. Your staff should:
   • Monitor the hardware and software
   • Monitor system and application software message logs
   • Automate system-monitoring tasks and use monitoring check lists
   • Monitor TSM incident reports
   • Act on information received from users reporting problems
4. Establish procedures for reporting problems:
   • Develop a standard problem report form.
   • Create and maintain a system outage log.
   • Designate people responsible for logging problems.
   • Consider establishing a help desk.
   • Train staff and users in problem reporting procedures.
5. Establish problem-solving techniques for identifying the cause of a problem and
developing a solution. Using a problem-solving worksheet can help operators
systematically list the facts about a problem, list possible causes, identify the cause,
and develop a solution.
6. Establish problem escalation procedures. Your staff should:
   • Know who should work on easy-to-fix problems and who should work on complex problems, and determine the percentage of problems that should be resolved by each level of support.
   • Know how long to work on a problem before escalating the problem to the next level of support.
   • Know whom to contact for help with system-related and application-related problems.
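
The reporting and escalation items above (steps 4 and 6) can be sketched as a small program. What follows is only an illustration, not part of any NonStop product or tool: the ProblemReport record, the three support levels, and the time limits in ESCALATION_LIMITS are all hypothetical and should be replaced by the levels and limits your organization agrees on. The sketch shows one way to decide when a logged problem has been worked on long enough at its current level, and how to measure the percentage of problems resolved by each level of support.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional

# Hypothetical time limits per support level; choose values that suit your site.
ESCALATION_LIMITS = {
    1: timedelta(minutes=30),  # for example, operators
    2: timedelta(hours=2),     # for example, system or application specialists
}
HIGHEST_LEVEL = 3              # last level in this sketch; it never escalates further


@dataclass
class ProblemReport:
    """Minimal stand-in for one entry on a standard problem report form."""
    problem_id: str
    description: str
    opened_at: datetime
    level: int = 1                               # level currently working the problem
    level_started_at: Optional[datetime] = None  # when that level took over
    resolved_at_level: Optional[int] = None      # set when the problem is closed

    def __post_init__(self) -> None:
        if self.level_started_at is None:
            self.level_started_at = self.opened_at


def should_escalate(report: ProblemReport, now: datetime) -> bool:
    """True when the current level has reached its time limit without a fix."""
    if report.resolved_at_level is not None or report.level >= HIGHEST_LEVEL:
        return False
    return now - report.level_started_at >= ESCALATION_LIMITS[report.level]


def escalate(report: ProblemReport, now: datetime) -> None:
    """Hand the problem to the next level of support and restart its clock."""
    report.level += 1
    report.level_started_at = now


def resolution_share(reports: Iterable[ProblemReport]) -> Dict[int, float]:
    """Percentage of resolved problems that were closed at each support level."""
    counts = Counter(
        r.resolved_at_level for r in reports if r.resolved_at_level is not None
    )
    total = sum(counts.values())
    if total == 0:
        return {}
    return {level: 100.0 * n / total for level, n in sorted(counts.items())}
```

A monitoring job could call should_escalate for each open report at regular intervals and call escalate when it returns True. Comparing the output of resolution_share with the percentages you set as targets for each level shows whether problems are reaching the right people.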