Introduction to NonStop Operations Management
Problem Management
Introduction to NonStop Operations Management–125507
Problem Prevention Strategies
You can prevent many problems by implementing the following strategies:
• Monitor the hardware and software. To ensure that the system is operating properly
  and to recognize when a potential problem might occur, continuously monitor the
  status of all system and network resources. Resources commonly monitored include
  processors, disks, paths, devices, processes, spooler components, audit trails,
  audit dumps, NonStop TM/MP transactions, tape mount requests, communication lines,
  and programs. Monitoring includes:
    • Monitoring resources as they change states (up or down). (Use the Object
      Monitoring Facility [OMF] or TSM.)
    • Monitoring end-user response time and throughput. (Use ViewSys or NSX.)
    • Monitoring critical resource utilization (threshold limits, percent full for
      disk files and volumes, memory queues, message queues, disk queues, processor
      utilization, and control block usage). (Use ViewSys or NSX.)
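The critical resource check described in the last item amounts to comparing sampled values against warning and critical limits. The following Python sketch illustrates that comparison only. The metric names, limits, and sample values are hypothetical, and the code does not call ViewSys, NSX, or any NonStop interface; in practice those tools would supply the measurements.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Threshold:
    """Warning and critical limits for one utilization metric, in percent."""
    warning: float
    critical: float


# Hypothetical limits; real values depend on the installation.
THRESHOLDS: Dict[str, Threshold] = {
    "disk_percent_full": Threshold(warning=80.0, critical=90.0),
    "processor_utilization": Threshold(warning=70.0, critical=85.0),
}


def classify(metric: str, value: float) -> str:
    """Return 'ok', 'warning', or 'critical' for a sampled value."""
    limit = THRESHOLDS[metric]
    if value >= limit.critical:
        return "critical"
    if value >= limit.warning:
        return "warning"
    return "ok"


def check_samples(samples: Dict[str, float]) -> Dict[str, str]:
    """Classify every sampled metric that has a configured threshold."""
    return {m: classify(m, v) for m, v in samples.items() if m in THRESHOLDS}


if __name__ == "__main__":
    # Hypothetical samples standing in for measurements from a monitoring tool.
    print(check_samples({"disk_percent_full": 92.5, "processor_utilization": 64.0}))
```

Keeping the limits in one table makes them easy to review and adjust without touching the comparison logic.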
• Monitor system and application software message logs by using DSM facilities,
  such as EMS and the TSM EMS Event Viewer. DSM also helps developers create
  applications that generate events and create log files.
• Automate operations and recovery procedures. Examples of tasks that are typically
  automated for problem prevention include:
    • Object state monitoring.
    • Performance monitoring.
    • Critical resource monitoring.
    • Recovery tasks for routine (recurring) problems.
    • Routine (recurring) tasks. If you have to perform a task more than three times,
      automate the task.
    • Problem determination steps. For example, an event is generated when a line
      goes down. Problem analysis tasks, such as gathering information to help you
      determine the cause of the failure, can be automated.
  For more information on automating operations and automation tools, refer to
  Section 12, “Automating and Centralizing Operations.”
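The problem determination example above (an event generated when a line goes down) can be sketched as a small registry that runs diagnostic steps whenever a matching event arrives. This Python sketch is illustrative only. The event fields, the `LINE_DOWN` event type, the `$LINE1` name, and the diagnostic functions are hypothetical placeholders, not EMS or TSM interfaces.

```python
from typing import Callable, Dict, List

Event = Dict[str, str]
Step = Callable[[Event], str]

# Maps a hypothetical event type to the diagnostic steps to run for it.
_steps: Dict[str, List[Step]] = {}


def on_event(event_type: str) -> Callable[[Step], Step]:
    """Register a diagnostic step for the given event type."""
    def register(step: Step) -> Step:
        _steps.setdefault(event_type, []).append(step)
        return step
    return register


@on_event("LINE_DOWN")
def record_line_identity(event: Event) -> str:
    # Placeholder: a real step would query the subsystem that owns the line.
    return f"line={event['line']} reported down at {event['time']}"


@on_event("LINE_DOWN")
def note_last_state(event: Event) -> str:
    # Placeholder: a real step would collect recent status or log entries.
    return f"previous state of {event['line']}: {event.get('previous_state', 'unknown')}"


def gather_diagnostics(event: Event) -> List[str]:
    """Run every registered step for the event and collect the findings."""
    return [step(event) for step in _steps.get(event["type"], [])]


if __name__ == "__main__":
    sample = {"type": "LINE_DOWN", "line": "$LINE1", "time": "10:42", "previous_state": "up"}
    for finding in gather_diagnostics(sample):
        print(finding)
```

Registering steps per event type keeps each diagnostic small, so new checks can be added without changing the code that dispatches events.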
• Make sure that your system is fault tolerant. Tandem systems provide continuous
  availability and fault-tolerance features; however, it is up to you to make sure that
  these unique features are fully used and maintained.
  The Availability Guide for Problem Management provides information on auditing
  your system for fault tolerance. Guidelines are included to help you determine the
  fault tolerance of your software and hardware configurations.
• Design your system and application to take advantage of quick startup and shutdown
  techniques. The Availability Guide for Change Management provides operational
  strategies for reducing startup and shutdown time. The Availability Guide for