Introduction to NonStop Operations Management
Production Management
Introduction to NonStop Operations Management–125507
Monitoring System Status
To ensure that the system is operating properly and to recognize when corrective action
is required, continuously monitor the status of all system and network resources.
Resources include processors, cabinets, disks, paths, volumes, controllers,
communication lines, Expand lines, transaction-processing servers, terminal control
processes (TCPs), terminals, spooler devices, and programs.
Monitoring should include:
• Monitoring event and alert messages
• Monitoring resources as they change states
• Monitoring performance of processors, disks, and communication lines
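As an illustration of monitoring resources as they change states, the following minimal Python sketch compares two status snapshots and reports the differences. It is not a NonStop interface: the function `state_changes`, the snapshot dictionaries, and the resource names and states are all hypothetical, and a real installation would obtain status from its own monitoring tools.

```python
def state_changes(previous, current):
    """Return (resource, old_state, new_state) for each resource whose state differs.

    Both arguments are hypothetical snapshots mapping a resource name to a state
    string. A resource missing from one snapshot is reported as "UNKNOWN" there.
    """
    changes = []
    for name in sorted(set(previous) | set(current)):
        old = previous.get(name, "UNKNOWN")
        new = current.get(name, "UNKNOWN")
        if old != new:
            changes.append((name, old, new))
    return changes


# Illustrative snapshots only; names and states are assumptions.
previous = {"$DATA1": "UP", "CPU 0": "UP", "LINE1": "UP"}
current = {"$DATA1": "DOWN", "CPU 0": "UP", "LINE1": "UP", "LINE2": "UP"}

for name, old, new in state_changes(previous, current):
    print(f"{name}: {old} -> {new}")
```

Reporting only the differences keeps operator attention on resources that need it, rather than on the full status list.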
By monitoring system status, you can:
• See if resources are currently up or down
• Be quickly notified of error conditions, state changes, and thresholds that have
  been exceeded or are close to being exceeded
• See a chronological list of events that can aid in problem diagnosis and resolution
• Determine how much of a particular resource is being used, for example, processor
  cycles, disk or file space, or communication line bandwidth
• Find bottlenecks, which can affect the users of the system
• Make better use of existing resources
• Ensure that applications such as NonStop SQL/MP, NonStop Transaction
  Manager/MP, and NonStop Transaction Services/MP are available
• Prevent problems from occurring
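The distinction between a threshold that has been exceeded and one that is being approached can be sketched as a simple classification. The Python example below is only an illustration under assumed values: the function name `classify_usage`, the 80 percent warning level, and the 95 percent limit are hypothetical and would be set by site policy.

```python
def classify_usage(used, capacity, warn_fraction=0.80, limit_fraction=0.95):
    """Classify resource usage against two hypothetical thresholds.

    Returns "EXCEEDED" at or above limit_fraction of capacity, "APPROACHING"
    at or above warn_fraction, and "OK" otherwise.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    fraction = used / capacity
    if fraction >= limit_fraction:
        return "EXCEEDED"
    if fraction >= warn_fraction:
        return "APPROACHING"
    return "OK"


# Example: disk space in megabytes (illustrative numbers only).
print(classify_usage(used=850, capacity=1000))
```

Using a separate warning level gives operators time to act before the limit is reached, which supports the goal of preventing problems rather than reacting to them.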
Controlling the System
Based on the information gathered from events, object monitoring, and performance
measurements, you must be able to control the objects you monitor. For example, you
must be able to issue commands to fix problems, avoid problems, perform routine tasks,
or increase system stability. To control the system effectively, it is important to:
• Provide operator tools to report, resolve, and fix problems; view documentation and
  manuals online; and write reports. Providing your staff with these tools can:
    • Help improve operator productivity and make better use of computer resources
    • Make training easier
    • Reduce the number of operator mistakes
• Automate operations. Automating operations is important because it can:
    • Help manage unattended remote nodes
    • Perform routine tasks
    • Automate monitoring tasks
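One common way to automate routine responses is a table of rules that maps an observed event to an action and passes anything unrecognized to an operator. The Python sketch below illustrates that idea only. The `ACTIONS` table, the event fields `type` and `state`, and the action descriptions are hypothetical and do not represent a NonStop product interface.

```python
# Hypothetical rule table: (resource type, observed state) -> routine action.
ACTIONS = {
    ("line", "DOWN"): "restart line",
    ("spooler", "FULL"): "delete completed jobs",
    ("disk", "DOWN"): "notify operator and start recovery procedure",
}

DEFAULT_ACTION = "escalate to operator"


def choose_action(event):
    """Return the action for an event, or the default when no rule matches."""
    key = (event["type"], event["state"])
    return ACTIONS.get(key, DEFAULT_ACTION)


# Illustrative events, such as might be reported from an unattended remote node.
events = [
    {"node": "REMOTE1", "type": "line", "state": "DOWN"},
    {"node": "REMOTE1", "type": "processor", "state": "DOWN"},
]
for event in events:
    print(event["node"], event["type"], event["state"], "->", choose_action(event))
```

Keeping the rules in a table rather than in code makes it easier to review which situations are handled automatically and which still require an operator.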