Introduction to NonStop Operations Management

Production Management
Introduction to NonStop Operations Management 125507
Monitoring System Status
To ensure that the system is operating properly and to recognize when corrective action
is required, continuously monitor the status of all system and network resources.
Resources include processors, cabinets, disks, paths, volumes, controllers,
communication lines, Expand lines, transaction-processing servers, terminal control
processes (TCPs), terminals, spooler devices, and programs.
Monitoring should include:
• Monitoring event and alert messages
• Monitoring resources as they change states
• Monitoring performance of processors, disks, and communication lines
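As an illustration of monitoring resources as they change states, the following is a minimal sketch in Python. It assumes that resource states are available as simple name-to-state mappings taken at two points in time. The function name state_changes, the resource names, and the state labels are all hypothetical and do not correspond to any NonStop tool or interface.

```python
from typing import Dict, List, Tuple


def state_changes(
    previous: Dict[str, str], current: Dict[str, str]
) -> List[Tuple[str, str, str]]:
    """Return (resource, old_state, new_state) for each resource whose state differs.

    A resource missing from one snapshot is reported with the state "UNKNOWN".
    """
    changes = []
    for name in sorted(set(previous) | set(current)):
        old = previous.get(name, "UNKNOWN")
        new = current.get(name, "UNKNOWN")
        if old != new:
            changes.append((name, old, new))
    return changes


# Hypothetical snapshots; the names and states are illustrative only.
before = {"disk-1": "UP", "line-1": "UP", "cpu-0": "UP"}
after = {"disk-1": "DOWN", "line-1": "UP", "cpu-0": "UP", "cpu-1": "UP"}

for name, old, new in state_changes(before, after):
    print(f"{name}: {old} -> {new}")
```

Comparing whole snapshots, rather than reacting to individual messages, is only one possible design. It keeps the sketch independent of how the states are actually collected.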
By monitoring system status, you can:
• See whether resources are currently up or down
• Be notified quickly of error conditions, state changes, and thresholds that have been
  exceeded or are about to be reached
• See a chronological list of events that can aid in problem diagnosis and resolution
• Determine how much of a particular resource is being used (for example, processor
  cycles, disk or file space, or communication line bandwidth)
• Find bottlenecks, which can affect the users of the system
• Make better use of existing resources
• Ensure that applications such as NonStop SQL/MP, NonStop Transaction
  Manager/MP, and NonStop Transaction Services/MP are available
• Prevent problems from occurring
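The idea of thresholds that have been exceeded or are about to be reached can be sketched as a simple classification of resource usage. This is a minimal Python sketch under stated assumptions: the function name classify_usage, the labels it returns, and the default limits of 80 percent and 95 percent are hypothetical choices for illustration, not values taken from any NonStop product.

```python
def classify_usage(
    used: float,
    capacity: float,
    warn_at: float = 0.80,   # assumed warning level (fraction of capacity)
    limit_at: float = 0.95,  # assumed limit (fraction of capacity)
) -> str:
    """Classify resource usage as "OK", "APPROACHING", or "EXCEEDED"."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    fraction = used / capacity
    if fraction >= limit_at:
        return "EXCEEDED"
    if fraction >= warn_at:
        return "APPROACHING"
    return "OK"


# Hypothetical readings: (resource, amount used, capacity).
readings = [
    ("disk space", 50.0, 100.0),
    ("processor cycles", 85.0, 100.0),
    ("line bandwidth", 96.0, 100.0),
]
for resource, used, capacity in readings:
    print(f"{resource}: {classify_usage(used, capacity)}")
```

Using two levels rather than one lets operators act while a resource is approaching its limit, instead of only after the limit has been passed.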
Controlling the System
Based on the information gathered from viewing events, monitoring objects, and
watching performance, you must be able to control those objects. For example, you must
be able to issue commands to fix problems, avoid problems, perform routine tasks, or
increase system stability. To control the system effectively, it is important to:
• Provide operator tools to report, resolve, and fix problems; view documentation and
  manuals online; and write reports. Providing your staff with these tools can:
    - Help improve operator productivity and make better use of computer resources
    - Make training easier
    - Reduce the number of operator mistakes
• Automate operations. Automating operations is important because it can:
    - Help manage unattended remote nodes
    - Perform routine tasks
    - Automate monitoring tasks