Availability Guide for Problem Management–125509
Automating Operations and Recovery Procedures
• Message queues
• Processor utilization
• Control block usage
• Disk queues
• Spooler cleanup
The Tandem Object Monitoring Facility (OMF) can be used to monitor these objects.
For example, when a critical process fails, OMF detects the failure and generates an
EMS event. An automated operator receives the event and executes a customized
PROCESS recovery rule, which sends the event-related information to a TACL server.
The restart code executed by the TACL server then specifies the ASSIGN, DEFINE, and
PARAM attributes before restarting the process.
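The flow from event to restart described above can be sketched as follows. This is a minimal Python illustration only, not OMF, EMS, or TACL code: the Event and RestartRequest classes, the process_recovery_rule and restart_server functions, the "PROCESS-DOWN" label, and every attribute name and value are hypothetical stand-ins chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical stand-in for an EMS event that reports a failed process."""
    subject: str  # name of the failed process
    kind: str     # illustrative label only, e.g. "PROCESS-DOWN"


@dataclass
class RestartRequest:
    """Event-related information that a recovery rule passes to the restart server."""
    process: str
    assigns: dict = field(default_factory=dict)
    defines: dict = field(default_factory=dict)
    params: dict = field(default_factory=dict)


# Hypothetical per-process restart settings; a real site would maintain its own.
RESTART_SETTINGS = {
    "$APP1": {
        "assigns": {"LOGFILE": "example-log-file"},
        "defines": {"=CONFIG": "example-config-file"},
        "params": {"MODE": "PRODUCTION"},
    },
}


def process_recovery_rule(event):
    """Map a process-failure event to a restart request, or None if no rule applies."""
    if event.kind != "PROCESS-DOWN":
        return None
    settings = RESTART_SETTINGS.get(event.subject)
    if settings is None:
        return None  # not a process this rule knows how to restart
    return RestartRequest(process=event.subject, **settings)


def restart_server(request):
    """Simulate the restart server: set the attributes first, then restart.

    The returned strings are descriptive log lines, not TACL command syntax.
    """
    steps = []
    for name, value in request.assigns.items():
        steps.append(f"set ASSIGN {name} to {value}")
    for name, value in request.defines.items():
        steps.append(f"set DEFINE {name} to {value}")
    for name, value in request.params.items():
        steps.append(f"set PARAM {name} to {value}")
    steps.append(f"restart {request.process}")
    return steps


if __name__ == "__main__":
    request = process_recovery_rule(Event(subject="$APP1", kind="PROCESS-DOWN"))
    if request is not None:
        for step in restart_server(request):
            print(step)
```

Keeping the restart settings in a table separate from the rule means that a new critical process can be covered by adding an entry rather than by writing a new rule.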
Repetitive Tasks
Automate any repetitive tasks using the “three-by-three” rule: If you have to perform a
task more than three times, automate the task. Repetitive tasks that can be automated
include:
• Backing up configuration files, such as Pathway, Safeguard, and TMF control files to tape and disk
• Performing reloads
• Cleaning up and restarting spoolers
• Deleting saveabend files
• Restarting local area networks (LANs) and SNAX lines
• Performing disk decompressions using DCOM and DSAP
• Saving TMF audit trail dumps to disk
• Using Meascom to take regular system measurements
• Backing up the EMS log
• Performing dumps of line configurations using SCF
• Collecting statistics on communications lines for analysis
• Checking status of devices, processes, and applications
• Starting up and shutting down applications
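One way to apply the "three-by-three" rule is to keep a simple count of how often each manual task is performed and to flag any task performed more than three times. The following Python sketch illustrates the idea; the TaskLog class and the task names are hypothetical and are not part of any Tandem product.

```python
from collections import Counter

# "More than three times", as stated in the rule above.
AUTOMATION_THRESHOLD = 3


class TaskLog:
    """Counts manual task executions and reports candidates for automation."""

    def __init__(self):
        self._counts = Counter()

    def record(self, task):
        """Record one manual execution of the named task."""
        self._counts[task] += 1

    def candidates(self):
        """Return, in sorted order, tasks performed more than the threshold."""
        return sorted(
            task for task, count in self._counts.items()
            if count > AUTOMATION_THRESHOLD
        )


if __name__ == "__main__":
    log = TaskLog()
    for _ in range(4):
        log.record("delete saveabend files")
    log.record("back up the EMS log")
    print(log.candidates())  # prints ['delete saveabend files']
```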
Problem Determination Steps
Automate problem determination steps to help you determine the cause of a failure,
for example, when a line goes down and an EMS event is generated. Automating the
steps needed to determine how the problem occurred can reduce the time it takes to
recover the object and return it to service. Section 3, “Recovering From
Unplanned Outages,” provides more information about this topic.
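As a rough illustration, automated problem determination can be modeled as an ordered list of diagnostic steps that run when a line-down event arrives, with the findings collected for the operator. The Python sketch below assumes hypothetical step functions that return placeholder text; in practice each step would issue the appropriate status or statistics command for the site.

```python
def check_line_status(line):
    """Placeholder: a real step would query the current state of the line."""
    return f"{line}: status would be queried here"


def collect_line_statistics(line):
    """Placeholder: a real step would gather error counters for the line."""
    return f"{line}: statistics would be collected here"


def search_recent_events(line):
    """Placeholder: a real step would search the event log for related events."""
    return f"{line}: recent events would be searched here"


# Hypothetical ordering of steps for a line-down event.
LINE_DOWN_STEPS = [
    ("line status", check_line_status),
    ("line statistics", collect_line_statistics),
    ("recent events", search_recent_events),
]


def run_problem_determination(subject, steps):
    """Run each (name, check) step in order and collect the findings.

    A step that raises an exception is recorded as a failed step so that
    the remaining steps still run.
    """
    findings = []
    for name, check in steps:
        try:
            result = check(subject)
        except Exception as exc:
            result = f"step failed: {exc}"
        findings.append((name, result))
    return findings


if __name__ == "__main__":
    for name, result in run_problem_determination("$LINE1", LINE_DOWN_STEPS):
        print(f"{name}: {result}")
```

Recording a failed step instead of stopping keeps the diagnostic report as complete as possible, which matters most when the system is already degraded.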