Availability Guide for Problem Management

Automating Operations and Recovery Procedures
Availability Guide for Problem Management 125509
Message queues
Processor utilization
Control block usage
Disk queues
Spooler cleanup
The Tandem Object Monitoring Facility (OMF) can be used to monitor these objects.
For example, when a critical process fails, OMF detects the failure and generates an EMS
event. An automated operator receives the event and executes a customized PROCESS
recovery rule, which sends the event-related information to a TACL server. The restart
code executed by the TACL server then specifies the ASSIGN, DEFINE, and PARAM
attributes before restarting the process.
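The flow described above (event, recovery rule, restart server) can be sketched in a few lines of Python. This is only an illustrative model, not the OMF, EMS, or TACL interface: the Event and RestartRequest classes, the RESTART_SETTINGS table, the process name $APP1, and all setting names and values are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """Hypothetical stand-in for an event reported by a monitor."""
    object_type: str   # for example "PROCESS"
    object_name: str   # for example "$APP1" (made-up name)
    state: str         # for example "DOWN"


@dataclass
class RestartRequest:
    """Information a recovery rule passes to the restart server (illustrative)."""
    process_name: str
    assigns: dict = field(default_factory=dict)
    defines: dict = field(default_factory=dict)
    params: dict = field(default_factory=dict)


# Hypothetical settings that must be in place before each process is restarted.
RESTART_SETTINGS = {
    "$APP1": {
        "assigns": {"INFILE": "input-file"},
        "defines": {"=LOG": "log-file"},
        "params": {"MODE": "PRODUCTION"},
    },
}


def process_recovery_rule(event):
    """Build a restart request for a failed process, or None if it is unknown."""
    settings = RESTART_SETTINGS.get(event.object_name)
    if settings is None:
        return None
    return RestartRequest(event.object_name, **settings)


# One rule per (object type, state); only the PROCESS rule is sketched here.
RULES = {("PROCESS", "DOWN"): process_recovery_rule}


def handle_event(event):
    """Select the recovery rule that matches the event and run it."""
    rule = RULES.get((event.object_type, event.state))
    return rule(event) if rule else None


def restart_server(request):
    """Return the ordered steps a restart server would carry out."""
    steps = [("ASSIGN", k, v) for k, v in request.assigns.items()]
    steps += [("DEFINE", k, v) for k, v in request.defines.items()]
    steps += [("PARAM", k, v) for k, v in request.params.items()]
    steps.append(("RESTART", request.process_name, None))  # always last
    return steps
```

Calling restart_server(handle_event(Event("PROCESS", "$APP1", "DOWN"))) yields the ASSIGN, DEFINE, and PARAM steps followed by the restart step. Keeping the settings in a table rather than in the rule itself means a new process can be covered without changing the rule.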
Repetitive Tasks
Automate any repetitive tasks using the “three-by-three” rule: If you have to perform a
task more than three times, automate the task. Repetitive tasks that can be automated
include:
Backing up configuration files, such as Pathway, Safeguard, and TMF control files to tape and disk
Performing reloads
Cleaning up and restarting spoolers
Deleting saveabend files
Restarting local area networks (LANs) and SNAX lines
Performing disk compressions using DCOM and DSAP
Saving TMF audit trail dumps to disk
Using Meascom to take regular system measurements
Backing up the EMS log
Performing dumps of line configurations using SCF
Collecting statistics on communications lines for analysis
Checking status of devices, processes, and applications
Starting up and shutting down applications
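One way to apply the "three-by-three" rule in practice is to keep a count of how often each task is done by hand and flag a task once the count passes three. The sketch below is a minimal illustration in Python; the TaskLog class and the task names are hypothetical and are not part of any Tandem product.

```python
from collections import Counter

AUTOMATION_THRESHOLD = 3  # the rule says "more than three times"


class TaskLog:
    """Counts manual runs of each task and flags automation candidates."""

    def __init__(self):
        self._counts = Counter()

    def record(self, task):
        """Note one manual run; return True once the task should be automated."""
        self._counts[task] += 1
        return self._counts[task] > AUTOMATION_THRESHOLD

    def candidates(self):
        """Return, in alphabetical order, every task done more than three times."""
        return sorted(
            task for task, runs in self._counts.items()
            if runs > AUTOMATION_THRESHOLD
        )
```

For example, recording "delete saveabend files" four times returns False for the first three calls and True on the fourth, after which the task appears in candidates().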
Problem Determination Steps
Automate problem determination steps to help you determine the cause of a failure, for
example, when a line goes down and an EMS event is generated. Automating the steps
needed to determine how the problem occurred can reduce the time it takes to recover
the object and return it to service. Section 3, “Recovering From Unplanned Outages,”
provides more information about this topic.
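As a rough illustration of automated problem determination, the Python sketch below runs a fixed list of diagnostic steps when a line-down event arrives and collects the findings in one report. The event layout, the step functions, and the line name $LINE1 are assumptions made for the example; real steps would query the subsystem that owns the line and the event log.

```python
def check_line_state(event):
    # Placeholder step: a real check would ask the owning subsystem for status.
    return event.get("state", "UNKNOWN")


def check_earlier_states(event):
    # Placeholder step: a real check would search the event log for the object.
    return [s for s in event.get("history", []) if s != event.get("state")]


# Hypothetical ordered checklist for a line-down event.
LINE_DOWN_STEPS = [
    ("line state", check_line_state),
    ("earlier states", check_earlier_states),
]


def run_problem_determination(event, steps):
    """Run each diagnostic step in order and collect the findings."""
    report = {"object": event["object"], "findings": []}
    for name, step in steps:
        try:
            result = step(event)
        except Exception as exc:
            # A failed check is recorded but does not stop the remaining checks.
            result = f"step failed: {exc}"
        report["findings"].append((name, result))
    return report
```

Running the checklist against {"object": "$LINE1", "state": "DOWN", "history": ["UP", "DOWN"]} produces a report with one finding per step. Because a failing step is recorded instead of raised, the operator still receives whatever the other steps found.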