Availability Guide for Application Design
Instrumenting an Application for Availability
Availability Guide for Application Design—525637-004
Benign conditions, such as a user mistyping a file name or entering some out-of-range
value, are taken care of simply by sending a message to the user.
Attempting Recovery
Many errors indicate some temporary loss of service, which can be corrected simply by
retrying the operation. The interval between retries and the number of retries again
depend on the error. If the error persists or recurs, it might be appropriate to inform
a higher authority.
For example, the COBOL85 SEND verb has the potential to return many error codes.
On considering each possible error code, you might decide that server-class-frozen
errors, link-request-denied errors, and input/output errors are the only errors that could
be expected to arise and that some form of retry is appropriate for these errors. If the
error persists, you should consider whether it is appropriate for the process to
terminate.
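The retry policy described above can be sketched in Python as follows. This is only an
illustration, not COBOL85 or any NonStop interface: the symbolic codes in RETRYABLE, the
SendError class, and the send_with_retry function are hypothetical names, and the attempt
limit and delay are arbitrary defaults.

```python
import time

# Hypothetical symbolic codes; real SEND error codes are not reproduced here.
RETRYABLE = {"SERVER_CLASS_FROZEN", "LINK_REQUEST_DENIED", "IO_ERROR"}


class SendError(Exception):
    """Illustrative error carrying a symbolic error code."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


def send_with_retry(send, request, max_attempts=3, delay_seconds=1.0,
                    sleep=time.sleep):
    """Call send(request), retrying only the errors listed in RETRYABLE.

    Re-raises the SendError if it is not retryable or if it persists after
    max_attempts, so the caller can escalate or decide to terminate.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send(request)
        except SendError as err:
            if err.code not in RETRYABLE or attempt == max_attempts:
                raise
            sleep(delay_seconds)
```

Errors outside the retryable set are raised immediately, so an unexpected error is never
masked by the retry loop.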
Generally, recovery should be attempted whenever a resource is lost. Your program
should keep retrying the failed operation until the resource reappears, advising the
user as appropriate for the application.
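A minimal Python sketch of waiting for a lost resource while keeping the user informed
might look like this. The function wait_for_resource and its parameters (is_available,
notify_user, poll_seconds, notify_every) are assumptions made for illustration; a real
application would choose its own polling interval and notification channel.

```python
import time


def wait_for_resource(is_available, notify_user, poll_seconds=5.0,
                      notify_every=6, sleep=time.sleep):
    """Poll until is_available() returns True.

    notify_user is called on the first failed poll and then on every
    notify_every-th failed poll, so the user is advised without being flooded.
    Returns the number of failed polls before the resource reappeared.
    """
    failures = 0
    while not is_available():
        if failures % notify_every == 0:
            notify_user(f"Resource unavailable; retrying (failed polls: {failures + 1})")
        failures += 1
        sleep(poll_seconds)
    return failures
```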
For some errors, it might be appropriate to retreat to a known safe point, such as a
transaction boundary. Having done so, your application might be able to continue
processing, or at least leave the database in a known, consistent state before
terminating.
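Retreating to a transaction boundary can be illustrated with Python's standard sqlite3
module. This is a sketch under stated assumptions, not the transaction facility of any
particular platform: the accounts table, the open_demo_db helper, and the transfer
function are hypothetical, and a CHECK constraint stands in for the error that forces
the retreat.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        "id INTEGER PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
    conn.commit()
    return conn


def transfer(conn, from_id, to_id, amount):
    """Credit one row and debit another as a single transaction.

    On any database error, roll back to the transaction boundary so the
    table is left in its previous consistent state, and report failure.
    """
    try:
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                     (amount, to_id))
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                     (amount, from_id))
        conn.commit()
        return True
    except sqlite3.Error:
        conn.rollback()
        return False
```

If the debit violates the constraint, the credit that was already applied is undone by
the rollback, and the caller can either continue with other work or terminate knowing
the data is consistent.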
For other errors, your application might be designed to continue with delayed or partial
functionality. For example, if your client process receives an error indicating that it can
no longer access the server, it can do as much processing as it can locally and send
the request to a queue file for later processing. Refer to Section 4, Data Protection and
Recovery, for an overview of queue files.
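The idea of deferring work to a queue can be sketched in Python with an ordinary local
file of JSON lines. This is only an approximation of the concept, not an implementation
of the queue files described in Section 4; ServerUnavailable, submit, drain, and the
queue_path argument are hypothetical names.

```python
import json
import os


class ServerUnavailable(Exception):
    """Illustrative error meaning the server cannot be reached."""


def submit(request, send, queue_path):
    """Try to send the request; if the server is unavailable, append it
    to a local queue file for later processing."""
    try:
        send(request)
        return "sent"
    except ServerUnavailable:
        with open(queue_path, "a", encoding="utf-8") as queue:
            queue.write(json.dumps(request) + "\n")
        return "queued"


def drain(send, queue_path):
    """Resend queued requests in order and keep any that still cannot be sent.

    Returns the number of requests sent successfully.
    """
    if not os.path.exists(queue_path):
        return 0
    with open(queue_path, encoding="utf-8") as queue:
        pending = [json.loads(line) for line in queue if line.strip()]
    sent = 0
    for request in pending:
        try:
            send(request)
        except ServerUnavailable:
            break
        sent += 1
    # Rewrite the file so that it holds only the requests not yet sent.
    with open(queue_path, "w", encoding="utf-8") as queue:
        for request in pending[sent:]:
            queue.write(json.dumps(request) + "\n")
    return sent
```

A client would call submit for each request and call drain once the server is reachable
again. The sketch does not protect the file against a failure in the middle of the
rewrite, which a production design would need to address.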
Gracefully Terminating the Process
For a highly available application, process termination should be considered a last
resort. However, if an unexpected error occurs, if error detectors report other software
faults, if errors indicate potential data corruption, or if recovery attempts repeatedly
fail, the correct action for your application is usually to terminate.
On termination, your application should create a saveabend file for further analysis and
alert a higher authority, such as a human or automated system operator, that can
implement a solution quickly.
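The termination sequence can be sketched in Python as follows. The diagnostic dump
written here is only a stand-in for a saveabend file, which Python cannot produce, and
terminate_gracefully, dump_path, alert_operator, and exit_fn are hypothetical names
chosen for this illustration.

```python
import sys
import traceback


def terminate_gracefully(exc, dump_path, alert_operator, exit_fn=sys.exit):
    """Record diagnostics, alert a higher authority, then terminate.

    The text dump written here is only a stand-in for a real saveabend file.
    """
    with open(dump_path, "w", encoding="utf-8") as dump:
        dump.write("".join(
            traceback.format_exception(type(exc), exc, exc.__traceback__)))
    alert_operator(f"Process terminating: {exc!r}; diagnostics in {dump_path}")
    exit_fn(1)
```

The diagnostics are written before the alert is raised, so whoever responds to the alert
already has the information needed for analysis.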
Writing Code to Handle Problem Errors
The application designer and programmer should arrange for the application to:
1. Recognize a specific problem error.
2. Decide whether to terminate or continue execution by looping and waiting.