Availability Guide for Application Design (525637-004)

Instrumenting an Application for Availability
Writing Code to Handle Problem Errors
When using the technique of looping and waiting, it is preferable to retry recovery a
few times over a period, but to reduce the retry frequency if the first two or three
retries fail.
3. Tell appropriate people about the error (get help and warn users).
4. Recognise when the problem is repaired.
5. Know what processes in the application need restarting, and where (perhaps
based on the content of a configuration file or database).
6. Tell the users when they can resume work, and enable them to do so (the error or
its handling might have locked out some users).
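The looping-and-waiting advice above (a few quick retries, then less frequent ones) can be sketched as a backoff schedule. This is a minimal Python sketch under stated assumptions, not the guide's own method: the function names, the three fast retries, the doubling factor, and the 300-second cap are all illustrative choices.

```python
import time
from typing import Callable


def backoff_delays(count: int, base_delay: float,
                   fast_attempts: int = 3, factor: float = 2.0,
                   max_delay: float = 300.0) -> list[float]:
    """Return `count` waits (seconds) to use between recovery attempts.

    Assumed policy (illustrative): the first `fast_attempts` waits are
    `base_delay`; after that each wait is multiplied by `factor`, capped
    at `max_delay`, so retries become less frequent.
    """
    delays = []
    delay = base_delay
    for attempt in range(count):
        if attempt >= fast_attempts:
            delay = min(delay * factor, max_delay)
        delays.append(delay)
    return delays


def retry_with_backoff(operation: Callable[[], bool], max_attempts: int,
                       base_delay: float,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call `operation` until it returns True or attempts run out.

    Sleeps between attempts but not after the final one. Returns False
    if recovery never succeeded, so the caller can escalate (for example,
    tell operators and users, as in steps 3 and 6).
    """
    delays = backoff_delays(max_attempts - 1, base_delay)
    for attempt in range(max_attempts):
        if operation():
            return True
        if attempt < len(delays):
            sleep(delays[attempt])
    return False
```

For example, `backoff_delays(6, 1.0)` yields waits of 1, 1, 1, 2, 4, and 8 seconds. Passing `sleep` as a parameter keeps the loop testable without real delays.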
For example, if a disk or disk file runs out of space, a temporary resource problem
exists. To cope with the situation properly, the design must answer the following
questions:
What file system error does the program look for to identify the problem?
Should the whole application abort, or should the server process enter a waiting
loop, checking periodically whether operations has fixed the problem (in this case,
either by raising the maximum number of extents for the file or by running an online
partition split [SQL only])?
If the application enters a waiting loop, how does it handle the wait?
What other processes in the application are affected by the problem?
Does the program need a communication strategy that sends messages to the other
servers affected by this problem and stops them, too?
Will those other processes be able to stop themselves independently by using the
same logic as the process encountering the problem?
Will users be able to do useful work while the problem exists?
Should the program tell the users about the error?
Which users need to know about the error?
How does the program tell the users?
Should the process send the requester program a message to display on the user's
screen, such as “Transaction types … are currently unavailable - try later,
please”?
The program can do that in a terminal/Pathway TCP environment by sending an
unsolicited message to the TCP. However, if the program is a client/server PC
application on a LAN, how can it get the message to the PC?
Does the program's client/server design need to allow for such messages to the
end-user screen? Does it need a pop-up dialog error box?
How does the program communicate with the operations/technical support people?
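The disk-full questions above can be made concrete with a small waiting loop. This is a hedged Python sketch, not the guide's design: it uses the POSIX `errno.ENOSPC` ("no space left on device") as a stand-in for whatever file-system error the platform actually reports, and `notify` is a hypothetical callback standing in for the chosen communication mechanism (an unsolicited message to the TCP, a message to a PC client, or an operator console message). Retrying the failed write is how this sketch recognises that the problem has been repaired.

```python
import errno
import time
from typing import Callable


def write_when_space_available(write: Callable[[], None],
                               notify: Callable[[str], None],
                               poll_interval: float = 60.0,
                               max_attempts: int = 100,
                               sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry `write` while it fails because the disk or file is full.

    Assumptions (illustrative): ENOSPC marks the space problem, and
    `notify` is a hypothetical hook that tells operators, users, and
    other affected servers. Returns True once the write succeeds, or
    False if the problem persists after `max_attempts` tries. Any other
    OSError is re-raised so the caller can decide whether to abort.
    """
    warned = False
    for attempt in range(max_attempts):
        try:
            write()
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise  # not a space problem: let the caller decide
            if not warned:
                notify("unavailable")  # warn once, not on every retry
                warned = True
            if attempt < max_attempts - 1:
                sleep(poll_interval)  # give operations time to add space
            continue
        if warned:
            notify("resumed")  # repair recognised: users can resume work
        return True
    return False
```

Polling with the failed operation itself is only one option; a design could instead check free space directly or wait for an explicit operator command before resuming.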