Managing HP Serviceguard for Linux, Tenth Edition, September 2012

A Designing Highly Available Cluster Applications

This appendix describes how to create or port applications for high availability, with emphasis on the following topics:

• Automating Application Operation
• Controlling the Speed of Application Failover (page 308)
• Designing Applications to Run on Multiple Systems (page 311)
• Restoring Client Connections (page 316)
• Handling Application Failures (page 317)
• Minimizing Planned Downtime (page 318)

Designing for high availability means reducing the amount of unplanned and planned downtime that users will experience. Unplanned downtime includes unscheduled events such as power outages, system failures, network failures, disk crashes, or application failures. Planned downtime includes scheduled events such as backups, system upgrades to new OS revisions, or hardware replacements.

Two key strategies should be kept in mind:

1. Design the application to handle a system reboot or panic. If you are modifying an existing application for a highly available environment, determine what currently happens to the application after a system panic. In a highly available environment there should be defined (and scripted) procedures for restarting the application. Procedures for starting and stopping the application should be automatic, with no user intervention required.

2. The application should not use system-specific information, such as the following, if such use would prevent it from failing over to another system and running properly there:
• The application should not refer to uname() or gethostname().
• The application should not refer to the SPU ID.
• The application should not refer to the MAC (link-level) address.
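As a minimal sketch of point 2, assume the package configuration hands the application a relocatable service name through an environment variable (the name APP_SERVICE_ADDRESS below is hypothetical, not a Serviceguard-defined variable). The application then derives its identity, and any per-instance file names, from that value instead of from socket.gethostname() or os.uname(), so both stay the same on whichever node runs it:

```python
import os

# Hypothetical variable name; assumed to be set by the package's
# configuration or start script, not derived from the node itself.
SERVICE_ADDR_VAR = "APP_SERVICE_ADDRESS"


def service_identity(env=None):
    """Return the name the application advertises and logs under.

    Reads a relocatable name from configuration instead of calling
    socket.gethostname() or os.uname(), so the value is identical on
    whichever cluster node currently runs the application.
    """
    if env is None:
        env = os.environ
    name = env.get(SERVICE_ADDR_VAR)
    if not name:
        # Refuse to fall back to the local hostname: that value would
        # change after a failover and break clients and file names.
        raise RuntimeError(f"{SERVICE_ADDR_VAR} is not set")
    return name


def lock_file_path(shared_dir, env=None):
    """Build a lock-file path keyed by the service name, not the host.

    shared_dir is assumed to be storage that moves with the application,
    so the adoptive node finds the same file the original node created.
    """
    return os.path.join(shared_dir, service_identity(env) + ".lock")
```

Failing loudly when the variable is missing is a deliberate choice: silently falling back to the node's hostname would appear to work until the first failover.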

Automating Application Operation

Can the application be started and stopped automatically, or does it require operator intervention?

This section describes how to automate application operations to avoid the need for user intervention. One of the first rules of high availability is to avoid manual intervention. If it takes a user at a terminal, console, or GUI to enter commands to bring up a subsystem, the user becomes a key part of the system. It may take hours before a user can get to a system console to do the necessary work. The hardware in question may be located in a far-off area where no trained users are available, the systems may be located in a secure datacenter, or someone may have to connect via modem during off hours.
