Managing HP Serviceguard for Linux, Ninth Edition, April 2009

A Designing Highly Available Cluster Applications

This appendix describes how to create or port applications for high availability, with

emphasis on the following topics:

• Automating Application Operation

• Controlling the Speed of Application Failover

• Designing Applications to Run on Multiple Systems

• Restoring Client Connections

• Handling Application Failures

• Minimizing Planned Downtime

Designing for high availability means reducing the amount of unplanned and planned

downtime that users will experience. Unplanned downtime includes unscheduled

events such as power outages, system failures, network failures, disk crashes, or

application failures. Planned downtime includes scheduled events such as

backups, system upgrades to new OS revisions, or hardware replacements.

Two key strategies should be kept in mind:

1. Design the application to handle a system reboot or panic. If you are modifying

an existing application for a highly available environment, determine what currently

happens to the application after a system panic. In a highly available

environment there should be defined (and scripted) procedures for restarting the

application. Procedures for starting and stopping the application should be

automatic, with no user intervention required.

2. The application should not depend on system-specific information if doing so

would prevent it from failing over to another system and running properly. In

particular:

• The application should not refer to uname() or gethostname().

• The application should not refer to the SPU ID.

• The application should not refer to the MAC (link-level) address.
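The second strategy can be illustrated with a minimal sketch in Python, assuming the application is told which name to use through configuration instead of asking the operating system. The environment variable APP_SERVICE_NAME and the function service_identity() are hypothetical names chosen for this example; they are not part of Serviceguard.

```python
import os


def service_identity(env=None):
    """Return the name this application instance should identify itself by.

    The name comes from configuration (the hypothetical APP_SERVICE_NAME
    variable), not from gethostname() or uname(), so it stays the same
    whichever cluster node the application is running on.
    """
    env = os.environ if env is None else env
    name = env.get("APP_SERVICE_NAME", "").strip()
    if not name:
        # Fail loudly instead of silently falling back to the node's host name.
        raise RuntimeError("APP_SERVICE_NAME is not set")
    return name
```

With the same configuration supplied on every node, the application reports the same identity before and after a failover, whereas a value taken from gethostname() would change with the node.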

Automating Application Operation

Can the application be started and stopped automatically or does it require operator

intervention?

This section describes how to automate application operations to avoid the need for

user intervention. One of the first rules of high availability is to avoid manual

intervention. If it takes a user at a terminal, console, or GUI to enter commands

to bring up a subsystem, the user becomes a key part of the system. It may take hours

before a user can get to a system console to do the necessary work. The hardware in

question may be located in a remote area where no trained users are available, the

systems may be located in a secure datacenter, or, during off hours, someone may have

to connect via modem.
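As an illustration of start and stop procedures that need no operator, the following is a minimal sketch in Python. It is not the Serviceguard package control script; the COMMANDS table and the control() function are hypothetical, and the placeholder commands only print a message so that the sketch runs anywhere. A real application would substitute its own start and stop commands.

```python
import subprocess
import sys

# Hypothetical placeholder commands; replace them with the application's
# real start and stop commands.
COMMANDS = {
    "start": [sys.executable, "-c", "print('application started')"],
    "stop": [sys.executable, "-c", "print('application stopped')"],
}


def control(action, commands=COMMANDS, timeout=60):
    """Run the start or stop command with no operator input.

    Returns the command's exit status; 0 means success.
    """
    if action not in commands:
        raise ValueError(f"unknown action: {action!r}")
    result = subprocess.run(
        commands[action],
        stdin=subprocess.DEVNULL,  # the command can never wait for typed input
        timeout=timeout,           # a hung command raises instead of blocking
        check=False,
    )
    return result.returncode
```

Closing standard input and bounding the run time are the two design choices that matter here: a command that prompts for input or hangs fails quickly and visibly, so whatever invokes control() can react without a person at a console.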
