Managing HP Serviceguard A.11.20.10 for Linux, December 2012

A Designing Highly Available Cluster Applications

This appendix describes how to create or port applications for high availability, with emphasis on the following topics:

• Automating Application Operation

• Controlling the Speed of Application Failover (page 258)

• Designing Applications to Run on Multiple Systems (page 261)

• Restoring Client Connections (page 264)

• Handling Application Failures (page 265)

• Minimizing Planned Downtime (page 266)

Designing for high availability means reducing the amount of unplanned and planned downtime that users experience. Unplanned downtime results from unscheduled events such as power outages, system failures, network failures, disk crashes, or application failures. Planned downtime results from scheduled events such as backups, system upgrades to new OS revisions, or hardware replacements.

Two key strategies should be kept in mind:

1. Design the application to handle a system reboot or panic. If you are modifying an existing application for a highly available environment, determine what currently happens to the application after a system panic. In a highly available environment there should be defined (and scripted) procedures for restarting the application. Procedures for starting and stopping the application should be automatic, with no user intervention required.

2. The application should not use system-specific information such as the following if doing so would prevent it from failing over to another system and running properly:

• The application should not refer to uname() or gethostname().

• The application should not refer to the SPU ID.

• The application should not refer to the MAC (link-level) address.
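
The second strategy can be illustrated with a minimal Python sketch in which the application takes its network identity from configuration instead of asking the node it happens to be running on. The function name service_address and the variables APP_SERVICE_HOST and APP_SERVICE_PORT are hypothetical names chosen for this example; they are not part of Serviceguard.

```python
import os


def service_address(env=None, default_port=5000):
    """Return the (host, port) the application should bind to and advertise.

    The host is read from configuration (the hypothetical APP_SERVICE_HOST
    variable) rather than from socket.gethostname() or os.uname(), so the
    value is the same on whichever node currently runs the application.
    """
    env = os.environ if env is None else env
    host = env.get("APP_SERVICE_HOST")
    if not host:
        # Fail loudly instead of silently falling back to the node's own name.
        raise RuntimeError("APP_SERVICE_HOST is not set")
    port = int(env.get("APP_SERVICE_PORT", default_port))
    return host, port
```

In a cluster, the configured host would typically be a name or address that moves with the application (for example, an address assigned to the package) rather than the name of any one node.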

A.1 Automating Application Operation

Can the application be started and stopped automatically or does it require operator intervention? This section describes how to automate application operations to avoid the need for user intervention.

One of the first rules of high availability is to avoid manual intervention. If a user at a terminal, console, or GUI must enter commands to bring up a subsystem, the user becomes a key part of the system. It may take hours before a user can get to a system console to do the necessary work. The hardware in question may be located in a far-off area where no trained users are available, the systems may be located in a secure datacenter, or in off hours someone may have to connect via modem.

There are two principles to keep in mind for automating application relocation:

• Insulate users from outages.

• Applications must have defined startup and shutdown procedures.

You need to be aware of what currently happens when the system your application is running on is rebooted, and whether the application's response needs to change for high availability.
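
As one possible illustration of defined startup and shutdown procedures, the following Python sketch wraps an application process in start and stop operations that never prompt for input and can safely be called more than once. The class name AppController and the stand-in command DEMO_COMMAND are hypothetical; a real application would substitute its own command and any cleanup it needs.

```python
import subprocess
import sys

# Stand-in for a real application: a process that simply sleeps.
DEMO_COMMAND = [sys.executable, "-c", "import time; time.sleep(60)"]


class AppController:
    """Start and stop one application process without operator input."""

    def __init__(self, command, stop_timeout=10.0):
        self.command = command
        self.stop_timeout = stop_timeout
        self.process = None

    def is_running(self):
        return self.process is not None and self.process.poll() is None

    def start(self):
        # Starting an application that is already running is a no-op.
        if not self.is_running():
            # stdin is closed so the application can never wait for a user.
            self.process = subprocess.Popen(self.command, stdin=subprocess.DEVNULL)
        return self.process.pid

    def stop(self):
        # Stopping an application that is already stopped is a no-op.
        if not self.is_running():
            return
        self.process.terminate()  # ask the application to exit first
        try:
            self.process.wait(timeout=self.stop_timeout)
        except subprocess.TimeoutExpired:
            self.process.kill()  # force exit if it does not stop in time
            self.process.wait()
```

Bounding the stop operation with a timeout reflects the goal of this section: shutdown should finish on its own rather than wait for someone to intervene.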

A.1.1 Insulate Users from Outages

Wherever possible, insulate your end users from outages. Issues include the following:

• Do not require user intervention to reconnect when a connection is lost due to a failed server.

• Where possible, warn users of slight delays due to a failover in progress.
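
Both points can be sketched on the client side as a connection routine that retries automatically and reports each delay. This is a minimal Python sketch under stated assumptions: the function name connect_with_retry and its parameters are hypothetical, and the retry counts and delays are arbitrary example values.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, first_delay=0.5,
                       notify=print, connect=socket.create_connection,
                       sleep=time.sleep):
    """Connect to (host, port), retrying with a doubling delay on failure.

    The user is told about each delay through notify() but is never asked
    to reconnect manually. The last failure is re-raised if every attempt
    fails.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect((host, port))
        except OSError as exc:
            if attempt == attempts:
                raise
            notify(f"Server unavailable ({exc}); retrying in {delay:.1f} s")
            sleep(delay)
            delay *= 2  # back off so a failover in progress has time to finish
```

The connect and sleep parameters exist only so the routine can be exercised without a real server; by default it uses socket.create_connection and time.sleep.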
