Managing HP Serviceguard for Linux, Sixth Edition, August 2006

Designing Highly Available Cluster Applications

Automating Application Operation

Define Application Startup and Shutdown

Applications must be restartable without manual intervention. If the application requires a switch to be flipped on a piece of hardware, then automated restart is impossible. Procedures for application startup, shutdown, and monitoring must be created so that the HA software can perform these functions automatically.

To ensure an automated response, define procedures for starting and stopping the application. In Serviceguard, these procedures are placed in the package control script. They must check for errors and return status to the HA control software. Startup and shutdown should be command-line driven and not interactive, unless all of the answers can be predetermined and scripted.
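The requirement that startup and shutdown run non-interactively and return status can be sketched in shell as follows. This is only an illustration: the names APP_START_CMD, APP_STOP_CMD, run_step, start_app, and stop_app are hypothetical and are not part of Serviceguard, and the default commands are placeholders.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with the application's real commands.
# These variable and function names are illustrative, not Serviceguard names.
APP_START_CMD=${APP_START_CMD:-true}
APP_STOP_CMD=${APP_STOP_CMD:-true}

# run_step NAME COMMAND_STRING
# Runs the command with stdin taken from /dev/null so it cannot wait on a
# prompt, logs the outcome, and returns 0 on success or 1 on any failure.
run_step() {
    step_name=$1
    step_cmd=$2
    if sh -c "$step_cmd" </dev/null; then
        echo "$step_name: ok"
        return 0
    else
        rc=$?
        echo "$step_name: failed with exit status $rc" >&2
        return 1
    fi
}

start_app() { run_step "start" "$APP_START_CMD"; }
stop_app()  { run_step "stop" "$APP_STOP_CMD"; }
```

A script that uses these helpers would call start_app or stop_app and pass the return value back to its caller, so that a failed step is reported instead of being silently ignored.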

In an HA failover environment, the HA software restarts the application on a surviving system in the cluster that has the necessary resources, such as access to the necessary disk drives. The application must be restartable in two respects:

• It must be able to restart and recover on the backup system (or on the same system if the application restart option is chosen).

• It must be able to restart if it fails during startup and the cause of the failure is resolved.
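One common obstacle to restarting after a failed startup is state left behind by the earlier attempt. The sketch below assumes the application records its process ID in a file, which the text above does not state. The PIDFILE path and the function name clear_stale_pidfile are hypothetical.

```shell
#!/bin/sh
# Hypothetical PID file location; an assumption for this sketch only.
PIDFILE=${PIDFILE:-/tmp/example_app.pid}

# clear_stale_pidfile
# Returns 0 if startup may proceed (no PID file, or a stale one was removed).
# Returns 1 if the recorded process still appears to be alive.
# Note: kill -0 also fails when the process exists but belongs to another
# user, so run this as the application's owner or as root.
clear_stale_pidfile() {
    [ -f "$PIDFILE" ] || return 0
    old_pid=$(cat "$PIDFILE")
    if [ -n "$old_pid" ] && kill -0 "$old_pid" 2>/dev/null; then
        echo "application already running as PID $old_pid" >&2
        return 1
    fi
    rm -f "$PIDFILE"
    return 0
}
```

Calling such a check at the top of the startup procedure lets the same procedure run unchanged on the original node or on a backup node, whether or not a previous attempt exited cleanly.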

Application administrators need to learn to start up and shut down applications using the appropriate HA commands. Inadvertently shutting down the application directly will initiate an unwanted failover. Application administrators also need to be careful that they don't accidentally shut down a production instance of an application rather than a test instance in a development environment.

A mechanism to monitor whether the application is active is necessary so that the HA software knows when the application has failed. This may be as simple as a script that issues the command ps -ef | grep xxx for all the processes belonging to the application.
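A monitoring script of the kind described above might look like the following sketch. The function names app_is_running and monitor_app are hypothetical. It assumes the HA software treats the exit of the monitor as the signal that the application has failed, and it filters out the grep process itself, which would otherwise match its own pattern.

```shell
#!/bin/sh
# app_is_running PATTERN
# Succeeds if at least one process line in "ps -ef" matches PATTERN.
# "grep -v grep" drops the grep processes of this pipeline from the listing.
app_is_running() {
    ps -ef | grep -v grep | grep -- "$1" >/dev/null
}

# monitor_app INTERVAL PATTERN...
# Checks every PATTERN each INTERVAL seconds and returns 1 as soon as one
# of them has no matching process. It never returns while all are present.
monitor_app() {
    interval=$1
    shift
    while :; do
        for pattern in "$@"; do
            if ! app_is_running "$pattern"; then
                echo "no process matching '$pattern'" >&2
                return 1
            fi
        done
        sleep "$interval"
    done
}
```

For example, monitor_app 5 xxx_server xxx_listener would poll for two hypothetical process names every five seconds. Matching on a process name is coarse: it confirms that a process exists, not that it is serving requests.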

To reduce the impact on users, the application should not simply abort in case of error, since aborting would cause an unneeded failover to a backup system. Applications should determine the exact error and take specific action to recover from the error rather than, for example, aborting upon receipt of any error.
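The idea of acting on the specific error can be sketched as a wrapper that retries only errors known to be temporary and passes every other error back unchanged. The function name retry_transient and the variables TRANSIENT_RC and RETRY_DELAY are hypothetical. The value 75 is borrowed from the EX_TEMPFAIL convention in sysexits.h and is an assumption about how the wrapped command reports a temporary failure.

```shell
#!/bin/sh
# Exit status that the wrapped command is assumed to use for a temporary
# failure (75 is EX_TEMPFAIL in sysexits.h); adjust for the real application.
TRANSIENT_RC=${TRANSIENT_RC:-75}
RETRY_DELAY=${RETRY_DELAY:-1}   # seconds to wait between attempts

# retry_transient MAX_TRIES COMMAND [ARG...]
# Runs COMMAND up to MAX_TRIES times. Retries only when the exit status is
# TRANSIENT_RC; any other non-zero status is returned immediately.
retry_transient() {
    max_tries=$1
    shift
    try=1
    while :; do
        "$@"
        rc=$?
        [ "$rc" -eq 0 ] && return 0
        if [ "$rc" -ne "$TRANSIENT_RC" ] || [ "$try" -ge "$max_tries" ]; then
            return "$rc"
        fi
        try=$((try + 1))
        sleep "$RETRY_DELAY"
    done
}
```

The caller still sees the original exit status when recovery is not possible, so a real failure can be distinguished from one that was resolved by waiting.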