Managing HP Serviceguard for Linux, Seventh Edition, July 2007

Designing Highly Available Cluster Applications

Controlling the Speed of Application Failover

Appendix B

What steps can be taken to ensure the fastest failover?

If a failure causes the application to be moved (failed over) to another
node, the application can do many things to reduce the time it takes to
get back up and running. The topics covered are as follows:

• Replicate Non-Data File Systems

• Use Raw Volumes

• Evaluate the Use of a Journaled File System

• Minimize Data Loss

• Use Restartable Transactions

• Use Checkpoints

• Design for Multiple Servers

• Design for Replicated Data Sites

Replicate Non-Data File Systems

Non-data file systems should be replicated rather than shared. Only one
copy of the application data itself can exist. It resides on a set of disks
that is accessed by the node that is running the application.

After failover, if these data disks hold file systems, the file systems
must go through file system recovery (fsck) before the data can be
accessed. The smaller these file systems are, the faster the recovery will
be. Therefore, it is best to keep anything that can be replicated off the
data file system. For example, keep a copy of the application executables
on each system rather than a single copy on a shared file system.
Replicating the application executables also makes it possible to update
them one node at a time in a rolling upgrade, if this is desired.
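Replicated executables are only useful if the copies on each node stay in
step. The following is a minimal sketch, not part of Serviceguard, of one
way to compare the replicated copies: build a checksum manifest of the
application directory on each node, then compare the manifests. The
function names build_manifest and diff_manifests are hypothetical, and the
directory to scan is an assumption you would replace with your own
application path.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifests(local, remote):
    """Return sorted relative paths that are missing on one side
    or whose contents differ between the two manifests."""
    return sorted(
        name
        for name in set(local) | set(remote)
        if local.get(name) != remote.get(name)
    )
```

One possible use is to run build_manifest on every node, collect the
results in one place, and call diff_manifests on each pair. An empty list
means the copies match. During a rolling upgrade, differences between
upgraded and not-yet-upgraded nodes are expected until the upgrade
completes.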