Managing HP Serviceguard for Linux, Seventh Edition, July 2007

Appendix B: Designing Highly Available Cluster Applications
Controlling the Speed of Application Failover
What steps can be taken to ensure the fastest failover?
If a failure does occur, causing the application to be moved (failed over) to
another node, there are many things the application can do to reduce the
time it takes to get back up and running. The topics covered are as
follows:
Replicate Non-Data File Systems
Use Raw Volumes
Evaluate the Use of a Journaled File System
Minimize Data Loss
Use Restartable Transactions
Use Checkpoints
Design for Multiple Servers
Design for Replicated Data Sites
Replicate Non-Data File Systems
Non-data file systems should be replicated rather than shared. There can
be only one copy of the application data itself. It is located on a set of
disks that is accessed by whichever system is running the application.
After failover, if these data disks contain file systems, those file systems
must go through recovery (fsck) before the data can be accessed. The
smaller these file systems are, the faster the recovery will be, so it is
best to keep anything that can be replicated off the data file system. For
example, there should be a copy of the application executables on each
system rather than a single copy of the executables on a shared file
system. Replicating the application executables also makes it possible to
upgrade them one node at a time (a rolling upgrade), if this is desired.
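As a minimal sketch of the "keep anything that can be replicated off the
data file system" rule, the Python below flags replicable files (such as
executables) whose resolved path lies on the shared data file system. The
mount point /pkg01/data and the file names are hypothetical placeholders,
not Serviceguard defaults; substitute the paths your package actually uses.
os.path.commonpath is used instead of a plain string prefix check so that a
sibling directory such as /pkg01/database is not mistaken for /pkg01/data.

```python
import os


def on_data_filesystem(path: str, data_mount: str) -> bool:
    """Return True if `path` resolves to a location under `data_mount`."""
    real_path = os.path.realpath(path)    # follow any symlinks that exist
    real_mount = os.path.realpath(data_mount)
    return os.path.commonpath([real_path, real_mount]) == real_mount


def misplaced_files(paths, data_mount):
    """Return replicable (non-data) files that live on the shared data file system."""
    return [p for p in paths if on_data_filesystem(p, data_mount)]


if __name__ == "__main__":
    # Hypothetical mount point of the package's shared data volume.
    DATA_MOUNT = "/pkg01/data"
    # Hypothetical executables the application needs on every node.
    executables = [
        "/opt/myapp/bin/myappd",          # replicated on each node: fine
        "/pkg01/data/bin/myapp-helper",   # on the shared disk: adds to fsck time
    ]
    for p in misplaced_files(executables, DATA_MOUNT):
        print("consider replicating to each node:", p)
```

Running such a check on each node before putting a package into service
gives a quick list of files that could be moved to local disk, shrinking the
file system that must be recovered after failover.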