Managing HP Serviceguard for Linux Ninth Edition, April 2009

To reduce the impact on users, the application should not simply abort in case of error,
since aborting would cause an unneeded failover to a backup system. Applications
should determine the exact error and take specific action to recover from the error
rather than, for example, aborting upon receipt of any error.
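As a minimal sketch of this idea (not taken from Serviceguard itself), the wrapper below retries an operation when it fails with an error that is likely to be transient, and re-raises anything else so the caller can take a more specific action. The function name `call_with_recovery`, the `TRANSIENT_ERRNOS` set, and the retry parameters are hypothetical and chosen only for illustration; a real application would pick the errors and recovery actions that suit it.

```python
import errno
import time

# Hypothetical set of errors treated as transient; adjust for the real application.
TRANSIENT_ERRNOS = {errno.EINTR, errno.EAGAIN, errno.ETIMEDOUT, errno.ECONNRESET}


def call_with_recovery(operation, retries=3, delay=0.0):
    """Run operation(); retry transient OSErrors and re-raise everything else."""
    for attempt in range(1, retries + 1):
        try:
            return operation()
        except OSError as exc:
            # A non-transient error, or the last allowed attempt: let the
            # caller decide what to do instead of retrying forever.
            if exc.errno not in TRANSIENT_ERRNOS or attempt == retries:
                raise
            time.sleep(delay)
```

With a wrapper of this kind, a brief network reset does not terminate the application (and so does not trigger a failover), while a persistent error still surfaces to code that can handle it deliberately.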
Controlling the Speed of Application Failover
What steps can be taken to ensure the fastest failover?
If a failure does occur causing the application to be moved (failed over) to another
node, there are many things the application can do to reduce the amount of time it
takes to get the application back up and running. The topics covered are as follows:
Replicate Non-Data File Systems
Use Raw Volumes
Evaluate the Use of a Journaled Filesystem
Minimize Data Loss
Use Restartable Transactions
Use Checkpoints
Design for Multiple Servers
Design for Replicated Data Sites
Replicate Non-Data File Systems
Non-data file systems should be replicated rather than shared. There can only be one
copy of the application data itself. It will be located on a set of disks that is accessed
by the system that is running the application. After failover, if these data disks hold
filesystems, they must go through filesystem recovery (fsck) before the data can be
accessed. The smaller these filesystems are, the faster the recovery will be.
Therefore, it is best to keep anything that can be replicated off the
data filesystem. For example, there should be a copy of the application executables on
each system rather than having one copy of the executables on a shared filesystem.
Additionally, replicating the application executables allows them to be updated
through a rolling upgrade if this is desired.
Evaluate the Use of a Journaled Filesystem (JFS)
If a file system must be used, a JFS offers significantly faster file system recovery than
a non-journaled file system. However, performance of the JFS may vary with the
application. Examples of appropriate journaled file systems are ReiserFS and ext3.
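One way to check which mounted file systems would need a full fsck after failover is to inspect the mount table. The sketch below is illustrative only: it parses text in the format of `/proc/mounts` (device, mount point, file system type, and so on) and reports block-device mounts whose type is not in a set of journaled types. The function name and the `JOURNALED_TYPES` set are assumptions; the set contains only the two types named in the text and may need extending.

```python
# Journaled types named in the text; extend as needed (assumption).
JOURNALED_TYPES = {"ext3", "reiserfs"}


def non_journaled_mounts(mounts_text, journaled=JOURNALED_TYPES):
    """Return (mount_point, fs_type) pairs for /dev mounts that are not journaled."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type = fields[:3]
        # Only consider real block devices, not proc, sysfs, tmpfs and similar.
        if device.startswith("/dev/") and fs_type not in journaled:
            result.append((mount_point, fs_type))
    return result
```

On a Linux node, the function could be called as `non_journaled_mounts(open("/proc/mounts").read())`.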
Minimize Data Loss
Minimize the amount of data that might be lost at the time of an unplanned outage. It
is impossible to prevent some data from being lost when a failure occurs. However, it
is advisable to take certain actions to minimize the amount of data that will be lost, as
explained in the following discussion.