Managing HP Serviceguard for Linux, Tenth Edition, September 2012

issues the command ps -ef | grep xxx for all the processes belonging to the
application.
To reduce the impact on users, the application should not simply abort in case of error,
since aborting would cause an unneeded failover to a backup system. Applications
should determine the exact error and take specific action to recover from the error rather
than, for example, aborting upon receipt of any error.
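
As an illustration of recovering from specific errors instead of aborting, the following minimal Python sketch retries an operation when the error is one the application considers transient, and passes every other error up to the caller. The function name run_with_recovery, the retry count, and the set of errno values treated as transient are assumptions made for this example. They are not part of Serviceguard.

```python
import errno
import time

# Assumption: the errno values this sketch treats as transient. A real
# application would choose its own list based on the resources it uses.
TRANSIENT_ERRNOS = {errno.EAGAIN, errno.EINTR, errno.ETIMEDOUT, errno.ECONNRESET}


def run_with_recovery(operation, retries=3, delay=0.0):
    """Call operation(); retry on transient OSError, re-raise anything else."""
    for attempt in range(1, retries + 1):
        try:
            return operation()
        except OSError as exc:
            # Give up immediately on errors that a retry cannot fix, and
            # give up on transient errors once the retry budget is spent.
            if exc.errno not in TRANSIENT_ERRNOS or attempt == retries:
                raise
            time.sleep(delay)
```

The caller still sees errors that cannot be handled locally, so it can decide whether a failover is really needed.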
Controlling the Speed of Application Failover
What steps can be taken to ensure the fastest failover?
If a failure does occur causing the application to be moved (failed over) to another node,
there are many things the application can do to reduce the amount of time it takes to get
the application back up and running. The topics covered are as follows:
Replicate Non-Data File Systems
Use Raw Volumes
Evaluate the Use of a Journaled Filesystem (JFS)
Minimize Data Loss
Use Restartable Transactions
Use Checkpoints
Design for Multiple Servers
Design for Replicated Data Sites
Replicate Non-Data File Systems
Non-data file systems should be replicated rather than shared. There can only be one
copy of the application data itself. It will be located on a set of disks that is accessed by
the system that is running the application. After failover, if these data disks contain file
systems, they must go through file system recovery (fsck) before the data can be accessed.
The smaller these file systems are, the faster the recovery will be. Therefore, it is best to
keep anything that can be replicated off the data file system.
For example, there should be a copy of the application executables on each system
rather than having one copy of the executables on a shared filesystem. Additionally,
replicating the application executables makes them subject to a rolling upgrade if this is
desired.
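
When executables are replicated, it can be useful to confirm that each node holds the same version. The Python sketch below is one possible way to do that, assuming a checksum of the executable has been collected from each node by some other means. The function names, the node names, and the use of SHA-256 are choices made for this example only.

```python
import hashlib
from collections import Counter


def file_digest(path):
    """Return the SHA-256 hex digest of the file at path."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large executables are not loaded into memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def nodes_out_of_step(digests_by_node):
    """Given {node_name: digest}, return the nodes whose digest differs
    from the most common digest. On a tie, the digest seen first wins."""
    if not digests_by_node:
        return []
    majority, _ = Counter(digests_by_node.values()).most_common(1)[0]
    return sorted(n for n, d in digests_by_node.items() if d != majority)
```

During a rolling upgrade, a mismatch is expected until every node has been upgraded, so a report from nodes_out_of_step is a prompt to check, not necessarily an error.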
Evaluate the Use of a Journaled Filesystem (JFS)
If a file system must be used, a JFS offers significantly faster file system recovery than an
HFS. However, performance of the JFS may vary with the application. Examples of
appropriate journaled file systems are ReiserFS and ext3.
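
To see which mounted file systems would need a full fsck after a failure, the mount table can be inspected. The Python sketch below parses text in the format of /proc/mounts and lists disk-backed mounts whose type is not in a set of journaled types. The function name and the contents of JOURNALED_TYPES are assumptions made for this example; adjust the set to the file systems supported in your environment.

```python
# Assumption: file system type names this sketch treats as journaled.
JOURNALED_TYPES = {"ext3", "ext4", "reiserfs", "xfs", "jfs"}


def non_journaled_mounts(mounts_text, prefix="/"):
    """Return (mount_point, fs_type) pairs under prefix that are not journaled."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        device, mount_point, fs_type = fields[:3]
        if not device.startswith("/dev/"):
            continue  # skip pseudo file systems such as proc and sysfs
        if mount_point.startswith(prefix) and fs_type not in JOURNALED_TYPES:
            result.append((mount_point, fs_type))
    return result


# Sample input in /proc/mounts format; the devices and mount points are made up.
sample = """\
/dev/sda1 / ext3 rw 0 0
proc /proc proc rw 0 0
/dev/vg01/lvol1 /appdata ext2 rw 0 0
"""
print(non_journaled_mounts(sample))  # prints [('/appdata', 'ext2')]
```

On a running Linux system, the same function can be given the contents of /proc/mounts instead of the sample text.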
Minimize Data Loss
Minimize the amount of data that might be lost at the time of an unplanned outage. It is
impossible to prevent some data from being lost when a failure occurs. However, it is
308 Designing Highly Available Cluster Applications