Managing HP Serviceguard for Linux, Sixth Edition, August 2006


Designing Highly Available Cluster Applications

Controlling the Speed of Application Failover

Appendix B

Keep Logs Small

Some databases permit logs to be buffered in memory to increase online performance. Of course, when a failure occurs, any in-flight transactions will be lost. Completed transactions whose log records are still in the memory buffer will be lost as well. Minimizing the size of this in-memory log therefore reduces the amount of completed transaction data that would be lost in case of failure.
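The trade-off can be illustrated with a small simulation. The following Python sketch is not Serviceguard or database code; the `BufferedLog` class and its `max_buffered` limit are hypothetical. It assumes the log is flushed to disk whenever the buffer fills, so at any moment fewer than `max_buffered` completed transactions exist only in memory.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BufferedLog:
    """Hypothetical transaction log that buffers committed records in memory.

    Records in `pending` exist only in memory and are lost if the node fails;
    `durable` stands in for the on-disk log.
    """
    max_buffered: int  # assumed tuning knob: flush when this many records wait
    pending: List[str] = field(default_factory=list)
    durable: List[str] = field(default_factory=list)

    def commit(self, record: str) -> None:
        # Buffer the record; flush once the buffer reaches its limit.
        self.pending.append(record)
        if len(self.pending) >= self.max_buffered:
            self.flush()

    def flush(self) -> None:
        # A real system would write and sync to disk here; we move the records.
        self.durable.extend(self.pending)
        self.pending.clear()

    def at_risk(self) -> int:
        # Completed transactions that a crash right now would lose.
        return len(self.pending)


small = BufferedLog(max_buffered=4)
large = BufferedLog(max_buffered=64)
for i in range(99):
    small.commit(f"txn-{i}")
    large.commit(f"txn-{i}")
print(small.at_risk(), large.at_risk())
```

Running it prints `3 35`: after the same 99 commits, a crash would lose 3 completed transactions with the small buffer but 35 with the large one. The small buffer pays for this with 24 flushes instead of 1, which is the online-performance cost of keeping the log small.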

Keeping the size of the on-disk log small allows the log to be archived or replicated more frequently, reducing the risk of data loss if a disaster were to occur. There is, of course, a trade-off between online performance and the size of the log.
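A similar sketch applies to the on-disk log. The names are hypothetical: `SegmentedLog`, its `segment_limit`, and an `archive` callback standing in for whatever copies closed log segments to tape or a remote site. Only the active segment has not yet been archived, so its size bounds the data exposed to a disaster.

```python
from typing import Callable, List


class SegmentedLog:
    """Hypothetical on-disk log split into segments.

    When the active segment reaches or exceeds `segment_limit` bytes it is
    closed and passed to `archive`. Only the active segment is unprotected.
    """

    def __init__(self, segment_limit: int, archive: Callable[[bytes], None]) -> None:
        self.segment_limit = segment_limit
        self.archive = archive
        self.active = bytearray()

    def append(self, record: bytes) -> None:
        self.active += record
        if len(self.active) >= self.segment_limit:
            # Close the segment and ship it off-site, then start a new one.
            self.archive(bytes(self.active))
            self.active = bytearray()

    def unarchived_bytes(self) -> int:
        return len(self.active)


shipped: List[bytes] = []
log = SegmentedLog(segment_limit=1024, archive=shipped.append)
for _ in range(50):
    log.append(b"x" * 100)  # 50 records of 100 bytes each
print(len(shipped), log.unarchived_bytes())
```

This prints `4 600`: four segments have been shipped and 600 bytes remain exposed. Lowering `segment_limit` shrinks that exposure but calls `archive` more often, which is the performance trade-off described above.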

Eliminate Need for Local Data

When possible, eliminate the need for local data. In a three-tier client/server environment, the middle tier can often be dataless (i.e., there is no local data that is client-specific or needs to be modified). This "application server" tier can then provide additional levels of availability, load balancing, and failover. However, this scenario requires that all data be stored either on the client (tier 1) or on the database server (tier 3).
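The dataless middle tier can be sketched as follows. The classes are hypothetical stand-ins: `DatabaseTier` plays the tier 3 database server, and `AppServer` is a tier 2 server that keeps no client-specific state of its own, so any instance can serve any request.

```python
from typing import Dict


class DatabaseTier:
    """Stand-in for the tier 3 database server (hypothetical)."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}

    def get_balance(self, account: str) -> int:
        return self.balances.get(account, 0)

    def add(self, account: str, amount: int) -> None:
        self.balances[account] = self.get_balance(account) + amount


class AppServer:
    """Dataless tier 2 server: it stores nothing between requests."""

    def __init__(self, name: str, db: DatabaseTier) -> None:
        self.name = name
        self.db = db  # the only state it touches lives in tier 3

    def deposit(self, account: str, amount: int) -> int:
        # Inputs come from the request (tier 1); state lives in tier 3.
        self.db.add(account, amount)
        return self.db.get_balance(account)


db = DatabaseTier()
servers = [AppServer("app-a", db), AppServer("app-b", db)]
print(servers[0].deposit("acct-1", 50))
# app-a fails; the client's next request goes to app-b with nothing lost.
print(servers[1].deposit("acct-1", 25))
```

The two calls print 50 and 75. Because `app-b` needs nothing from `app-a`, requests can be spread across instances for load balancing, and losing one instance loses no data.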

Use Restartable Transactions

Transactions need to be restartable so that the client does not need to re-enter or back out of the transaction when a server fails and the application is restarted on another system. In other words, if a failure occurs in the middle of a transaction, there should be no need to start over again from the beginning. This capability makes the application more robust and reduces the visibility of a failover to the user.
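One common way to let a client resubmit a transaction after a failover without re-entering it or backing it out is to tag each transaction with a unique ID and record completed IDs in the same durable store as the data. This technique (often called an idempotency key) is not described in this manual; the sketch below is a hypothetical illustration in which in-memory dictionaries stand in for tables on shared storage.

```python
import uuid
from typing import Dict


class TransferService:
    """Hypothetical server whose results are kept in shared storage.

    `completed` maps transaction IDs to their results. A request retried
    with the same ID returns the stored result instead of being reapplied.
    In a real system the balance update and the `completed` entry must be
    committed in one atomic database transaction.
    """

    def __init__(self, completed: Dict[str, int], balances: Dict[str, int]) -> None:
        self.completed = completed
        self.balances = balances

    def transfer(self, txn_id: str, account: str, amount: int) -> int:
        if txn_id in self.completed:
            # Already committed before the failure: do not apply it twice.
            return self.completed[txn_id]
        self.balances[account] = self.balances.get(account, 0) + amount
        self.completed[txn_id] = self.balances[account]
        return self.completed[txn_id]


shared_completed: Dict[str, int] = {}
shared_balances: Dict[str, int] = {}

txn = str(uuid.uuid4())  # chosen once by the client
primary = TransferService(shared_completed, shared_balances)
primary.transfer(txn, "acct-1", 100)

# The primary fails before the client sees the reply; the package restarts
# on another node and the client simply resends the same request.
adoptive = TransferService(shared_completed, shared_balances)
print(adoptive.transfer(txn, "acct-1", 100))
```

The retry prints 100, not 200: the client repeated its request unchanged, and the server recognized it as already done.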

A common example is a print job. Printer applications typically schedule jobs; when a job completes, the scheduler goes on to the next one. If, however, the system dies in the middle of a long job (say it is printing paychecks for 3 hours), what happens when the system comes back up again? Does the job restart from the beginning, reprinting all the paychecks? Does it start from where it left off? Or does the scheduler assume that the job was done and not print the last hour's worth of paychecks? The correct behavior in a highly available environment is to restart where the job left off, ensuring that everyone gets one and only one paycheck.
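The print-job example can be sketched with a checkpoint file that records the index of the next paycheck to print. This is a hypothetical illustration (`print_paychecks`, the checkpoint format, and the `fail_after` crash simulation are invented for the example), assuming the checkpoint lives on storage that the adoptive node can read after failover.

```python
import json
import os
import tempfile
from typing import List


def print_paychecks(employees: List[str], checkpoint_path: str,
                    printed: List[str], fail_after: int = -1) -> None:
    """Print one paycheck per employee, resuming from a checkpoint.

    `printed` stands in for the printer. `fail_after` simulates a node
    crash after that many checks in this run (-1 means no crash).
    """
    start = 0
    if os.path.exists(checkpoint_path):
        # Restarting after a failure: resume where the last run left off.
        with open(checkpoint_path) as f:
            start = json.load(f)["next_index"]
    for i in range(start, len(employees)):
        if i - start == fail_after:
            raise RuntimeError("simulated node failure")
        printed.append(employees[i])  # "print" this paycheck
        # Record progress after each check. Writing a temporary file and
        # renaming it means readers see the old or new checkpoint, never
        # a partially written one.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"next_index": i + 1}, f)
        os.replace(tmp_path, checkpoint_path)


employees = [f"emp-{n}" for n in range(10)]
printed: List[str] = []
checkpoint = os.path.join(tempfile.mkdtemp(), "payroll.ckpt")

try:
    print_paychecks(employees, checkpoint, printed, fail_after=4)  # dies mid-job
except RuntimeError:
    pass
print_paychecks(employees, checkpoint, printed)  # restarted on the adoptive node
print(len(printed), len(set(printed)))
```

This prints `10 10`: after the simulated crash and restart, every employee has exactly one paycheck. A crash between printing a check and updating the checkpoint would still reprint that one check; closing that window requires making the print step and the checkpoint update atomic, or making a reprinted check detectable (for example, by its check number).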