Managing HP Serviceguard A.11.20.10 for Linux, December 2012

running the application. After failover, if these data disks contain filesystems, they must go through
filesystem recovery (fsck) before the data can be accessed. The smaller these filesystems are, the
faster the recovery will be. Therefore, it is best to keep anything
that can be replicated off the data filesystem. For example, there should be a copy of the application
executables on each system rather than having one copy of the executables on a shared filesystem.
Additionally, replicating the application executables on each node makes it possible to upgrade
them in a rolling fashion, if this is desired.
A.2.2 Evaluate the Use of a Journaled Filesystem (JFS)
If a file system must be used, a JFS offers significantly faster file system recovery than an HFS.
However, performance of the JFS may vary with the application. Examples of an appropriate
JFS are Reiser FS (reiserfs is not supported in Serviceguard A.11.20.00), ext3, and ext4.
A.2.3 Minimize Data Loss
Minimize the amount of data that might be lost at the time of an unplanned outage. It is impossible
to prevent some data from being lost when a failure occurs. However, it is advisable to take certain
actions to minimize the amount of data that will be lost, as explained in the following discussion.
A.2.3.1 Minimize the Use and Amount of Memory-Based Data
Any in-memory data (the in-memory context) will be lost when a failure occurs. The application
should be designed to minimize the amount of in-memory data that exists unless this data can be
easily recalculated. When the application restarts on the standby node, it must recalculate or
reread from disk any information it needs to have in memory.
One way to measure the speed of failover is to calculate how long it takes the application to start
up on a normal system after a reboot. Does the application start up immediately? Or are there a
number of steps the application must go through before an end-user can connect to it? Ideally, the
application can start up quickly without having to reinitialize in-memory data structures or tables.
Performance concerns might dictate that data be kept in memory rather than written to the disk.
However, the risk associated with the loss of this data should be weighed against the performance
impact of posting the data to the disk.
Data that is read from a shared disk into memory and then used as read-only data can be kept
in memory without concern.
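The recommendation above — keep little in memory, and make what must be rebuilt after failover quick to reread from disk — can be sketched with a simple checkpoint pattern. This is an illustrative sketch only, not a Serviceguard API; the checkpoint path and JSON format are assumptions, and a real application would checkpoint to a filesystem on the shared storage that fails over with the package.

```python
import json
import os
import tempfile

# Hypothetical checkpoint location on a shared (failover) filesystem.
CHECKPOINT = "/shared/app/state.json"

def save_checkpoint(state, path=CHECKPOINT):
    """Atomically write in-memory state to disk so a restarted instance
    can reread it instead of recalculating it."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())   # force the data to disk before renaming
    os.replace(tmp, path)      # atomic rename: readers see old or new, never partial

def load_checkpoint(path=CHECKPOINT):
    """On startup (including after failover), reread the saved state;
    fall back to recalculating from scratch if no checkpoint exists."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return {}              # no checkpoint: rebuild the in-memory context
```

The atomic-rename step matters for failover: if the node dies mid-write, the previous checkpoint remains intact, so the standby node never reads a half-written file.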
A.2.3.2 Keep Logs Small
Some databases permit logs to be buffered in memory to increase online performance. Of course,
when a failure occurs, any in-flight transaction will be lost. However, minimizing the size of this
in-memory log will reduce the amount of completed transaction data that would be lost in case of
failure.
Keeping the size of the on-disk log small allows the log to be archived or replicated more frequently,
reducing the risk of data loss if a disaster were to occur. There is, of course, a trade-off between
online performance and the size of the log.
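The trade-off described above — buffering log records in memory for speed while bounding how much completed-transaction data a failure can lose — can be sketched as follows. The class and threshold are hypothetical, not part of any database product; a real database exposes this as a tuning parameter (for example, a commit-buffer or group-commit setting).

```python
import os

class BufferedLog:
    """Buffer log records in memory for online performance, but bound
    the buffer so at most `max_buffered` records can be lost on failure."""

    def __init__(self, path, max_buffered=8):
        self.path = path
        self.max_buffered = max_buffered
        self.buffer = []

    def append(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.max_buffered:
            self.flush()           # a small buffer means a small exposure window

    def flush(self):
        if not self.buffer:
            return
        with open(self.path, "a") as f:
            for record in self.buffer:
                f.write(record + "\n")
            f.flush()
            os.fsync(f.fileno())   # flushed records now survive a node failure
        self.buffer.clear()
```

Lowering `max_buffered` reduces potential data loss at the cost of more frequent disk writes — the same trade-off the text describes for the size of the on-disk log and its archive frequency.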
A.2.3.3 Eliminate Need for Local Data
When possible, eliminate the need for local data. In a three-tier, client/server environment, the
middle tier can often be dataless (i.e., there is no local data that is client specific or needs to be
modified). This “application server” tier can then provide additional levels of availability,
load-balancing, and failover. However, this scenario requires that all data be stored either on the
client (tier 1) or on the database server (tier 3).
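A minimal sketch of the dataless middle tier described above: every request is served using only the client's input and the tier-3 database, and no client-specific state is held locally. The class name and the dictionary standing in for the database connection are assumptions for illustration; the point is that because no instance keeps local data, any instance can serve any request, which is what permits load-balancing and transparent failover.

```python
class ApplicationServer:
    """Hypothetical dataless middle tier (tier 2): all client-specific
    data lives in the tier-3 database; nothing is stored locally."""

    def __init__(self, database):
        self.database = database   # tier-3 connection is the only state held

    def handle(self, client_id, request):
        # Read what is needed from the database, compute, and write back.
        profile = self.database.get(client_id, {})
        profile.update(request)            # all modifications go to tier 3...
        self.database[client_id] = profile
        return profile                     # ...and results return to tier 1
```

Because two instances of this tier share only the database, a request can be redirected from a failed instance to a surviving one with no state to recover.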
A.2.4 Use Restartable Transactions
Transactions need to be restartable so that the client does not need to re-enter or back out of the
transaction when a server fails and the application is restarted on another system. In other words,
if a failure occurs in the middle of a transaction, there should be no need to start over again from
A.2 Controlling the Speed of Application Failover 259