Managing HP Serviceguard for Linux Ninth Edition, April 2009

Avoid File Locking

In an NFS environment, applications should avoid using file-locking mechanisms

when the file to be locked is on an NFS server. File locking should be avoided in an

application both on local and remote systems. If local file locking is employed and the

system fails, the system acting as the backup system will not have any knowledge of

the locks maintained by the failed system. This may or may not cause problems when

the application restarts.

Remote file locking is the worse of the two situations, since the system doing the locking

may be the system that fails. Then, the lock might never be released, and other parts

of the application will be unable to access that data. In an NFS environment, file locking

can cause long delays in case of NFS client system failure and might even delay the

failover itself.

Restoring Client Connections

How does a client reconnect to the server after a failure?

It is important to write client applications to specifically differentiate between the loss

of a connection to the server and other application-oriented errors that might be

returned. The application should take special action in case of connection loss.
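One way to separate the two cases is sketched below in Python. This is only an illustration, not part of Serviceguard: the ApplicationError class and the classify_failure function are hypothetical names, and the sketch assumes the client talks to the server over a socket, so that a lost connection surfaces as one of Python's built-in connection or timeout exceptions.

```python
import socket


class ApplicationError(Exception):
    """Hypothetical error reported by the server application itself."""


def classify_failure(exc):
    """Return 'connection_lost' or 'application_error' for an exception."""
    # ConnectionError covers reset, refused, and aborted connections.
    # TimeoutError and socket.timeout cover a server that stops responding.
    if isinstance(exc, (ConnectionError, TimeoutError, socket.timeout)):
        return "connection_lost"
    # Anything else is treated as an error returned by the application.
    return "application_error"
```

A client would call classify_failure in its error handler and start its reconnection logic only for "connection_lost", reporting everything else to the user as an ordinary application error.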

One question to consider is how a client knows after a failure when to reconnect to the

newly started server. In the typical scenario, users must simply restart their

session or log in again. However, this method is not very automated. For example, a

well-tuned hardware and application system may fail over in 5 minutes. But if users,

after experiencing no response during the failure, give up after 2 minutes and go for

coffee and don't come back for 28 minutes, the perceived downtime is actually 30

minutes, not 5. Factors to consider are the number of reconnection attempts to make,

the frequency of reconnection attempts, and whether or not to notify the user of

connection loss.
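The three factors above can be made explicit parameters of the client's reconnection routine. The following Python sketch is a minimal illustration under stated assumptions: reconnect, connect, and notify are hypothetical names, connect is assumed to raise OSError (or a subclass) while the server is unreachable, and the default of 30 attempts at 10-second intervals is chosen only to span the 5-minute failover used in the example above.

```python
import time


def reconnect(connect, max_attempts=30, interval_seconds=10.0,
              notify=None, sleep=time.sleep):
    """Call connect() until it succeeds or max_attempts is exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            # connect is a caller-supplied function returning a session.
            return connect()
        except OSError as exc:
            # Optionally tell the user that the connection is still down.
            if notify is not None:
                notify(attempt, exc)
            # Wait before the next attempt, but not after the last one.
            if attempt < max_attempts:
                sleep(interval_seconds)
    raise ConnectionError(
        f"server unreachable after {max_attempts} attempts")
```

Passing notify=None retries silently. Passing a callback lets the application show a "reconnecting" message so that users know the client is still trying and do not walk away.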

There are a number of strategies to use for client reconnection:

• Design clients that continue to try to reconnect to their failed server.

Put the work into the client application rather than relying on the user to reconnect.

If the server is back up and running in 5 minutes, and the client is continually

retrying, then after 5 minutes, the client application will reestablish the link with

the server and either restart or continue the transaction. No intervention from the

user is required.

• Design clients to reconnect to a different server.

If you have a server design which includes multiple active servers, the client could

connect to the second server, and the user would only experience a brief delay.

The problem with this design is knowing when the client should switch to the

second server. How long should a client retry the first server before giving up and

going to the second server? There are no definitive answers for this. The answer

depends on the design of the server application. If the application can be restarted

on the same node after a failure (see “Handling Application Failures” following),

the retry to the current server should continue for the amount of time it takes to
