Managing HP Serviceguard for Linux, Sixth Edition, August 2006

Appendix B: Designing Highly Available Cluster Applications

Restoring Client Connections

How does a client reconnect to the server after a failure?

Client applications should be written to distinguish the loss of a connection to the server from other application-oriented errors that might be returned. The application should take special action when the connection is lost.
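As an illustration of that distinction, the following Python sketch classifies an exception as either a lost connection or some other error. It is a minimal example under stated assumptions, not part of Serviceguard: `ApplicationError`, `CONNECTION_LOSS_ERRNOS`, and `is_connection_loss` are hypothetical names, and the set of errno values treated as connection loss is a reasonable starting point rather than a complete list.

```python
import errno
import socket


class ApplicationError(Exception):
    """Hypothetical error type: the server answered but refused or failed the request."""


# errno values that usually mean the link itself is gone (an assumed, not exhaustive, set).
CONNECTION_LOSS_ERRNOS = {
    errno.ECONNRESET,
    errno.ECONNREFUSED,
    errno.ECONNABORTED,
    errno.EPIPE,
    errno.ETIMEDOUT,
    errno.EHOSTUNREACH,
    errno.ENETUNREACH,
}


def is_connection_loss(exc):
    """Return True if exc suggests the connection to the server was lost."""
    if isinstance(exc, ApplicationError):
        # The server responded, so the connection is still alive.
        return False
    if isinstance(exc, (ConnectionError, TimeoutError, socket.timeout, EOFError)):
        return True
    if isinstance(exc, OSError):
        # Other OS-level errors count only if their errno is in the set above.
        return exc.errno in CONNECTION_LOSS_ERRNOS
    return False
```

A client could call `is_connection_loss` in its exception handler and start its reconnection logic only when the function returns `True`, reporting every other error to the user as usual.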

One question to consider is how a client knows, after a failure, when to reconnect to the newly started server. In the typical scenario, the user must simply restart the session or log in again. However, this method is not very automated. For example, a well-tuned hardware and application system may fail over in 5 minutes. But if users, after experiencing no response during the failure, give up after 2 minutes, go for coffee, and don't come back for 28 minutes, the perceived downtime is actually 30 minutes, not 5. Factors to consider are the number of reconnection attempts to make, the frequency of reconnection attempts, and whether or not to notify the user of connection loss.
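The three factors above can be made explicit as client settings. The Python sketch below is one possible arrangement, assuming a caller-supplied `connect` function that raises `OSError` while the server is unavailable. `ReconnectPolicy` and `reconnect` are hypothetical names, and the default values are illustrative only: 60 attempts at 5-second intervals span roughly the 5-minute failover in the example.

```python
import time
from dataclasses import dataclass


@dataclass
class ReconnectPolicy:
    """Hypothetical settings covering the three factors named in the text."""
    max_attempts: int = 60          # number of reconnection attempts
    interval_seconds: float = 5.0   # time to wait between attempts
    notify_user: bool = True        # whether to tell the user about the loss


def reconnect(connect, policy, notify=print, sleep=time.sleep):
    """Call connect() until it succeeds or the policy is exhausted.

    Returns whatever connect() returns, or None if every attempt failed.
    """
    if policy.notify_user:
        notify("Connection to the server was lost; reconnecting...")
    for attempt in range(1, policy.max_attempts + 1):
        try:
            conn = connect()
        except OSError:
            # Server not reachable yet: wait, unless this was the last attempt.
            if attempt < policy.max_attempts:
                sleep(policy.interval_seconds)
            continue
        if policy.notify_user:
            notify(f"Reconnected after {attempt} attempt(s).")
        return conn
    if policy.notify_user:
        notify("Could not reconnect; please try again later.")
    return None
```

Passing `notify` and `sleep` as parameters is a design choice made here so that the behavior can be exercised without real delays; a production client would more likely hook `notify` into its user interface.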

There are a number of strategies to use for client reconnection:

• Design clients which continue to try to reconnect to their failed server.

Put the work into the client application rather than relying on the user to reconnect. If the server is back up and running in 5 minutes, and the client is continually retrying, then after 5 minutes, the client application will reestablish the link with the server and either restart or continue the transaction. No intervention from the user is required.

• Design clients to reconnect to a different server.

If you have a server design which includes multiple active servers, the client could connect to the second server, and the user would only experience a brief delay.

The problem with this design is knowing when the client should switch to the second server. How long should a client keep retrying the first server before giving up and going to the second server? There is no definitive answer; it depends on the design of the server application. If the application can be restarted on the same node after a failure (see “Handling Application Failures” following),