Managing HP Serviceguard A.11.20.20 for Linux, May 2013



With UDP datagram sockets, however, there is a problem. The client may connect to multiple
servers using the relocatable IP address and sort out the replies based on the source IP address
in each server's response message. However, if the server's socket is bound to INADDR_ANY, the
source IP address given in this response will be the stationary IP address rather than the
relocatable application IP address. Therefore, when creating a UDP socket for listening, the
application must always call bind(2) with the appropriate relocatable application IP address
rather than INADDR_ANY.
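As an illustration of the rule above, the following is a minimal sketch of a UDP listener bound to
one specific IPv4 address instead of INADDR_ANY. The function name open_udp_listener and its
parameters are hypothetical and not part of Serviceguard; the caller is assumed to pass the
package's relocatable IP address as a dotted-quad string.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: create a UDP socket bound to one specific IPv4
 * address (for example the relocatable application IP) rather than
 * INADDR_ANY. Returns the socket descriptor, or -1 on error. */
int open_udp_listener(const char *addr_text, unsigned short port)
{
    struct sockaddr_in local;

    memset(&local, 0, sizeof local);
    local.sin_family = AF_INET;
    local.sin_port = htons(port);
    /* Reject anything that is not a valid dotted-quad address. */
    if (inet_pton(AF_INET, addr_text, &local.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    /* Binding to the explicit address makes replies sent from this
     * socket carry that address as their source. */
    if (bind(fd, (struct sockaddr *)&local, sizeof local) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The bind(2) call fails if the address is not currently configured on the node, so a sketch like
this would only succeed on the node where the package is running.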

A.3.6.1 Call bind() before connect()

When an application initiates its own connection, it should first call bind(2), specifying the
relocatable application IP address, before calling connect(2). Otherwise the connect request will
be sent using the stationary IP address of the system's outbound LAN interface rather than the
desired relocatable application IP address. The client on the receiving end obtains this
stationary IP address from its accept(2) call, which may confuse the client software and prevent
it from working correctly.
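The call order described above can be sketched as follows. This is an illustrative example only:
connect_from and fill_addr are hypothetical helper names, the addresses are IPv4 dotted-quad
strings supplied by the caller, and the local port is left at 0 so that the system picks one.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Fill an IPv4 socket address from dotted-quad text; returns 0 on success. */
static int fill_addr(struct sockaddr_in *sa, const char *text,
                     unsigned short port)
{
    memset(sa, 0, sizeof *sa);
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);
    return inet_pton(AF_INET, text, &sa->sin_addr) == 1 ? 0 : -1;
}

/* Hypothetical helper: open a TCP connection whose source address is
 * local_text (for example the relocatable application IP).
 * Returns the connected descriptor, or -1 on error. */
int connect_from(const char *local_text, const char *remote_text,
                 unsigned short remote_port)
{
    struct sockaddr_in local, remote;

    if (fill_addr(&local, local_text, 0) < 0 ||
        fill_addr(&remote, remote_text, remote_port) < 0)
        return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    /* bind() first, so that connect() does not choose the source address. */
    if (bind(fd, (struct sockaddr *)&local, sizeof local) < 0 ||
        connect(fd, (struct sockaddr *)&remote, sizeof remote) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```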

A.3.7 Give Each Application its Own Volume Group

Use a separate volume group for each application that uses data. If the application doesn't use
disk, it is not necessary to assign it a separate volume group. A volume group (group of disks) is
the unit of storage that can move between nodes. The greatest flexibility for load balancing exists
when each application is confined to its own volume group, that is, when no two applications share
the same set of disk drives. If two applications do use the same volume group to store their data,
then the applications must move together. If the applications' data stores are in separate volume
groups, they can switch to different nodes in the event of a failover.

The application data should be set up on different disk drives and, if applicable, different mount
points. The application should be designed to allow for different disks and separate mount points.
If possible, the application should not assume a specific mount point.
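One way to avoid assuming a mount point is to take the data directory from configuration at run
time. The sketch below reads it from an environment variable; the variable name APP_DATA_DIR and
the function build_data_path are hypothetical, chosen only for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: build the full path of a data file under the
 * directory named by the APP_DATA_DIR environment variable (an assumed
 * name, not a Serviceguard setting). Returns 0 on success, or -1 if the
 * variable is unset or empty, or if the result does not fit in out. */
int build_data_path(char *out, size_t out_size, const char *file_name)
{
    const char *root = getenv("APP_DATA_DIR");

    /* No built-in default: the mount point must come from configuration. */
    if (root == NULL || root[0] == '\0')
        return -1;

    int n = snprintf(out, out_size, "%s/%s", root, file_name);
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

With this approach, the same binary can run on any node regardless of where the package's file
system is mounted, provided the startup environment sets the variable.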

A.3.8 Use Multiple Destinations for SNA Applications

SNA is point-to-point link-oriented; that is, the services cannot simply be moved to another
system, since that system has a different point-to-point link which originates in the mainframe.
Therefore, backup links in a node and/or backup links in other nodes should be configured so
that SNA does not become a single point of failure. Note that only one configuration for an SNA
link can be active at a time. Therefore, backup links that are used for other purposes should be
reconfigured for the primary mission-critical purpose upon failover.

A.3.9 Avoid File Locking

In an NFS environment, applications should avoid using file-locking mechanisms where the file to
be locked is on an NFS server. More generally, an application should avoid file locking on both
local and remote systems. If local file locking is employed and the system fails, the system acting
as the backup system will not have any knowledge of the locks maintained by the failed system. This
may or may not cause problems when the application restarts.

Remote file locking is the worse of the two situations, since the system doing the locking may be
the system that fails. Then, the lock might never be released, and other parts of the application
will be unable to access that data. In an NFS environment, file locking can cause long delays in
case of NFS client system failure and might even delay the failover itself.

A.4 Restoring Client Connections

How does a client reconnect to the server after a failure?

It is important to write client applications to specifically differentiate between the loss of a
connection to the server and other application-oriented errors that might be returned. The
application should take special action in case of connection loss.
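As a rough sketch of such a distinction, a client written in C could classify the errno value left
by a failed socket call. The enum and the function classify_errno are hypothetical names, and the
set of error codes treated as connection loss is an assumption that a real client would adjust to
its own protocol.

```c
#include <errno.h>

/* Hypothetical classification of a failed socket call. */
enum failure_kind {
    FAILURE_CONNECTION_LOST, /* reconnecting to the server may help */
    FAILURE_OTHER            /* application-level or local error */
};

/* Map an errno value to a failure kind. The list of codes below is an
 * assumption made for illustration, not an exhaustive rule. */
enum failure_kind classify_errno(int err)
{
    switch (err) {
    case ECONNRESET:   /* peer reset the connection */
    case ECONNABORTED: /* connection aborted locally */
    case ECONNREFUSED: /* nothing listening yet on the server */
    case EPIPE:        /* write on a connection closed by the peer */
    case ETIMEDOUT:    /* no response from the server */
    case ENOTCONN:     /* socket is no longer connected */
    case EHOSTUNREACH: /* server host not reachable */
    case ENETUNREACH:  /* network not reachable */
    case ENETDOWN:     /* local network interface is down */
        return FAILURE_CONNECTION_LOST;
    default:
        return FAILURE_OTHER;
    }
}
```

A client could call classify_errno(errno) immediately after a failed send(2) or recv(2) and enter
its reconnection logic only for FAILURE_CONNECTION_LOST.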

One question to consider is how a client knows after a failure when to reconnect to the newly
started server. The typical scenario is that the client must simply restart the session or log in
again. However, this method is not very automated. For example, a well-tuned hardware and
application system may fail over in 5 minutes. But if users, after experiencing no response during
the failure,
