Managing HP Serviceguard A.11.20.20 for Linux, May 2013

With UDP datagram sockets, however, there is a problem. The client may connect to multiple
servers utilizing the relocatable IP address and sort out the replies based on the source IP address
in the server’s response message. However, if the server's socket is bound to INADDR_ANY, the
source IP address given in this response will be the stationary IP address rather than the
relocatable application IP address, so the client cannot match the reply to its request. Therefore,
when creating a UDP socket for listening, the application must always call bind(2) with the
appropriate relocatable application IP address rather than INADDR_ANY.
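The following is a minimal sketch of that rule in Python, whose socket module wraps bind(2) directly. The constant RELOCATABLE_IP and the helper names make_udp_listener and serve_one are hypothetical, and the loopback address stands in for a real relocatable package address only so that the sketch runs on any host.

```python
import socket

# Stand-in for the package's relocatable IP address (an assumption for this
# sketch). Loopback is used only so the example runs on any host.
RELOCATABLE_IP = "127.0.0.1"


def make_udp_listener(relocatable_ip: str, port: int = 0) -> socket.socket:
    """Create a UDP socket bound to one specific address, not INADDR_ANY."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Binding to "" or "0.0.0.0" would mean INADDR_ANY; naming the address
    # makes replies carry that address as their source.
    sock.bind((relocatable_ip, port))
    return sock


def serve_one(sock: socket.socket) -> None:
    """Echo a single datagram back to its sender."""
    data, client = sock.recvfrom(4096)
    sock.sendto(data, client)
```

Because the reply is sent from the same bound socket, the client sees the bound address as the source of the response and can match it to the address it originally sent to.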
A.3.6.1 Call bind() before connect()
When an application initiates its own connection, it should first call bind(2), specifying the
relocatable application IP address, before calling connect(2). Otherwise the connect request will be
sent using the stationary IP address of the system's outbound LAN interface rather than the desired
relocatable application IP address. The client will receive this stationary IP address from its
accept(2) call, possibly confusing the client software and preventing it from working correctly.
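A minimal sketch of the bind-then-connect order, again in Python. The helper name connect_from is hypothetical, and in a real package source_ip would be the relocatable application IP address.

```python
import socket


def connect_from(source_ip: str, server_addr: tuple, timeout: float = 5.0) -> socket.socket:
    """Open a TCP connection whose local address is source_ip."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        # bind() first: port 0 lets the kernel pick an ephemeral port while
        # the local IP address stays fixed to source_ip.
        sock.bind((source_ip, 0))
        sock.settimeout(timeout)
        # connect() second: the peer's accept() now reports source_ip.
        sock.connect(server_addr)
    except OSError:
        sock.close()
        raise
    return sock
```

The standard library's socket.create_connection() accepts a source_address argument that performs the same bind before connecting, which may be preferable when no other socket options are needed.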
A.3.7 Give Each Application its Own Volume Group
Use a separate volume group for each application that uses data. If the application does not use
disk storage, it is not necessary to assign it a separate volume group. A volume group (group of
disks) is the unit of storage that can move between nodes. The greatest flexibility for load
balancing exists when each application is confined to its own volume group; that is, no two
applications share the same set of disk drives. If two applications do use the same volume group to
store their data, then the applications must move together. If the applications’ data stores are in
separate volume groups, they can switch to different nodes in the event of a failover.
Each application's data should be set up on different disk drives and, if applicable, different
mount points. The application should be designed to allow for different disks and separate mount
points. If possible, the application should not assume a specific mount point.
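One way to avoid assuming a mount point is to resolve the data directory at startup from configuration. The sketch below is only an illustration: the environment variable name APP_DATA_DIR, the default path, and the helper data_dir are all hypothetical and not part of Serviceguard.

```python
import os
from pathlib import Path

# Hypothetical fallback used only when nothing is configured.
DEFAULT_DATA_DIR = "/var/opt/myapp/data"


def data_dir(env=None) -> Path:
    """Return the application's data directory without hard-coding it."""
    env = os.environ if env is None else env
    # APP_DATA_DIR is an assumed variable name; a package's start script
    # could export it to point at whichever mount point the package uses.
    return Path(env.get("APP_DATA_DIR", DEFAULT_DATA_DIR))
```

With this approach, the same application binary can run against whatever mount point the package mounts on a given node.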
A.3.8 Use Multiple Destinations for SNA Applications
SNA is point-to-point link-oriented; that is, the services cannot simply be moved to another
system, since that system has a different point-to-point link which originates in the mainframe.
Therefore, backup links in a node and/or backup links in other nodes should be configured so
that SNA does not become a single point of failure. Note that only one configuration for an SNA
link can be active at a time. Therefore, backup links that are used for other purposes should be
reconfigured for the primary mission-critical purpose upon failover.
A.3.9 Avoid File Locking
In an NFS environment, applications should avoid using file-locking mechanisms where the file to
be locked is on an NFS server. More generally, an application should avoid file locking on both
local and remote systems. If local file locking is employed and the system fails, the system acting
as the backup system will not have any knowledge of the locks maintained by the failed system. This
may or may not cause problems when the application restarts.
Remote file locking is the worse of the two situations, since the system doing the locking may be
the system that fails. Then, the lock might never be released, and other parts of the application will
be unable to access that data. In an NFS environment, file locking can cause long delays in case
of NFS client system failure and might even delay the failover itself.
A.4 Restoring Client Connections
How does a client reconnect to the server after a failure?
It is important to write client applications to specifically differentiate between the loss of a connection
to the server and other application-oriented errors that might be returned. The application should
take special action in case of connection loss.
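As a rough sketch of that distinction, a client can map transport failures and application-level failures to different exception types. Everything named here is hypothetical: the classes ConnectionLost and ApplicationError, the helper request, and the convention that an application error reply begins with b"ERR".

```python
import socket


class ConnectionLost(Exception):
    """The link to the server is gone; the caller should reconnect."""


class ApplicationError(Exception):
    """The server answered, but rejected the request."""


def request(sock: socket.socket, payload: bytes) -> bytes:
    """Send one request and classify any failure."""
    try:
        sock.sendall(payload)
        reply = sock.recv(4096)
    except (ConnectionError, socket.timeout) as exc:
        # Reset, broken pipe, or no answer in time: treat as connection loss.
        raise ConnectionLost(str(exc)) from exc
    if not reply:
        # An empty read means the peer closed the connection.
        raise ConnectionLost("server closed the connection")
    if reply.startswith(b"ERR"):
        # Assumed wire convention for application-level errors.
        raise ApplicationError(reply.decode("ascii", "replace"))
    return reply
```

A caller can then catch ConnectionLost to trigger its reconnect logic while letting ApplicationError surface to the user unchanged.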
One question to consider is how a client knows, after a failure, when to reconnect to the newly
started server. The typical scenario is that the user must simply restart the session or log in again.
However, this method is not very automated. For example, a well-tuned hardware and application
system may fail over in 5 minutes. But if users, after experiencing no response during the failure,
272 Designing Highly Available Cluster Applications