Clustering Linux Servers with the Concurrent Deployment of HP Serviceguard Linux and Red Hat Global File System for RHEL5, October 2008

successfully. During this period, any application on any cluster node that attempts a GFS file system
operation (e.g., a file open) will hang. Applications that do not perform such GFS file system
operations are not affected; for example, reads and writes on previously opened files will not hang.
Once fencing of the failed node is confirmed, DLM and GFS perform recovery: DLM releases the failed
node's locks, and GFS recovers the failed node's journal.
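The node-failure window can be illustrated with a small toy model. This is a minimal Python sketch, not GFS or DLM code: the class `GfsRecoveryModel`, its fields, and the set of operations treated as needing new locks are hypothetical assumptions chosen only to mirror the ordering described above (operations needing new locks block, I/O on already-open files proceeds, and recovery runs only after fencing is confirmed).

```python
from dataclasses import dataclass, field

# Hypothetical set of operations that need new GFS/DLM locks (illustrative only).
OPS_NEEDING_NEW_LOCKS = {"open", "create", "unlink"}


@dataclass
class GfsRecoveryModel:
    """Toy model of GFS behaviour after a node failure (not real GFS code)."""
    failed_node: str
    dlm_locks: dict = field(default_factory=dict)            # node -> held locks
    journals_needing_replay: set = field(default_factory=set)
    fenced: bool = False
    recovered: bool = False

    def request(self, op: str) -> str:
        """Outcome of an operation issued from a surviving node."""
        if op in OPS_NEEDING_NEW_LOCKS and not self.recovered:
            return "blocked"  # e.g. a file open hangs until recovery finishes
        return "ok"           # e.g. reads/writes on already-open files proceed

    def confirm_fenced(self) -> None:
        """Recovery starts only once fencing of the failed node is confirmed."""
        self.fenced = True
        self.dlm_locks.pop(self.failed_node, None)              # DLM releases its locks
        self.journals_needing_replay.discard(self.failed_node)  # GFS recovers its journal
        self.recovered = True


model = GfsRecoveryModel(
    "node2",
    dlm_locks={"node1": {"lockA"}, "node2": {"lockB"}},
    journals_needing_replay={"node2"},
)
print(model.request("open"))  # blocked
print(model.request("read"))  # ok
model.confirm_fenced()
print(model.request("open"))  # ok
```

The model deliberately keeps recovery as a single step triggered by fencing confirmation, since that ordering is the only constraint the text states.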
SAN failure
In Red Hat Cluster, in the event of a storage access failure, another node in the cluster will
recover the GFS journals and release the DLM locks held by the failed node. However, the node that
lost storage access will not be fenced. Any attempt to access files in the GFS mount points from the
failed node results in I/O errors. To regain access to the GFS file system once the FC link is
restored, the node must be rebooted; unmounting and remounting the GFS mount point does not
restore access to the file system.
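The SAN-failure behaviour can likewise be captured in a hypothetical toy model. The class `StorageLossNode` and its methods are illustrative assumptions, not part of Red Hat Cluster or GFS; they only encode the rules stated above: no fencing, I/O errors on the GFS mount point, no recovery from a remount, and access restored only by a reboot after the FC link is back.

```python
import errno
from dataclasses import dataclass


@dataclass
class StorageLossNode:
    """Hypothetical model of a node that lost FC access to GFS storage."""
    fc_link_up: bool = False   # has the FC link been repaired?
    access_lost: bool = True   # GFS access is lost after the storage failure
    fenced: bool = False       # Red Hat Cluster does not fence this node

    def gfs_io(self) -> None:
        """Any access to the GFS mount point fails while access is lost."""
        if self.access_lost:
            raise OSError(errno.EIO, "I/O error on GFS mount point")

    def remount(self) -> None:
        """Unmount + mount: deliberately does NOT restore access."""

    def reboot(self) -> None:
        """A reboot restores access, but only once the FC link is up again."""
        if self.fc_link_up:
            self.access_lost = False


node = StorageLossNode()
node.fc_link_up = True   # FC link repaired
node.remount()           # still no access
try:
    node.gfs_io()
except OSError as exc:
    print(exc.errno == errno.EIO)  # True
node.reboot()
node.gfs_io()            # succeeds: no exception raised
```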
HP Serviceguard for Linux and Red Hat GFS Co-existence
The concurrent deployment of HP Serviceguard for Linux and Red Hat GFS clusters must ensure
that the co-existence is stable. Stability is based on the criterion that both clusters have the same
cluster membership (i.e., the same set of member nodes) during both failures and normal cluster
operation.
This ensures that, in the event of a failure, both clusters identify and remove the same set of
failed nodes and re-form with the same set of surviving nodes. Otherwise, each cluster could remove
a different set of members, resulting in the shutdown of the whole cluster. Configurations that can
lead to a whole-cluster shutdown are not supported in the concurrent deployment of HP Serviceguard
for Linux and Red Hat GFS clusters.
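The membership requirement can be expressed as a simple check. In this minimal sketch, the function `after_failure` and the two-node example are hypothetical; it assumes that a node keeps running only if neither Serviceguard nor Red Hat Cluster removed (reset or fenced) it, which is why divergent decisions can take down the whole cluster.

```python
def after_failure(nodes: set, sg_removed: set, rh_removed: set):
    """Return (memberships_agree, nodes_still_running) after a failure.

    Hypothetical model: each cluster removes (resets or fences) the nodes it
    considers failed; a node keeps running only if neither cluster removed it.
    """
    sg_view = nodes - sg_removed
    rh_view = nodes - rh_removed
    running = nodes - (sg_removed | rh_removed)
    return sg_view == rh_view, running


cluster = {"node1", "node2"}
# Supported: both clusters remove the same failed node.
print(after_failure(cluster, {"node2"}, {"node2"}))  # (True, {'node1'})
# Unsupported: the clusters disagree, so both nodes go down.
print(after_failure(cluster, {"node2"}, {"node1"}))  # (False, set())
```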
The concurrent deployment of HP Serviceguard for Linux and Red Hat GFS clusters must also
ensure that the two clusters do not interfere with each other, especially with regard to data
integrity during failures.
In the concurrent deployment, Red Hat Cluster's failure detection is delayed until Serviceguard has
detected the failure, completed quorum resolution, and removed the failed nodes from the cluster, as
depicted in Figure 1.
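The ordering constraint can be checked against a timeline of events. The event names and the `ordering_ok` helper below are hypothetical and are not log messages emitted by Serviceguard or Red Hat Cluster; the sketch only verifies that Serviceguard detection, quorum resolution, and node removal complete before Red Hat Cluster detects the failure, and that GFS recovery follows that detection.

```python
# Hypothetical event names; not actual Serviceguard or Red Hat Cluster log messages.
REQUIRED_ORDER = [
    "sg_failure_detected",
    "sg_quorum_resolved",
    "sg_failed_nodes_removed",
    "rh_failure_detected",  # delayed until Serviceguard has finished the steps above
    "gfs_recovery",
]


def ordering_ok(events: list) -> bool:
    """events: list of (timestamp, name). True if the required order holds."""
    when = {name: ts for ts, name in events}
    if any(name not in when for name in REQUIRED_ORDER):
        return False
    times = [when[name] for name in REQUIRED_ORDER]
    return all(earlier <= later for earlier, later in zip(times, times[1:]))


timeline = [
    (0.0, "sg_failure_detected"),
    (1.0, "sg_quorum_resolved"),
    (2.0, "sg_failed_nodes_removed"),
    (3.0, "rh_failure_detected"),
    (4.0, "gfs_recovery"),
]
print(ordering_ok(timeline))  # True
```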
Figure 1 - Red Hat cluster failure detection delayed
[Figure: parallel timelines for Serviceguard for Linux (failure, failure detection, quorum resolution, resetting failed nodes, starting application on alternate node, application startup completed) and Red Hat Cluster (failure detection, fencing nodes, cluster gains quorum, GFS recovery). GFS is suspended until GFS recovery; application startup hangs if GFS attempts to acquire new locks.]