Clustering Linux Servers with the Concurrent Deployment of HP Serviceguard Linux and Red Hat Global File System for RHEL5, October 2008

successfully. During this period, any application that attempts a GFS file system operation (e.g., a file open) from any of the cluster nodes will hang. Applications that do not perform such GFS file system operations are not affected, so reads from and writes to previously opened files will not hang. Once it is confirmed that the failed node has been fenced, DLM and GFS perform recovery: DLM releases the locks held by the failed node, and GFS recovers the journal of the failed node.
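The recovery behavior described above can be illustrated with a toy model. This is only a sketch under stated assumptions: the class `ToyGfsRecovery`, its method names, and the lock name used in the usage note are hypothetical and do not correspond to any real GFS or DLM interface.

```python
class ToyGfsRecovery:
    """Toy model of the recovery behavior in the text; not real GFS or DLM code."""

    def __init__(self):
        self.pending_fence = set()     # failed nodes not yet confirmed as fenced
        self.locks = {}                # node name -> set of lock names it holds
        self.recovered_journals = []   # nodes whose journals have been recovered

    def grant_lock(self, node, lock):
        # Record that a node holds a lock (hypothetical bookkeeping).
        self.locks.setdefault(node, set()).add(lock)

    def node_failed(self, node):
        # A node has failed; fencing has not been confirmed yet.
        self.pending_fence.add(node)

    def new_gfs_operation_would_hang(self):
        # New GFS operations (e.g. a file open) hang while fencing is unconfirmed.
        return bool(self.pending_fence)

    def confirm_fenced(self, node):
        # Once fencing is confirmed, DLM releases the failed node's locks
        # and GFS recovers the failed node's journal.
        self.pending_fence.discard(node)
        released = self.locks.pop(node, set())
        self.recovered_journals.append(node)
        return released
```

For example, after `grant_lock("node2", "lock-a")` and `node_failed("node2")`, the model reports that new GFS operations would hang until `confirm_fenced("node2")` is called.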

SAN failure

In Red Hat Cluster, in the event of a storage access failure, another node in the cluster will recover the GFS journals and release the DLM locks held by the failed node. However, the node that lost storage access will not be fenced. Any attempt to access files in the GFS mount points from the failed cluster node will result in an I/O error. To regain access to the GFS file system once the FC link has been restored, the node must be rebooted. Unmounting the GFS mount point and then mounting it again does not restore access to the file system.

HP Serviceguard for Linux and Red Hat GFS Co-existence

The concurrent deployment of HP Serviceguard for Linux and Red Hat GFS clusters must ensure that the co-existence is stable. Stability rests on the criterion that both clusters have the same cluster membership, i.e., the same set of nodes as members of the cluster, during both failures and normal cluster operations.

This ensures that, in the event of a failure, both clusters identify and remove the same set of failed nodes and proceed to re-form the cluster with the same set of nodes. Otherwise, each cluster would remove a different set of members, resulting in the shutdown of the whole cluster. Configurations that can lead to the shutdown of the whole cluster are not supported in the concurrent deployment of HP Serviceguard for Linux and Red Hat GFS clusters.
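The same-membership criterion can be expressed as a simple set comparison. The sketch below assumes the node lists of the two clusters have already been collected by some other means; the function names and node names are hypothetical, and no Serviceguard or Red Hat Cluster command is invoked.

```python
def membership_diff(serviceguard_nodes, redhat_nodes):
    """Return the nodes that appear in only one of the two cluster memberships."""
    sg = set(serviceguard_nodes)
    rh = set(redhat_nodes)
    return {
        "only_in_serviceguard": sorted(sg - rh),
        "only_in_redhat": sorted(rh - sg),
    }


def memberships_match(serviceguard_nodes, redhat_nodes):
    """True when both clusters have exactly the same set of member nodes."""
    return set(serviceguard_nodes) == set(redhat_nodes)


# Hypothetical views of the two clusters after a failure.
sg_view = ["node1", "node2", "node3"]
rh_view = ["node1", "node2"]
mismatch = membership_diff(sg_view, rh_view)
```

Here `mismatch` shows that `node3` is still a member only in the Serviceguard view, which is the kind of divergence the criterion rules out.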

The concurrent deployment of HP Serviceguard for Linux and Red Hat GFS clusters must also ensure that the two clusters do not interfere with each other, especially with regard to ensuring data integrity during failures.

In the concurrent deployment, Red Hat cluster’s failure detection is delayed until Serviceguard has detected the failure, resolved quorum, and removed the failed nodes from the cluster. This is depicted in Figure 1.

Figure 1 - Red Hat cluster failure detection delayed

[Figure: timeline following a failure, with one row per cluster. Serviceguard for Linux row: Failure Detection, Quorum Resolution, Resetting failed nodes, Starting application on alternate node, Application startup completed. Red Hat Cluster row: Failure Detection, Cluster gains quorum, Fencing nodes, GFS recovery, with GFS marked as suspended. Annotation: Application startup hangs if GFS attempts to acquire new locks.]
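The ordering constraint stated before Figure 1 can be checked against a recorded sequence of events. This is a minimal sketch: the event names are hypothetical labels chosen for illustration and are not log messages produced by either product.

```python
# Hypothetical event labels for the Serviceguard steps, in the required order.
SERVICEGUARD_STEPS = (
    "sg_failure_detected",
    "sg_quorum_resolved",
    "sg_failed_nodes_removed",
)
# Hypothetical label for Red Hat cluster failure detection.
REDHAT_DETECTION = "rh_failure_detected"


def detection_is_delayed(events):
    """True if Red Hat failure detection follows all Serviceguard steps, in order."""
    try:
        positions = [events.index(step) for step in SERVICEGUARD_STEPS]
        rh_position = events.index(REDHAT_DETECTION)
    except ValueError:
        # A required event is missing from the sequence.
        return False
    in_order = positions == sorted(positions)
    return in_order and rh_position > positions[-1]
```

A sequence in which `rh_failure_detected` appears before `sg_failed_nodes_removed` fails the check, which corresponds to the situation the delayed detection is meant to prevent.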