file system is mounted before the Serviceguard package starts up. To prevent the Serviceguard and
Red Hat clusters from having different memberships when a node starts up, it is recommended that
cluster services startup be enabled at machine boot time for both the Serviceguard and Red Hat
clusters.
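For example, on RHEL5 the Red Hat cluster services can be enabled at boot time with chkconfig.
For Serviceguard, automatic startup is typically controlled by the AUTOSTART_CMCLD parameter;
the file location shown below is an assumption, so consult the Serviceguard for Linux
documentation for your release:

   # Enable the Red Hat cluster services at boot (RHEL5 init scripts)
   chkconfig cman on
   chkconfig clvmd on
   chkconfig gfs on

   # Enable Serviceguard automatic cluster startup (assumed location)
   # In $SGCONF/cmcluster.rc set:
   #   AUTOSTART_CMCLD=1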
Cluster startup sequence
In a Red Hat cluster, as soon as the newly formed cluster gains quorum, all possible nodes (those
listed in cluster.conf) that are not currently cluster members are fenced. The user can configure a
delay (post_join_delay) that allows the remaining members to join the cluster without being
fenced. For example, if post_join_delay is set to 60 seconds in a 10-node cluster, then after 6
nodes have joined, the user has 60 seconds to start cluster services on the remaining 4 nodes to
avoid fencing. This prevents unnecessary fencing in the common scenario where the user starts
cluster services on one node at a time. The cluster assumes that a non-responsive node (one on
which cluster services have not been started) could hold valid DLM locks and be either hung or
partitioned. Hence, to prevent such nodes from corrupting the file system, they are fenced before
GFS is used on the newly formed cluster.
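For illustration, the delay is set on the fence_daemon element in /etc/cluster/cluster.conf; the
60-second value below matches the example above and is only a sample:

   <fence_daemon post_fail_delay="0" post_join_delay="60"/>

After editing cluster.conf, increment its config_version attribute and propagate the file to all
nodes so that every member sees the same configuration.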
Hence, in a concurrent deployment it is recommended that during the initial cluster startup the
Serviceguard cluster be started (using cmruncl) only after ensuring that the Red Hat cluster has
already formed and that all nodes configured in /etc/cluster/cluster.conf are members. This
prevents an unnecessary cluster reformation in Serviceguard, which can occur if nodes are fenced
during Red Hat cluster startup.
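A minimal startup sequence along these lines, assuming the standard RHEL5 init scripts and
Serviceguard commands, might look like the following (run the service commands on every node,
then cmruncl once from any node):

   # On each node: start the Red Hat cluster infrastructure
   service cman start
   service clvmd start
   service gfs start

   # Verify that all nodes configured in /etc/cluster/cluster.conf
   # are now cluster members
   cman_tool nodes

   # Only after membership is confirmed, start the Serviceguard cluster
   cmruncl -v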
Operations that impact membership
It is important that whenever an administrator takes an action that affects the membership of
one cluster, the equivalent command for the other cluster is performed on the same node. For
example, if a node is removed from the Serviceguard cluster, the same node should also be
removed from the Red Hat cluster. This ensures that both clusters have the same membership
information.
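For instance, to take a node out of both clusters' active membership, the Serviceguard halt and
the Red Hat cluster service shutdown would both be performed on that node. The commands below
use the standard Serviceguard utilities and RHEL5 init scripts; node1 is a placeholder:

   # Halt Serviceguard on the node (-f fails its packages over first)
   cmhaltnode -f node1

   # Then stop the Red Hat cluster services on the same node
   service gfs stop
   service clvmd stop
   service cman stop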
Extra care is needed when administering two-node clusters. For example, if an administrator
halts only Serviceguard on a node and does not also exclude that node from the Red Hat GFS
cluster, a subsequent network partition or node failure may leave HP Serviceguard for Linux, and
the applications it protects, unavailable. This happens when the only remaining Serviceguard
node is the one chosen to be fenced by Red Hat GFS as a result of the network partition. The
converse is true as well.
Upgrading from RHEL4 to RHEL5
If your RHEL4 system is set up with GFS file systems that use the GULM (Grand Unified Lock
Manager) lock manager, you must convert the file systems to use the DLM lock manager, because
GULM is not supported in Red Hat Enterprise Linux 5.
Since online migration from GULM to DLM is not supported, applications and cluster services must
be halted before migrating from RHEL4 to RHEL5. Before starting the migration, halt the
Serviceguard packages and the Serviceguard cluster.
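Assuming the standard Serviceguard commands, the pre-migration halt might be performed as
follows (pkg1 is a placeholder for each configured package):

   # Halt every Serviceguard package
   cmhaltpkg pkg1

   # Then halt the Serviceguard cluster on all nodes
   cmhaltcl -v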
The RHEL4 to RHEL5 migration is a two-step process described below:
1. While running Red Hat Enterprise Linux 4, convert your GFS file systems to use the DLM
lock manager.
This involves stopping the Red Hat cluster services, removing the GULM XML elements from
the cluster configuration file, changing the configuration file to follow the RHEL5 format, and