file system is mounted before the Serviceguard package starts up. To prevent the Serviceguard and
Red Hat clusters from having different memberships when a node starts up, it is recommended that
cluster services startup be enabled at machine boot time for both the Serviceguard and Red Hat
clusters.
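For example, on RHEL5 the Red Hat cluster services can be enabled at boot time with chkconfig.
For Serviceguard, automatic startup is typically controlled by the AUTOSTART_CMCLD parameter;
the file location shown below is an assumption, so consult the Serviceguard for Linux
documentation for your release:

   # Enable the Red Hat cluster services at boot (RHEL5 init scripts)
   chkconfig cman on
   chkconfig clvmd on
   chkconfig gfs on

   # Enable Serviceguard automatic cluster startup (assumed location)
   # In $SGCONF/cmcluster.rc set:
   #   AUTOSTART_CMCLD=1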
Cluster startup sequence
In a Red Hat cluster, as soon as the newly formed cluster gains quorum, all possible nodes (those
listed in cluster.conf) that are not currently cluster members are fenced. The user can configure a
delay (post_join_delay) that allows the remaining members to join the cluster without being
fenced. For example, if post_join_delay is set to 60 seconds in a 10-node cluster, then after 6
nodes have joined, the user has 60 seconds to start cluster services on the remaining 4 nodes to
avoid fencing. This prevents unnecessary fencing in the common scenario where the user starts
cluster services on one node at a time. The cluster assumes that a non-responsive node (one on
which cluster services have not been started) could hold valid DLM locks and be either hung or
partitioned. Hence, to prevent such nodes from corrupting the file system, they are fenced before
GFS is used on the newly formed cluster.
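For illustration, the delay is set on the fence_daemon element in /etc/cluster/cluster.conf; the
60-second value below matches the example above and is only a sample:

   <fence_daemon post_fail_delay="0" post_join_delay="60"/>

After editing cluster.conf, increment its config_version attribute and propagate the file to all
nodes so that every member sees the same configuration.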
Hence, in a concurrent deployment it is recommended that during the initial cluster startup the
Serviceguard cluster be started (using cmruncl) only after ensuring that the Red Hat cluster has
already formed and that all nodes configured in /etc/cluster/cluster.conf are members. This
prevents an unnecessary cluster reformation in Serviceguard, which can occur if nodes are fenced
during Red Hat cluster startup.
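A minimal startup sequence along these lines, assuming the standard RHEL5 init scripts and
Serviceguard commands, might look like the following (run the service commands on every node,
then cmruncl once from any node):

   # On each node: start the Red Hat cluster infrastructure
   service cman start
   service clvmd start
   service gfs start

   # Verify that all nodes configured in /etc/cluster/cluster.conf
   # are now cluster members
   cman_tool nodes

   # Only after membership is confirmed, start the Serviceguard cluster
   cmruncl -v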
Operations that impact membership
It is important that whenever an administrator takes an action that affects the membership of
one cluster, the equivalent command for the other cluster is performed on the same node. For
example, if a node is removed from the Serviceguard cluster, the same node should also be
removed from the Red Hat cluster. This ensures that both clusters have the same membership
information.
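For instance, to take a node out of both clusters' active membership, the Serviceguard halt and
the Red Hat cluster service shutdown would both be performed on that node. The commands below
use the standard Serviceguard utilities and RHEL5 init scripts; node1 is a placeholder:

   # Halt Serviceguard on the node (-f fails its packages over first)
   cmhaltnode -f node1

   # Then stop the Red Hat cluster services on the same node
   service gfs stop
   service clvmd stop
   service cman stop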
Extra care is needed when administering two-node clusters. For example, if an administrator
halts only Serviceguard on a node and does not also exclude that node from the Red Hat GFS
cluster, a subsequent network partition or node failure may leave HP Serviceguard for Linux, and
the applications it protects, unavailable. This happens when the only remaining Serviceguard
node is the one chosen to be fenced by Red Hat GFS as a result of the network partition. The
converse is true as well.
Upgrading from RHEL4 to RHEL5
If your RHEL4 system is set up with GFS file systems that use the GULM (Grand Unified Lock
Manager) lock manager, you must convert the file systems to use the DLM lock manager, because
GULM is not supported in Red Hat Enterprise Linux 5.
Since online migration from GULM to DLM is not supported, applications and cluster services must
be halted before migrating from RHEL4 to RHEL5. Before starting the migration, halt the
Serviceguard packages and the Serviceguard cluster.
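Assuming the standard Serviceguard commands, the pre-migration halt might be performed as
follows (pkg1 is a placeholder for each configured package):

   # Halt every Serviceguard package
   cmhaltpkg pkg1

   # Then halt the Serviceguard cluster on all nodes
   cmhaltcl -v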
The RHEL4 to RHEL5 migration is a two-step process described below:
1. While running Red Hat Enterprise Linux 4, convert your GFS file systems to use the DLM
lock manager.
This involves stopping the Red Hat cluster services, removing the GULM XML elements from
the cluster configuration file, changing the configuration file to follow the RHEL5 format, and