The Quorum Server software is available for HP-UX 11i v2, HP-UX 11i v3, RHEL, and SUSE Linux Enterprise Server. The software
can be downloaded free of charge from software.hp.com. A single Quorum Server can serve multiple SG/LX and Serviceguard
for HP-UX clusters. Ideally, the Quorum Server should itself be configured in a separate cluster for high availability, in a
fault domain different from that of the servers in the clusters it serves.
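For illustration, the sketch below shows one way a Quorum Server can be referenced when building the cluster configuration.
The host name qs.example.com, the node names, and the file path are placeholders, and parameter names and defaults should be
verified against the SG/LX release in use.

    # Generate a cluster configuration template that references the Quorum Server
    cmquerycl -v -q qs.example.com -n node1 -n node2 -C /usr/local/cmcluster/conf/cluster.ascii

    # Quorum Server entries in the resulting configuration file
    # (QS_POLLING_INTERVAL is specified in microseconds)
    QS_HOST                qs.example.com
    QS_POLLING_INTERVAL    300000000

    # Verify and apply the cluster configuration
    cmcheckconf -C /usr/local/cmcluster/conf/cluster.ascii
    cmapplyconf -C /usr/local/cmcluster/conf/cluster.ascii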
Node fencing
A cluster node reboots if it cannot communicate with the majority of cluster members for a predetermined time, in a
split-brain situation, or under other circumstances such as a kernel hang or a failure of the cluster daemon (cmcld). This
reboot is initiated by the DEADMAN driver, which acts as a robust fencing mechanism in an SG/LX cluster. No manual
configuration of the DEADMAN driver is needed: it is a dynamically loadable kernel module that is compiled into the kernel
automatically when SG/LX is installed. However, if you plan to update the Linux kernel, the DEADMAN driver must be
recompiled for the new kernel.
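As a quick sanity check after a kernel update, you can confirm that the fencing module is available and loaded; this sketch
assumes the module is named deadman, which may vary by SG/LX release.

    # Confirm the deadman kernel module is available for the running kernel
    modinfo deadman

    # Confirm the module is loaded on the cluster node
    lsmod | grep deadman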
Networking in the SG/LX cluster
SG/LX uses one or more heartbeat networks to send heartbeat messages among all the nodes and to maintain cluster
membership. SG/LX also uses the heartbeat network for communication between nodes. Building a resilient cluster requires a
resilient heartbeat network infrastructure, so it is recommended to configure multiple heartbeat networks and to use
channel bonding technologies at the link level.
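The sketch below shows a typical active-backup bond on Red Hat Enterprise Linux that could back a heartbeat network; device
names and addresses are examples only.

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=192.168.10.11
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none
    BONDING_OPTS="mode=active-backup miimon=100"

    # /etc/sysconfig/network-scripts/ifcfg-eth1 (repeat for each slave interface)
    DEVICE=eth1
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none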
Applications deployed in an SG/LX cluster can use their own networks for client access; such networks are called client
access networks. SG/LX can monitor and manage the IP addresses used by applications on these networks. During failover,
SG/LX can move the application's IP address on the client access network from the failed node to the target node. Client
access networks can also be configured to carry heartbeat traffic: SG/LX exchanges only small heartbeat messages and does
not have demanding bandwidth requirements, so using client access networks for heartbeat does not affect application
accessibility or performance.
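For example, the relocatable IP address that follows an application is declared in the package configuration file; the
subnet, address, and package file name below are placeholders for a modular package.

    # Excerpt from a modular package configuration file (pkg1.conf)
    ip_subnet     192.168.20.0
    ip_address    192.168.20.50

    # Apply the package configuration to the cluster
    cmapplyconf -P pkg1.conf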
VCS overview
VCS is an HA solution from Symantec. Veritas Cluster Server (VCS) connects multiple, independent systems into a management
framework for increased availability. Each system, or node, runs its own OS and cooperates at the software level to form a
cluster. VCS links commodity hardware with software to provide application failover and control. When a node or a monitored
application fails, other nodes can take predefined actions to take over and bring up the failed application elsewhere in
the cluster.
VCS cluster membership
A VCS cluster uses two types of communication: intra-system and inter-system. For intra-system communication, VCS uses a
protocol called Inter Process Messaging (IPM) to communicate with the GUI, the command line, and the agents. For
inter-system communication, VCS uses the cluster interconnects for network communication between cluster systems. The Group
Membership Services/Atomic Broadcast (GAB) protocol is responsible for cluster membership and reliable cluster
communication.
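The standard VCS utilities can be used to inspect these communication layers on a running cluster node, for example:

    # Show LLT link status for all cluster nodes
    lltstat -nvv

    # Show GAB port memberships (port a = GAB membership, port h = VCS engine)
    gabconfig -a

    # Summarize cluster and service group status
    hastatus -sum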
When a system crashes or is powered off, it stops sending heartbeats to the other systems in the cluster. By default, the
other systems in the cluster wait 21 seconds before declaring it dead. The 21 seconds derives from the 16-second default
Low Latency Transport (LLT) peer-inactive timeout (which can be modified by users) plus the 5-second default GAB stable
timeout (which can also be altered by users).
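The sketch below indicates where these defaults are typically set; the units and exact syntax should be confirmed against
the VCS documentation for the installed release.

    # /etc/llttab: LLT peer-inactive timeout in hundredths of a second
    # (1600 = 16 seconds, the default)
    set-timer peerinact:1600

    # GAB stable timeout in milliseconds (5000 = 5 seconds, the default)
    gabconfig -t 5000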
Each system in the cluster runs a kernel fencing module that is responsible for ensuring a valid and current cluster
membership during a membership change, through the process of membership arbitration. A VCS cluster uses a special service
called a coordination point for membership arbitration. Coordination points provide a lock mechanism to determine which
nodes get to fence off data drives from other nodes: a node must eject a peer from the coordination points before it can
fence the peer from the data drives. A coordination point can be a disk or a server, and VCS recommends configuring three
coordination points. The kernel fencing module registers with the coordination points during normal operation. At the time
of cluster reformation, the kernel fencing module of each system races for control of the configured coordination points.
VCS prevents split brain by allowing the winning partition to fence the ejected nodes from accessing the data disks.
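For disk-based fencing, the coordination points are typically described in two small configuration files, sketched below;
the coordinator disk group name is a placeholder.

    # /etc/vxfendg: name of the coordinator disk group
    vxfencoorddg

    # /etc/vxfenmode: SCSI-3 disk-based fencing
    vxfen_mode=scsi3
    scsi3_disk_policy=dmp

    # Verify the fencing mode and current membership
    vxfenadm -d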