The Quorum Server software is available for HP-UX 11i v2, HP-UX 11i v3, RHEL, and SUSE Linux Enterprise Server. The software
can be downloaded free of charge from software.hp.com. A single Quorum Server can serve multiple SG/LX and Serviceguard
for HP-UX clusters. Ideally, the Quorum Server should itself be configured in a separate cluster for high availability, in a
fault domain different from that of the servers in the clusters it serves.
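For illustration, the sketch below shows one way a Quorum Server can be referenced when building the cluster configuration.
The host name qs.example.com, the node names, and the file path are placeholders, and parameter names and defaults should be
verified against the SG/LX release in use.

    # Generate a cluster configuration template that references the Quorum Server
    cmquerycl -v -q qs.example.com -n node1 -n node2 -C /usr/local/cmcluster/conf/cluster.ascii

    # Quorum Server entries in the resulting configuration file
    # (QS_POLLING_INTERVAL is specified in microseconds)
    QS_HOST                qs.example.com
    QS_POLLING_INTERVAL    300000000

    # Verify and apply the cluster configuration
    cmcheckconf -C /usr/local/cmcluster/conf/cluster.ascii
    cmapplyconf -C /usr/local/cmcluster/conf/cluster.ascii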
Node fencing
A cluster node reboots if it cannot communicate with the majority of cluster members for a predetermined time, in a
split-brain situation, or under other circumstances such as a kernel hang or a failure of the cluster daemon (cmcld). This
reboot is initiated by the DEADMAN driver, which acts as a robust fencing mechanism in an SG/LX cluster. No manual
configuration of the DEADMAN driver is needed: it is a dynamically loadable kernel module that is compiled into the kernel
automatically when SG/LX is installed. However, if you plan to update the Linux kernel, the DEADMAN driver must be
recompiled for the new kernel.
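As a quick sanity check after a kernel update, you can confirm that the fencing module is available and loaded; this sketch
assumes the module is named deadman, which may vary by SG/LX release.

    # Confirm the deadman kernel module is available for the running kernel
    modinfo deadman

    # Confirm the module is loaded on the cluster node
    lsmod | grep deadman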
Networking in the SG/LX cluster
SG/LX uses one or more heartbeat networks to send heartbeat messages among all the nodes and to maintain cluster
membership. SG/LX also uses the heartbeat network for communication between nodes. Building a resilient cluster requires a
resilient heartbeat network infrastructure, so it is recommended to configure multiple heartbeat networks and to use
channel bonding technologies at the link level.
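The sketch below shows a typical active-backup bond on Red Hat Enterprise Linux that could back a heartbeat network; device
names and addresses are examples only.

    # /etc/sysconfig/network-scripts/ifcfg-bond0
    DEVICE=bond0
    IPADDR=192.168.10.11
    NETMASK=255.255.255.0
    ONBOOT=yes
    BOOTPROTO=none
    BONDING_OPTS="mode=active-backup miimon=100"

    # /etc/sysconfig/network-scripts/ifcfg-eth1 (repeat for each slave interface)
    DEVICE=eth1
    MASTER=bond0
    SLAVE=yes
    ONBOOT=yes
    BOOTPROTO=none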
Applications deployed in an SG/LX cluster can use their own networks for client access; such networks are called client
access networks. SG/LX can monitor and manage the IP addresses used by applications on these networks. During failover,
SG/LX can move the application's IP address on the client access network from the failed node to the target node. Client
access networks can also be configured to carry heartbeat traffic: SG/LX exchanges only small heartbeat messages and does
not have demanding bandwidth requirements, so using client access networks for heartbeat does not affect application
accessibility or performance.
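For example, the relocatable IP address that follows an application is declared in the package configuration file; the
subnet, address, and package file name below are placeholders for a modular package.

    # Excerpt from a modular package configuration file (pkg1.conf)
    ip_subnet     192.168.20.0
    ip_address    192.168.20.50

    # Apply the package configuration to the cluster
    cmapplyconf -P pkg1.conf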
VCS overview
VCS is an HA solution from Symantec. Veritas Cluster Server (VCS) connects multiple, independent systems into a management
framework for increased availability. Each system, or node, runs its own OS and cooperates at the software level to form a
cluster. VCS links commodity hardware with software to provide application failover and control. When a node or a monitored
application fails, other nodes can take predefined actions to take over and bring up the failed application elsewhere in
the cluster.
VCS cluster membership
A VCS cluster uses two types of communication: intra-system and inter-system. For intra-system communication, VCS uses a
protocol called Inter Process Messaging (IPM) to communicate with the GUI, the command line, and the agents. For
inter-system communication, VCS uses the cluster interconnects for network communication between cluster systems. The Group
Membership Services/Atomic Broadcast (GAB) protocol is responsible for cluster membership and reliable cluster
communication.
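The standard VCS utilities can be used to inspect these communication layers on a running cluster node, for example:

    # Show LLT link status for all cluster nodes
    lltstat -nvv

    # Show GAB port memberships (port a = GAB membership, port h = VCS engine)
    gabconfig -a

    # Summarize cluster and service group status
    hastatus -sum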
When a system crashes or is powered off, it stops sending heartbeats to the other systems in the cluster. By default, the
other systems in the cluster wait 21 seconds before declaring it dead. The 21 seconds derives from the 16-second default
Low Latency Transport (LLT) peer-inactive timeout (which can be modified by users) plus the 5-second default GAB stable
timeout (which can also be altered by users).
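The sketch below indicates where these defaults are typically set; the units and exact syntax should be confirmed against
the VCS documentation for the installed release.

    # /etc/llttab: LLT peer-inactive timeout in hundredths of a second
    # (1600 = 16 seconds, the default)
    set-timer peerinact:1600

    # GAB stable timeout in milliseconds (5000 = 5 seconds, the default)
    gabconfig -t 5000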
Each system in the cluster runs a kernel fencing module that is responsible for ensuring a valid and current cluster
membership during a membership change, through the process of membership arbitration. A VCS cluster uses a special service
called a coordination point for membership arbitration. Coordination points provide a lock mechanism to determine which
nodes get to fence off data drives from other nodes: a node must eject a peer from the coordination points before it can
fence the peer from the data drives. A coordination point can be a disk or a server, and VCS recommends configuring three
coordination points. The kernel fencing module registers with the coordination points during normal operation. At the time
of cluster reformation, the kernel fencing module of each system races for control of the configured coordination points.
VCS prevents split brain by allowing the winning partition to fence the ejected nodes from accessing the data disks.
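For disk-based fencing, the coordination points are typically described in two small configuration files, sketched below;
the coordinator disk group name is a placeholder.

    # /etc/vxfendg: name of the coordinator disk group
    vxfencoorddg

    # /etc/vxfenmode: SCSI-3 disk-based fencing
    vxfen_mode=scsi3
    scsi3_disk_policy=dmp

    # Verify the fencing mode and current membership
    vxfenadm -d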