VERITAS Volume Manager 3.5 Administrator's Guide (September 2002)

Overview of Cluster Volume Management

In recent years, tightly coupled cluster systems have become increasingly popular in the
realm of enterprise-scale mission-critical data processing. The primary advantage of
clusters is protection against hardware failure. Should the primary node fail or otherwise
become unavailable, applications can continue to run by transferring their execution to
standby nodes in the cluster. This ability to provide continuous availability of service by
switching to redundant hardware is commonly termed failover.

Another major advantage of clustered systems is their ability to reduce contention for
system resources caused by activities such as backup, decision support and report
generation. Businesses can derive enhanced value from their investment in cluster
systems by performing such operations on lightly loaded nodes in the cluster rather than
on the heavily loaded nodes that answer requests for service. This ability to perform some
operations on the lightly loaded nodes is commonly termed load balancing.

The cluster functionality of VxVM works together with the cluster monitor daemon that is
provided by the host operating system. The cluster monitor informs VxVM of changes in
cluster membership. Each node starts up independently and has its own cluster monitor
plus its own copies of the operating system and VxVM with support for cluster
functionality. When a node joins a cluster, it gains access to shared disks. When a node
leaves a cluster, it no longer has access to shared disks. A node joins a cluster when the
cluster monitor is started on that node.
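
The relationship between cluster membership and shared-disk access described above can be sketched as a small model. This is an illustrative sketch only: the class and method names (ClusterMembership, start_cluster_monitor, leave, can_access_shared_disks) are hypothetical and do not correspond to any VxVM or operating system interface.

```python
class ClusterMembership:
    """Toy model of cluster membership (hypothetical names, not a VxVM API)."""

    def __init__(self):
        self.members = set()
        # Membership changes that the cluster monitor would report
        # to the volume manager, in the order they occurred.
        self.notifications = []

    def start_cluster_monitor(self, node):
        # Starting the cluster monitor on a node is what makes it join.
        if node not in self.members:
            self.members.add(node)
            self.notifications.append(("join", node))

    def leave(self, node):
        # A node that leaves the cluster loses access to shared disks.
        if node in self.members:
            self.members.remove(node)
            self.notifications.append(("leave", node))

    def can_access_shared_disks(self, node):
        # In this model, only current members can reach shared disks.
        return node in self.members


cluster = ClusterMembership()
cluster.start_cluster_monitor("node0")
cluster.start_cluster_monitor("node1")
cluster.leave("node1")

print(cluster.can_access_shared_disks("node0"))  # True
print(cluster.can_access_shared_disks("node1"))  # False
print(cluster.notifications)
```

The model deliberately ties disk access to membership alone, mirroring the rule stated in the text; it does not attempt to represent disk groups or I/O paths.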

“Example of a 4-Node Cluster” on page 247 illustrates a simple cluster arrangement
consisting of four nodes with similar or identical hardware characteristics (CPUs, RAM
and host adapters), and configured with identical software (including the operating
system). The nodes are fully connected by a private network and they are also separately
connected to shared external storage (either disk arrays or JBODs: just a bunch of disks) via
SCSI or Fibre Channel. Each node has two independent paths to these disks, which are
configured in one or more cluster-shareable disk groups.

The private network allows the nodes to share information about system resources and
about each other’s state. Using the private network, any node can recognize which other
nodes are currently active, which are joining or leaving the cluster, and which have failed.
The private network requires at least two communication channels to provide
redundancy against one of the channels failing. If only one channel were used, its failure
would be indistinguishable from node failure, a condition known as network partitioning.
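
The reasoning behind the two-channel requirement can be illustrated with a short sketch. It assumes a simple heartbeat scheme in which each node records, per channel, whether a peer's heartbeat arrived within a timeout. The function name classify_peer and the returned labels are hypothetical; this is not how any particular cluster monitor is implemented.

```python
def classify_peer(heartbeats):
    """Classify a peer from per-channel heartbeat results.

    heartbeats: list of booleans, one per private-network channel;
    True means a heartbeat arrived on that channel within the timeout.
    (Hypothetical helper for illustration only.)
    """
    if not heartbeats:
        raise ValueError("at least one channel is required")

    received = sum(heartbeats)
    if received == len(heartbeats):
        return "alive"
    if received > 0:
        # The peer is reachable on some channel, so the silent
        # channel must be at fault rather than the node.
        return "alive, channel fault"
    if len(heartbeats) == 1:
        # One silent channel: a dead node and a dead channel
        # look exactly the same from here.
        return "ambiguous"
    # All redundant channels are silent; simultaneous channel
    # failure is still possible, but node failure is presumed.
    return "presumed failed"


print(classify_peer([False]))         # ambiguous
print(classify_peer([True, False]))   # alive, channel fault
print(classify_peer([False, False]))  # presumed failed
```

With a single channel, silence leaves the observer unable to decide between the two causes. A second channel lets a single channel failure be recognized as such.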