Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters Manufacturing Part Number: B7660-90019 December 2006
Legal Notices © Copyright 2006 Hewlett-Packard Development Company, L.P. Publication Date: 2006 Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license. The information contained herein is subject to change without notice.
Contents

1. Designing a Metropolitan Cluster
   Designing a Disaster Tolerant Architecture for use with Metrocluster Products
   Single Data Center
   Two Data Centers and Third Location with Arbitrator(s)
   Worksheets
   Disaster Tolerant Checklist

   Building the Continentalclusters Configuration
   Preparing Security Files
   Creating the Monitor Package
   Editing the Continentalclusters Configuration File

   Next Steps
   Support for Oracle RAC Instances in a Continentalclusters Environment
   Configuring the Environment for Continentalclusters to Support Oracle RAC
   Serviceguard/Serviceguard Extension for RAC and Oracle Clusterware Configuration

4. Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA
   Files for Integrating the EVA with Serviceguard Clusters
   Overview of EVA and Continuous Access EVA Concepts
   Data Replication
   Copy Sets

5. Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF
   Files for Integrating Serviceguard with EMC SRDF
   Overview of EMC and SRDF Concepts
   Preparing the Cluster for Data Replication
   Installing the Necessary Software

   Setting up a Recovery Package on the Recovery Cluster
   Setting up the Continental Cluster Configuration
   Switching to the Recovery Cluster in Case of Disaster
   Failback Scenarios
   Maintaining the EMC SRDF Data Replication Environment

E. Continentalclusters Command and Daemon Reference

Glossary
Index
Printing History

Table 1 Editions and Releases

Printing Date: December 2006
Part Number: B7660-90019
Edition: Edition 1
Operating System Releases: HP-UX 11i v1 and 11i v2

The printing date and part number indicate the current edition. The printing date changes when a new edition is printed. (Minor corrections and updates which are incorporated at reprint do not cause the date to change.) The part number changes when extensive technical changes are incorporated.
Preface This edition divides the contents of the former guide, “Designing Disaster Tolerant High Availability Clusters”, into the following two separate guides: • Understanding and Designing Serviceguard Disaster Tolerant Architectures • Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters The Understanding and Designing Serviceguard Disaster Tolerant Architectures guide provides an overview of Hewlett-Packard Disaster Tolerant high availability cluster technologies and how
• Chapter 3, Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP, shows how to integrate physical data replication via Continuous Access XP with metropolitan and continental clusters. • Chapter 4, Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA, shows how to integrate physical data replication via Continuous Access EVA with metropolitan and continental clusters.
Guide to Disaster Tolerant Solutions Documentation

Use the following table as a guide for locating specific Disaster Tolerant Solutions documentation:

Table 2 Disaster Tolerant Solutions Document Road Map

To set up: Extended Distance Cluster for Serviceguard/Serviceguard Extension for RAC
Read: Understanding and Designing Serviceguard Disaster Tolerant Architectures

To set up: Metrocluster with Continuous Access XP
Read: Understanding and Designing Serviceguard Disaster Tolerant Architectures
• Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster

To set up: Metrocluster with EMC SRDF
Read: Understanding and Designing Serviceguard Disaster Tolerant Architectures
• Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
• Chapter 1: Designing a Metropolitan Cluster
• Chapter 5: Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF

To set up: Continental Cluster
Read: Understanding and Designing Serviceguard Disaster Tolerant Architectures

To set up: Continental Cluster using Continuous Access EVA, EMC SRDF, or other data replication
Read: Understanding and Designing Serviceguard Disaster Tolerant Architectures
• Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
• Chapter 2: Designing a Continental Cluster

To set up: Three Data Center Architecture
Read: Understanding and Designing Serviceguard Disaster Tolerant Architectures
• Chapter 1: Disaster Tolerance and Recovery in a Serviceguard Cluster
Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters
• Chapter 1: Designing a Metropolitan Cluster
• Chapter 2: Designing a Continental Cluster
• Chapter 6: Designing a Disaster Tolerant Solution Using the Three Data Center Architecture
Related Publications The following documents contain additional useful information: • Clusters for High Availability: a Primer of HP Solutions, Second Edition.
1 Designing a Metropolitan Cluster

This chapter describes the configuration and management of a basic metropolitan cluster through the following topics:

• Designing a Disaster Tolerant Architecture for use with Metrocluster Products
• Single Data Center
• Two Data Centers and Third Location with Arbitrator(s)
• Package Configuration Worksheet
• Disaster Tolerant Checklist
• Cluster Configuration Worksheet
• Next Steps

In addition, this chapter outlines the ge…
Designing a Disaster Tolerant Architecture for use with Metrocluster Products

Metrocluster is designed for use in a metropolitan cluster environment within the 100 km distance limit. All nodes must be members of a single Serviceguard cluster. Two configurations are supported:

• A single data center without arbitrators (not disaster tolerant)
• Two main data centers and a third location with arbitrator(s)
Two Data Centers and Third Location with Arbitrator(s)

This is the recommended and supported disaster tolerant architecture for use with Metrocluster. This architecture consists of two main data centers with an equal number of nodes and a third location with one or more arbitrator nodes or a quorum server node.

Figure 1-1
For packages A and B, data is written to PVOLs on the array in Data Center A and replicated to SVOLs on the array in Data Center B. Likewise, the XP disk array in Data Center B is the primary (main) disk array for packages C and D, and the secondary (remote) array for packages A and B.
Table 1-1 Supported System and Data Center Combinations

Data Center A | Data Center B | Data Center C        | Serviceguard Version
2             | 2             | Quorum Server System | A.11.13 or later
3             | 3             | 1 Arbitrator Node    | A.11.13 or later
3             | 3             | 2* Arbitrator Nodes  | A.11.13 or later
3             | 3             | Quorum Server System | A.11.13 or later
4             | 4             | 1 Arbitrator Node    | A.11.13 or later
7             | 7             | 2* Arbitrator Nodes  | A.11.13 or later
7             | 7             | Quorum Server System | A.11.13 or later
8             | 8             | Quorum Server System | A.11.13 or later
Arbitrator Node Configuration Rules

Although you can use one arbitrator, having two arbitrators provides greater flexibility in taking systems down for planned outages, as well as better protection against multiple points of failure. Using two arbitrators:

• Provides local failover capability to applications running on the arbitrator.
For Continuous Access XP, when using bi-directional configurations, where data center A backs up data center B and data center B backs up data center A, you must have at least four Continuous Access links, two in each direction. Four Continuous Access links are also required in uni-directional configurations to allow failback.
Figure 1-2 Failover Scenario with a Single Arbitrator

The figure shows packages A and B running on nodes 1 and 2 in Data Center A, packages C and D running on nodes 3 and 4 in Data Center B, Continuous Access links between the two disk arrays, and a single arbitrator node at the third location. The scenarios in Table 1-2, based on Figure 1-2, illustrate possible results if one or more nodes fail in a configuration with a single arbitrator.
Table 1-2 Node Failure Scenarios with One Arbitrator (Continued)

Failure                                       | Quorum       | Result
data center A (nodes 1 and 2)                 | 3 of 5 (60%) | pkg A and B switch to data center B
data center A, then arbitrator 1              | 2 of 3 (67%) | pkg A and B switch, then no change
data center A and arbitrator 1                | 2 of 5 (40%) | cluster halts*
data center A, then arbitrator 1, then node 3 | 1 of 2 (50%) | cluster halts*
Figure 1-3 Failover Scenario with Two Arbitrators

The figure shows the same two data centers and Continuous Access links as Figure 1-2, with two arbitrator nodes at the third location. The scenarios in Table 1-3 illustrate possible results if a data center or one or more nodes fail in a configuration with two arbitrators.
Table 1-3 Node Failure Scenarios with Two Arbitrators (Continued)

Failure                    | Quorum       | Result
node 3, then data center A | 3 of 5 (60%) | pkg A and B switch to data center B
data center B              | 4 of 6 (67%) | pkg C and D switch to data center A
third location             | 4 of 6 (67%) | no change

* Cluster can be manually started with the remaining node.
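The quorum arithmetic behind these tables can be sketched as a small helper. This is an illustration of the strict-majority rule implied by the scenarios above, not Serviceguard's internal algorithm; the function name is hypothetical:

```shell
# Hypothetical helper illustrating the majority rule implied by the tables:
# surviving nodes can re-form the cluster only if they are strictly more
# than half of the previous membership.
have_quorum() {
  surviving=$1
  total=$2
  # Compare 2*surviving with total to avoid fractional arithmetic.
  [ $((2 * surviving)) -gt "$total" ]
}

have_quorum 3 5 && echo "3 of 5 (60%): cluster re-forms"
have_quorum 2 5 || echo "2 of 5 (40%): cluster halts"
have_quorum 1 2 || echo "1 of 2 (50%): cluster halts"
```

Note that exactly 50% is not a quorum, which is why the "data center A, then arbitrator 1, then node 3" scenario halts the cluster and manual intervention is required.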
Worksheets

Disaster Tolerant Checklist

Use this checklist to make sure you have adhered to the disaster tolerant architecture guidelines for a configuration with two main data centers and a third location.

Figure 1-4 Disaster Tolerant Checklist

• Data centers A and B have the same number of nodes, to maintain quorum in case an entire data center fails.
Cluster Configuration Worksheet

Use this cluster configuration worksheet either in place of, or in addition to, the worksheet provided in the Managing Serviceguard user's guide. If you have already completed a Serviceguard cluster configuration worksheet, you only need to complete the first part of this worksheet.
Package Configuration Worksheet

Use this package configuration worksheet either in place of, or in addition to, the worksheet provided in the Managing Serviceguard user's guide. If you have already completed a Serviceguard package configuration worksheet, you only need to complete the first part of this worksheet.
Figure 1-7 Package Control Script Worksheet

Package Control Script Data:

VG[0]: ______  LV[0]: ______  FS[0]: ______  FS_MOUNT_OPT[0]: ______
VG[1]: ______  LV[1]: ______  FS[1]: ______  FS_MOUNT_OPT[1]: ______
VG[2]: ______  LV[2]: ______  FS[2]: ______  FS_MOUNT_OPT[2]: ______
VXVM_DG[0]: ______  LV[0]: ______  FS[0]: ______  FS_MOUNT_OPT[0]: ______
VXVM_DG[1]: ______  LV[1]: ______  FS[1]: ______  FS_MOUNT_OPT[1]: ______
VXVM_DG[2]: ______  LV[2]: ______  FS[2]: ______  FS_MOUNT_OPT[2]: ______
IP[0]: ______  SUBNET[0]: ______
IP[1]: ______  SUBNET[1]: ______
Next Steps

To implement the metropolitan cluster design, use the procedures in the following sections:

• "Completing and Running a Continental Cluster Solution with Continuous Access XP" on page 209
• "Building a Continental Cluster Solution with EMC SRDF" on page 343
• "Building a Metrocluster Solution with Continuous Access EVA" on page 263
2 Designing a Continental Cluster

Unlike metropolitan and campus clusters, which have a single-cluster architecture, a continental cluster uses multiple Serviceguard clusters to provide application recovery over a local or wide area network (LAN or WAN).
Understanding Continental Cluster Concepts

The Continentalclusters product provides the ability to monitor a high availability cluster and fail over mission critical applications to another cluster if the monitored cluster becomes unavailable.
Two packages are running on the cluster in Los Angeles, and their data is replicated to the cluster in New York. Physical data replication is carried out using ESCON (Enterprise Storage Connect) links between the disk array hardware in New York and Los Angeles via an ESCON/WAN converter at each end. The New York cluster is running a monitor that checks the status of the Los Angeles cluster.
Figure 2-2 Sample Mutual Recovery Configuration

The figure shows the New York cluster (NYnode1, NYnode2) and the Los Angeles cluster (LAnode1, LAnode2), each running a monitor, connected by a highly available network and by WAN data replication links between their disk arrays through ESCON/WAN converters. In this configuration, salespkg is running on the New York cluster and can be recovered by the Los Angeles cluster.
Application Recovery in a Continental Cluster

If a given cluster in a recovery pair of a continental cluster becomes unavailable, Continentalclusters allows an administrator to issue a single command, cmrecovercl (described later), to transfer mission critical applications from that cluster to another cluster, making sure that the packages do not run on both clusters at the same time.
• Verify that the monitored cluster has failed
• Issue the cluster recovery command

Monitoring over a Network

A monitor package running on one cluster tracks the health of another cluster in the recovery pair and sends notification to configured destinations if the state of the monitored cluster changes. (If a cluster contains any packages to be recovered, it must be monitored.)
with regard to both the monitored cluster and the network. However, in many cases the causes of cluster events are indeterminate without additional information that is not available to the software.
Table 2-1 Monitored States and Possible Causes (Continued)

Cluster Event (Old state -> New state) | Cluster-related Causes | Network-related Causes
Unreachable -> Error | Serviceguard version or security file mismatch, software error | Network problem was fixed, but the error condition still exists
Down -> Up | Cluster started | No network problems
Unreachable -> Up | Cluster nodes were rebooted and the cluster started | Network ca…

NOTE
When problematic cluster events persist, obtain as much information as possible, including authorization to recover if your business practices require it, and then issue the Continentalclusters recovery command, cmrecovercl.

How Notifications Work

A central part of the operation of Continentalclusters is the transmission of notifications following the detection of a cluster event.
Alert: a notification that a cluster has been in an unreachable state for a short period of time. An alert is sent in this case as a warning that an alarm might be issued later if the cluster's state remains unreachable for a longer time. The expected process in dealing with alerts is to continue watching for additional notifications and to contact individuals at the site of the monitored cluster to see whether problems exist.
Creating Notifications for Events that Indicate a Return of Service

For events that indicate that the cluster is back online or that communication with the monitor has been restored, use cluster alerts to show the de-escalation of concern. In this case, use a CLUSTER_ALERT line in the configuration file with a time of zero (0), so that notifications are sent as soon as the return to service is detected.
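A return-of-service entry might look like the following sketch. Only the zero alert time is taken from the text above; the event keyword layout and the notification destination are placeholders, and the exact syntax should be checked against the template generated by cmqueryconcl:

```
# Hypothetical fragment of the Continentalclusters configuration file:
# notify immediately (time 0) when the monitored cluster is seen UP again.
CLUSTER_EVENT   UP
CLUSTER_ALERT   0
NOTIFICATION    email admin@example.com "Monitored cluster is back up"
```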
NOTE: After a recovery, it is not possible to reverse directions and return a package to its original cluster without first reconfiguring the data replication hardware and/or software and synchronizing data. Therefore, be very cautious when deciding to use the cmrecovercl command. For this reason, HP recommends that stringent procedures and processes be in place to aid in making the decision to complete a recovery process.
To keep packages from starting up automatically when a cluster starts, set the AUTO_RUN parameter (PKG_SWITCHING_ENABLED prior to Serviceguard A.11.12) for all primary and recovery packages to NO. Then use the cmmodpkg command with the -e option to start up only the primary packages and enable switching.
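The setting and the manual start-up described above might look like the following sketch; the package name is illustrative:

```
# In each primary and recovery package configuration file:
AUTO_RUN   NO    # PKG_SWITCHING_ENABLED NO prior to Serviceguard A.11.12

# After the cluster forms, start only the primary package
# and enable switching for it:
# cmmodpkg -e salespkg
```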
How Serviceguard Commands Work in a Continentalclusters Environment

Continentalclusters packages are manipulated manually by the user via Serviceguard commands, and automatically by cmcld, in the same way as any other packages. In a continental cluster, the recovery packages are not allowed to run at the same time as the primary, data sender, or data receiver packages.
Table 2-2 Serviceguard and Continentalclusters Commands (Continued)

Command | How the command works in Serviceguard | How the command works in Continentalclusters
cmmodpkg -e | Enables the switching attribute for a highly available package | Will not enable switching on a recovery package if any of the primary, data receiver, or data sender packages in the same recovery group is running or enabled.
Designing a Disaster Tolerant Architecture for use with Continentalclusters

A recovery pair in a continental cluster consists of two Serviceguard clusters. One functions as a primary cluster and the other functions as a recovery cluster for a specific application. Prior to Continentalclusters version A.05.00, only one recovery pair could be configured in a continental cluster.
Serviceguard Clusters

Each Serviceguard cluster in a continental cluster provides high availability for an application at the local level at that particular site. For optimal performance and to assure adequate capacity on the recovery cluster, it is best to have similar hardware on both clusters.
…necessary to purchase the Metrocluster with Continuous Access XP, Metrocluster with Continuous Access EVA, or Metrocluster with EMC SRDF product separately. White papers describing specific implementations are also available at www.docs.hp.com.
Table 2-3 Data Replication and Continentalclusters (Continued)

Replication Type | How it Works | Continentalclusters Implication
Logical Filesystem Replication | Writes to the filesystem on the primary cluster are duplicated periodically on the recovery cluster. | CPU issues are the same as for Logical Database Replication. The software may have to be managed as a separate Serviceguard package.
Physical Data Replication Using Special Environment Files

For physical data replication, Continentalclusters uses pre-integrated solutions based on Continuous Access XP, Continuous Access EVA, and EMC SRDF.
Data replication needs to be set up to allow copying of data from each primary cluster to the common recovery cluster. Each recovery pair should have its own data replication link. Different storage areas need to be configured on the common recovery cluster to receive the data replicated from each primary cluster.
Highly Available Wide Area Networking

Disaster tolerant networking for Continentalclusters is directly tied to the data replication method. In addition to the reliability of the redundant lines connecting the remote nodes, it is important to consider what bandwidth is needed to support the data replication method that has been chosen.
Continentalclusters Worksheets

Planning is an essential part of creating a robust continental cluster environment. It is recommended to record the details of your configuration on planning worksheets. These worksheets can be filled in partially before configuration begins, and then completed as you build the continental cluster.
Recovery Cluster Name: ______________________
Data Center Name and Location: ______________________
Main Contact: ______________________
Phone Number: ______________________
Beeper: ______________________
Email Address: ______________________
Primary Cluster/Package Name: ______________________
Data Sender Cluster/Package Name: ______________________
Recovery Cluster/Package Name: ______________________
Data Receiver Cluster/Package Name: ______________________

Recovery Group Data:
Recovery Group Name: ______________________
Primary Cluster/Package Name: ______________________
Notification: ______________________
Notification: ______________________

DOWN:
Alert Interval: ______________________
Notification: ______________________
Notification: ______________________

UP:
Alert Interval: ______________________
Preparing the Clusters

The steps for configuring the clusters needed by Continentalclusters are as follows:

• Set up and test data replication between the sites.
• Configure each cluster for Serviceguard operation.

Setting up and Testing Data Replication

Depending on the data replication method you choose, it can take a week or more to set up and test data replication.
If the data replication software is separate from the application itself, then a separate Serviceguard package should be created for it. Some kinds of logical data replication require that a data receiver package be running on the recovery cluster at all times.
…Continentalclusters. See the Continentalclusters Release Notes for your version's specific requirements. Coordinate with the recovery site to make sure the same versions and patches are installed at both sites.

2. Set up all cabling, being sure to provide redundant disk storage links and network connections.

3. Configure the disks and filesystems. Set up data replication (logical or physical).

4. …
NOTE: If you are configuring Oracle RAC instances in Serviceguard packages in a CFS or CVM environment, do not specify the CVM_DISK_GROUPS and CVM_ACTIVATION_CMD fields in the package control scripts, as CVM disk group manipulation is addressed by the disk group multi-node package.

The primary cluster is shown in Figure 2-5.
b. If this is an existing cluster, determine whether it is necessary to add disks for data replication. Ensure that there is enough system capacity to run all packages if applications fail over from the other cluster. If there is not, either add nodes to the existing cluster, or move less critical packages to another cluster.

2. For new clusters, install minimum required versions of HP-UX and Serviceguard.
…file. This ensures that the recovery packages will not start automatically when the recovery cluster forms, but only when the cmrecovercl command is issued. The following elements should be the same in the package configuration for both the primary and recovery packages:

• Package services
• Failfast settings

c. Modify the package control script (salespkg_bak.…
8. If you are using logical data replication, configure, apply, and test the data receiver package if one is needed.

9. Create a package control script:

# cmmakepkg -s pkgname.cntl

Customize the control script as appropriate to your application using the guidelines in the Managing Serviceguard user's guide. Standard Serviceguard package customizations include modifying the VG, LV, FS, IP, SUBNET, SERVICE_NAME, SERVICE_CMD, and SERVICE_RESTART parameters.
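Customizations of the parameters listed above might look like the following sketch. All volume group, filesystem, address, and service values are placeholders for your own configuration, not values from this guide:

```
# Hypothetical excerpts from pkgname.cntl (values are examples only)
VG[0]="vg_sales"
LV[0]="/dev/vg_sales/lvol1"
FS[0]="/sales"
FS_MOUNT_OPT[0]="-o rw"

IP[0]="192.10.25.12"
SUBNET[0]="192.10.25.0"

SERVICE_NAME[0]="sales_service"
SERVICE_CMD[0]="/usr/local/bin/sales_monitor"
SERVICE_RESTART[0]="-r 2"
```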
Building the Continentalclusters Configuration

If necessary, use the swinstall command to install the Continentalclusters product on all nodes in both clusters. Then create the Continentalclusters configuration using the following steps:

• Prepare the security files.
• Create the monitor package on each cluster containing a recovery package.
Designing a Continental Cluster Building the Continentalclusters Configuration nynode2.myco.com root Also, be sure to create the /etc/opt/cmom/cmomhosts file on all nodes. This file allows nodes that are running monitor packages and Continentalclusters commands to obtain information from other nodes about the health of each cluster. The file must contain entries that allow access to all nodes in the continental cluster by the nodes where monitors and Continentalclusters commands are running.
Designing a Continental Cluster Building the Continentalclusters Configuration all All hosts are allowed access. domain Hosts whose names match, or end in, this string are allowed access, for example, hp.com. hostname The named host (for example, kitcat.myco.com) is allowed access. IP address Either a full IP address, or a partial IP address of 1 to 3 bytes for subnet inclusion is allowed. network/netmask This pair of addresses allows more precise inclusion of hosts, (for example, 10.163.121.23/225.
Creating the Monitor Package

The Continentalclusters monitoring software is configured as a Serviceguard package so that it remains highly available.
Designing a Continental Cluster Building the Continentalclusters Configuration b. AUTO_RUN(PKG_SWITCHING_ENABLED used prior to Serviceguard A.11.12) should be set to YES so that the monitor package will fail over between local nodes. (Note, for all primary and recovery packages, AUTO_RUN is always set to NO.) 4. Use the cmcheckconf command to validate the package. # cmcheckconf -P ccmonpkg.config 5. Copy the package configuration file ccmonpkg.config and control script ccmonpkg.
Editing the Continentalclusters Configuration File

First, on one cluster, generate an ASCII configuration template file using the cmqueryconcl command. The recommended name and location for this file is /etc/cmcluster/cmconcl.config. (If preferred, choose a different name.) Example:

# cd /etc/cmcluster
# cmqueryconcl -C cmconcl.config
The monitor interval defines how long it can take for Continentalclusters to detect that a cluster is in a certain state. The default interval is 60 seconds, but the optimal setting depends on your system's performance. Setting this interval too low can result in the monitor falsely reporting an Unreachable or Error state. If this is observed during testing, use a larger value.
The template file opens with a comment banner that lists its three sections — 1. Cluster Information, 2. Recovery Groups, and 3. Events, Alerts, Alarms, and Notifications — and refers you to the cmqueryconcl(1m) manpage or your manual for complete details about setting the parameters in this file. Section 1, Cluster Information, contains entries in the following form (MONITOR_INTERVAL accepts compound values such as 1 MINUTE 30 SECONDS):

CONTINENTAL_CLUSTER_NAME
CLUSTER_NAME             eastcoast
CLUSTER_DOMAIN           eastnet.myco.com
NODE_NAME
NODE_NAME
MONITOR_PACKAGE_NAME
MONITOR_INTERVAL
Figure 2-7 Sample ContinentalClusters Recovery Groups
[Figure: the New York and Los Angeles clusters connected by a WAN. The recovery group for the Sales application is defined as RECOVERY_GROUP_NAME Sales, PRIMARY_PACKAGE LAcluster/salespkg, RECOVERY_PACKAGE NYcluster/salespkg_bak. Nodes LAnode1 and LAnode2 hold salespkg.config, custpkg.config, salespkg.cntl, and custpkg.cntl; nodes NYnode1 and NYnode2 hold salespkg_bak.config, custpkg_bak.config, salespkg_bak.cntl, and custpkg_bak.cntl.]
Figure 2-8 Sample Bi-directional Recovery Groups
[Figure: the New York and Los Angeles clusters connected by a WAN. The recovery group for the Sales application is defined as RECOVERY_GROUP_NAME Sales, PRIMARY_PACKAGE LAcluster/salespkg, RECOVERY_PACKAGE NYcluster/salespkg_bak. Each cluster holds the primary package files for one application and the backup (_bak) package files for the other.]
2. After the PRIMARY_PACKAGE keyword, enter a primary package definition consisting of the cluster name followed by a slash (/) followed by the package name. Example:
PRIMARY_PACKAGE LAcluster/custpkg
3. Optionally, enter a data sender package definition consisting of the cluster name, a slash (/), and the data sender package name after the DATA_SENDER_PACKAGE keyword.
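Taken together, these steps (with the corresponding recovery-side keywords) produce a recovery group stanza such as the following sketch; LAcluster/custpkg appears in the example above, while the group name, the NYcluster package name, and the sender/receiver package names are illustrative assumptions:

```
RECOVERY_GROUP_NAME     custgroup
PRIMARY_PACKAGE         LAcluster/custpkg
DATA_SENDER_PACKAGE     LAcluster/custsenderpkg
RECOVERY_PACKAGE        NYcluster/custpkg_bak
DATA_RECEIVER_PACKAGE   NYcluster/custreceiverpkg
```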
During normal operation, the primary package is running an application program on the primary cluster, and the recovery package, which is configured to run the same application, is idle on the recovery cluster.
Editing Section 3—Monitoring Definitions
Finally, enter monitoring definitions that define cluster events and set the times at which alert and alarm notifications are to be sent out. Define notifications for all cluster events—Unreachable, Down, Up, and Error. Although it is impossible to make specific recommendations for every Continentalclusters environment, here are a few general guidelines about notifications.
Each monitoring definition specifies a cluster event (for example, a DOWN condition in the monitored cluster) along with the messages that should be sent to system administrators or other IT staff. The template's comments describe the notification types, including:

NOTIFICATION TEXTLOG — A notice of the event is written to a user-specified log file. The target must be a full path, and the user-specified file must be under the /var/opt/resmon/log directory.

NOTIFICATION UDP — The message is sent to a UDP port on the specified node.
CLUSTER_EVENT        westcoast/UNREACHABLE
MONITORING_CLUSTER   eastcoast
CLUSTER_ALERT        5 MINUTES
NOTIFICATION EMAIL admin@primary.site "westcoast status unknown for 5 min. Call secondary site."
NOTIFICATION EMAIL admin@secondary.site "Call primary admin. (555) 555-6666.
CLUSTER_ALERT   0 MINUTES
NOTIFICATION EMAIL admin@secondary.site "Error in monitoring cluster westcoast.
The TEXTLOG notification file must be placed under the /var/opt/resmon/log directory. If any other location is specified, the cmapplyconcl and cmcheckconcl commands report an error of the form:
The target after textlog “ ” is not valid.
MONITOR_PACKAGE_NAME   ccmonpkg
MONITOR_INTERVAL       5 MINUTES
...
MONITOR_INTERVAL       60 SECONDS

CLUSTER_NAME           cluster2
CLUSTER_DOMAIN         cup.hp.com
NODE_NAME              node21
NODE_NAME              node22

CLUSTER_NAME           cluster3
CLUSTER_DOMAIN         cup.hp.com
NODE_NAME              node31
NODE_NAME              node32
MONITOR_PACKAGE_NAME   ccmonpkg
MONITOR_INTERVAL       60 SECONDS

# Section 2.
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog "DRT: (Ora-test) DOWN alert"
NOTIFICATION SYSLOG "DRT: (Ora-test) cluster2 DOWN alert"
CLUSTER_ALARM   0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/CCTextlog "DRT: (Ora-test) DOWN alarm"
NOTIFICATION SYSLOG "DRT: (Ora-test) cluster2 DOWN alarm"

CLUSTER_EVENT        cluster3/DOWN
MONITORING_CLUSTER   cluster1
CLUSTER_ALERT        0 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/logging "DR
"DRT: (Ora-test) cluster3 UP alert"
NOTIFICATION SYSLOG "DRT: (Ora-test) cluster3 UP alert"

Checking and Applying the Continentalclusters Configuration
After editing the configuration file on any of the participating clusters in the Continentalcluster, halt any monitor packages that are running, then use the following steps to apply the configuration to all nodes in the continental cluster.
1. Verify the content of the file.
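The verification and distribution steps use the cmcheckconcl and cmapplyconcl commands; assuming the recommended file name /etc/cmcluster/cmconcl.config used earlier, the sequence looks like this:

```
# cmcheckconcl -v -C cmconcl.config
# cmapplyconcl -v -C cmconcl.config
```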
Figure 2-9 ContinentalClusters Configuration Files
[Figure: each node in the New York cluster holds the recovery package files (salespkg_bak.config, salespkg_bak.cntl, custpkg_bak.config, custpkg_bak.cntl), the Continentalclusters configuration file (cmconcl.config), and the monitor package files (ccmonpkg.config, ccmonpkg.cntl).]
Starting the ContinentalClusters Monitor Package
Starting the monitoring package enables all ContinentalClusters monitoring functionality. Before doing this, ensure that the primary packages selected to be protected are running normally and that data sender and receiver packages, if they are being used for logical data replication, are working properly. If using physical data replication, make sure that it is operational.
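Once those prerequisites are met, the monitor package is started like any other Serviceguard package. A sketch, assuming the ccmonpkg package name configured earlier:

```
# cmrunpkg ccmonpkg
# cmviewconcl
```

The cmviewconcl command then confirms that the configured events are being monitored.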
Use the following steps to ensure the components are functioning correctly:
Make sure all daemons are running.
# ps -ef | grep cmcl
Two important Continentalclusters daemons are cmclsentryd and cmclrmond.
4. Check the cluster configuration on each cluster using the cmviewcl -v command.
a. Ensure that each primary package is running correctly.
CAUTION Never issue the cmrunpkg command for a recovery package when ContinentalClusters is enabled, because there is no guaranteed way of preventing a package that is running on one cluster from running on the other cluster if the package is started using this command. The potential for data corruption is great.
Chapters 3, 4 and 5 contain additional suggestions on testing the data replication and package configuration.
Figure 2-10 Recovery Checklist
Identify the level of alert that the monitoring site received: Cluster Alert or Cluster Alarm. Contact the monitored site by phone or beeper to rule out the following:
• WAN networking failure—the primary cluster and packages are still fine.
• The cluster and/or package have come back up, but the UP notification has not yet been received by the recovery site.
Testing the Continental Cluster
This section presents some test procedures and scenarios. Some scenarios presume certain configurations that may not apply to all environments. Additionally, these tests do not eliminate the need to perform standard Serviceguard testing for each cluster individually.
CAUTION Data and system corruption can occur as a result of testing. System and data backups should always be done prior to testing.
Testing Continentalclusters Operations
Use the following procedures to exercise typical Continentalclusters behaviors:
1. Halt both clusters in a recovery pair, then restart both clusters. The monitor packages on both clusters should start automatically. The Continentalclusters packages (primary, data sender, data receiver, and recovery) should not start automatically.
• If physical data replication is used, disconnect the physical replication links between the disk arrays:
— Powering off the disk array at the primary site
— Powering off the disk array at the recovery site
• Testing cmrecovercl -f as well as cmrecovercl
Depending on the condition, the primary packages should be running to test real-life failures and recovery procedures.
Switching to the Recovery Packages in Case of Disaster
Once the clusters are configured and tested, packages will be able to fail over to an alternate node in another data center and still have access to the data they need to function. The primary steps for failing over a package are:
1. Receive notification that a monitored cluster is unavailable.
2. Verify that it is necessary and safe to start the recovery packages.
alert or alarm, the timer is reset to 0 after each change of state; thus, the time to the alert or alarm will be the configured interval plus the time used by all the earlier state changes.
NOTE The cmrecovercl command is fully enabled only after a CLUSTER_ALARM is issued; however, the command may be used with the -f option when a CLUSTER_ALERT has been issued.
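The distinction described in the NOTE can be summarized as follows:

```
# cmrecovercl        # permitted only after a CLUSTER_ALARM has been issued
# cmrecovercl -f     # forced recovery; may be used after a CLUSTER_ALERT
```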
• Check to make sure the secondary devices are in read-write mode. If you are using database or software data replication make sure the data copy at the recovery site is in read-write mode as well.
• If LVM and physical data replication are used, the ID of the primary cluster is also replicated and written on the secondary devices in the recovery site.
This should only be used after positive confirmation from the remote site. In a multiple recovery pair configuration where more than one primary cluster is sharing the same recovery cluster, running cmrecovercl without any option will attempt to recover packages for all of the recovery groups of the configured primary clusters.
As the processing of each recovery group occurs, messages such as the following are displayed (the message about the data receiver package appears only when using logical data replication with data sender and receiver packages):
Processing the recovery group nfsgroup on recovery cluster eastcoast
Disabling switching for data receiver package nfsreceiverpkg on recovery cluster eastcoast
Halting data receiver package nfsreceiverpkg on recovery cluster eastcoast
Starting rec
will continue to be received. The following table shows the status of Continentalclusters packages after recovery has taken place, and applications are now running on the local cluster.
NOTE If the remote cluster comes back up following a cluster event but the primary packages cannot run, halt the primary cluster with the cmhaltcl command, then issue cmrecovercl with the -f option.
Forcing a Package to Start
The cmforceconcl command is used to force a Continentalclusters package to start even if the status of a remote package in the recovery group is unknown. This command is used as a prefix to the cmrunpkg or cmmodpkg command. Under normal circumstances, Continentalclusters will not allow a package to start in the recovery cluster unless it can determine that the package is not running in the primary cluster.
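Because cmforceconcl is used as a prefix, a forced package start might look like the following sketch; the node and package names here are illustrative assumptions:

```
# cmforceconcl cmrunpkg -n NYnode1 salespkg_bak
```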
Restoring Disaster Tolerance
After a failover to a cluster occurs, restoring disaster tolerance has many challenges, the most significant of which are:
• Restoring the failed cluster. Depending on the nature of the disaster it may be necessary to either create a new cluster or to restore the cluster. Before starting up the new or the failed cluster, make sure the AUTO_RUN flag for all of the Continentalclusters application packages is disabled.
b. Halt the recovered application on the surviving cluster if necessary, and start it on the repaired cluster.
c. To keep application down time to a minimum, start the primary package on the cluster before resynchronizing the data of the next recovery group.
2. Edit the Continentalclusters ASCII configuration file. It is necessary to change the definitions of monitoring clusters, and switch the names of primary and recovery packages in the definitions of recovery groups. It may also be necessary to re-create data sender and data receiver packages.
3. Check and apply the Continentalclusters configuration.
# cmcheckconcl -v -C cmconcl.config
# cmapplyconcl -v -C cmconcl.config
The cmswitchconcl command cannot be used for the recovery groups that have both data sender and data receiver packages specified. To restore disaster tolerance with cmswitchconcl while continuing to run the packages on the surviving cluster, use the following procedures:
1. Halt the monitor package on each cluster.
# cmhaltpkg ccmonpkg
2. Run this command.
4. Restart the monitor packages on each cluster.
# cmmodpkg -e ccmonpkg
5. View the status of the Continentalcluster.
# cmviewconcl
NOTE The cluster shared storage configuration file /etc/cmconcl/ccrac/ccrac.config is not updated by cmswitchconcl.
CLUSTER_DOMAIN         cup.hp.com
NODE_NAME              node1
NODE_NAME              node2
MONITOR_PACKAGE_NAME   ccmonpkg

CLUSTER_NAME           ClusterB
CLUSTER_DOMAIN         cup.hp.com
NODE_NAME              node3
NODE_NAME              node4
MONITOR_PACKAGE_NAME   ccmonpkg
MONITOR_INTERVAL       60 SECONDS

### Section 2.
"CC alert: DOWN"
NOTIFICATION SYSLOG "CC alert: DOWN"
CLUSTER_ALARM   90 SECONDS
NOTIFICATION TEXTLOG /var/opt/resmon/log/data/events.log "CC alarm: DOWN"
NOTIFICATION SYSLOG "CC alarm: DOWN"

sample.output
### Section 1. Cluster Information
CONTINENTAL_CLUSTER_NAME   Sample_CC_Cluster
CLUSTER_NAME               ClusterA
CLUSTER_DOMAIN             cup.hp.
PRIMARY_PACKAGE        ClusterB/pkgZ
RECOVERY_PACKAGE       ClusterA/pkgZ'

RECOVERY_GROUP_NAME    RG4
PRIMARY_PACKAGE        ClusterB/pkgW
RECOVERY_PACKAGE       ClusterA/pkgW'
DATA_RECEIVER_PACKAGE  ClusterA/pkgR2

### Section 3.
CLUSTER_ALERT   0 MINUTES
NOTIFICATION SYSLOG "CC alert: UP"

Newly Created Cluster Will Run Primary Packages
After creating a new cluster to replace the damaged cluster, restore the critical applications to the new cluster and restore the other cluster to its role as a backup for the recovered packages.
1. Configure the new cluster as a Serviceguard cluster.
6. Restart the monitor package on the surviving cluster.
# cmrunpkg ccmonpkg
7. View the status of the Continentalcluster.
# cmviewconcl
Newly Created Cluster Will Function as Recovery Cluster for All Recovery Groups
After replacing the failed cluster, if the downtime involved in moving the applications back is a concern, then do the following:
• Change the surviving cluster to the role of primary cluster for all recovery groups.
Maintaining a Continental Cluster
The following common maintenance tasks are described in this section:
• Adding a Node to a Cluster or Removing a Node from a Cluster
• Adding a Package to a Continental Cluster
• Removing a Package from the Continental Cluster
• Changing Monitoring Definitions
• Checking the Status of Clusters, Nodes and Packages
• Reviewing Log Files
• Renaming a Continental Cluster
• Deleting a Continental Cluster
3. Edit the Continentalclusters configuration ASCII file to add or remove the node in the cluster.
4. For added nodes, ensure that the /etc/cmcluster/cmclnodelist and /etc/opt/cmom/cmomhosts files are set up correctly on the new node. Refer to “Preparing Security Files” on page 72.
7. Use the cmapplyconcl command to apply the new Continentalclusters configuration.
8. Restart the monitor packages on both clusters.
9. View the status of the continental cluster.
# cmviewconcl
Removing a Package from the Continental Cluster
To remove a package from the Continentalclusters configuration, you must first remove the recovery group from the Continentalclusters configuration file.
3. Use the cmapplyconcl command to apply the new configuration.
4. Restart the monitor packages on both clusters.
5. View the status of the continental cluster.
# cmviewconcl
Checking the Status of Clusters, Nodes, and Packages
To check on the status of the continental clusters and associated packages, use the cmviewconcl command, which lists the status of the clusters, associated package status, and configured events status.
PRIMARY CLUSTER cjc838   STATUS down
EVENT LEVEL ALARM   POLLING INTERVAL 20

CONFIGURED EVENT     STATUS        DURATION   LAST NOTIFICATION SENT
alert                unreachable   15 sec     -
alarm                unreachable   30 sec     -
alarm                down          0 sec      Fri May 12 12:13:06 PDT 2000
alert                error         0 sec      -
alert                up            20 sec     -
alert                up            40 sec     -

PACKAGE RECOVERY GROUP prg1
PACKAGE              ROLE       STATUS
cjc838/primary       primary    down
cjc1234/recovery     recovery   up

The following is the output of a cmviewconcl command
MONITORING CLUSTER PTST_dts1 (Unmonitored)   POLLING INTERVAL 1 min

CONFIGURED EVENT   STATUS        DURATION   LAST NOTIFICATION SENT
alert              unreachable   1 min      -
alert              unreachable   2 min      -
alarm              unreachable   3 min      -
alert              down          1 min      -
alert              down          2 min      -
alarm              down          3 min      -
alert              error         0 sec      -
alert              up            1 min      -

PACKAGE RECOVERY GROUP hpgroup10
PACKAGE                 ROLE       STATUS
PTST_sanfran/PACKAGE1   primary    down
PTST_dts1/PACKAGE1      recovery   down

PACKAGE RECOVERY GROUP unmonitored
PACKAGE                 ROLE       STATUS
                        primary    re
NODE nynode2   STATUS up   STATE running

Network_Parameters:
INTERFACE   STATUS   PATH   NAME
PRIMARY     up       12.1   lan0
PRIMARY     up       56.1   lan2

Network_Parameters:
INTERFACE   STATUS   PATH   NAME
PRIMARY     up       4.1    lan0
PRIMARY     up       56.1   lan1

PACKAGE ccmonpkg   STATUS up   STATE running   PKG_SWITCH enabled

Script_Parameters:
ITEM      STATUS   NAME
Service            ccmonpkg.
Use the ps command to check for the status of the Continentalclusters monitor daemons cmclrmond and cmclsentryd, which should be running on the cluster node where the monitor package is running.
Reviewing Messages and Log Files
The Continentalclusters commands—cmqueryconcl, cmcheckconcl, cmapplyconcl, and cmrecovercl—all display messages on the standard output, which is the first place to look for error messages.
Messages from the Continentalclusters daemon are reported in log file /var/adm/cmconcl/sentryd.log, and Object Manager messages appear in /var/opt/cmom/cmomd.log. These messages may be helpful in troubleshooting. Use the cmreadlog command to view the entries in these files. Examples:
# /opt/cmom/tools/bin/cmreadlog -f /var/adm/cmconcl/sentryd.log slog.txt
# /opt/cmom/tools/bin/cmreadlog -f /var/opt/cmom/cmomd.log omlog.txt
Renaming a Continental Cluster
To rename an existing continental cluster, perform the following steps:
1. Remove the continental clusters configuration.
# cmdeleteconcl
2. Edit the CONTINENTAL_CLUSTER_NAME field in the configuration ASCII file, and run the cmapplyconcl command to configure the continental cluster with a new name.
Checking Java File Versions
Some components of Continentalclusters are executed from Java .jar files.
Support for Oracle RAC Instances in a Continentalclusters Environment
Support for Oracle RAC instances means that the RAC instances running on the primary cluster will be restarted by Continentalclusters on the recovery cluster to continue serving clients' database requests upon a primary cluster failure. Figure 2-11 is a sample of Oracle RAC instances running in the Continentalclusters environment.
Oracle RAC instances are supported in the Continentalclusters environment only with physical replication using HP StorageWorks Continuous Access XP or EMC Symmetrix Remote Data Facility (SRDF), with HP SLVM, Veritas Cluster Volume Manager (CVM), or Cluster File Systems (CFS) from Symantec for volume management.
Figure 2-12 Sample Oracle RAC Instances in a ContinentalClusters Environment After Failover
[Figure: the New York secondary Serviceguard cluster (NYnode1, NYnode2) now runs RAC Instance1 and RAC Instance2 against its XP disk array; the Los Angeles primary cluster (LAnode1, LAnode2) is down. The sites are connected by a highly available network and a WAN, with Continuous Access XP, Continuous Access EVA, or EMC SRDF data replication between the disk arrays.]
Configuring the Environment for Continentalclusters to Support Oracle RAC
EVA for replication is supported only with SLVM for storage. For more information on specific Oracle RAC configurations that are supported, refer to Table 2-7 on page 135. For complete installation and configuration information for Oracle and HP StorageWorks products, refer to the Oracle RAC and HP StorageWorks manuals.
2. Configure the database storage using one of the following:
• Shared Logical Volume Manager (SLVM)
• Cluster Volume Manager (CVM)
• Cluster File Systems (CFS)
You need to configure the SLVM volume groups or CVM disk groups on the disk arrays to store the Oracle database. Configure the volume groups or disk groups on both the primary and recovery clusters.
CVM_DISK_GROUP variables. The instance packages should be configured to have a dependency on the required CVM disk group multi-node package.
d. Run the following CFS commands to add and configure the disk group and file system mount point multi-node packages (MNPs) in the clusters. These multi-node packages manage the disk group and mount-point activities in the cluster.
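As a sketch, the CFS administration commands for this step typically take the following form; the disk group, volume, and mount point names are illustrative assumptions, and the exact options should be verified against your Serviceguard CFS documentation:

```
# cfsdgadm add racdg01 all=sw                        # disk group multi-node package
# cfsmntadm add racdg01 racvol /oracle/rac all=rw    # mount point multi-node package
```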
# vxedit -g <disk_group> set user=<user> group=<group> mode=<mode> <volume>
This step is required because when you import disks or volume groups at the recovery site, the access rights for the imported disks or volume groups are set to root by default. As a result, the database instances do not start. To eliminate this behavior, you must make the access rights persistent.
For details on how to configure an Oracle RAC instance in a Serviceguard package, refer to the Using Serviceguard Extension for RAC user’s guide. In the Continentalclusters environment, you can configure each RAC instance in a failover type package, or you can configure all RAC instances in a single multi-node package.
6. Set up the environment file.
a. Log in as root on one node of the primary cluster.
b. Change to your own directory:
# cd
c. Copy the file:
# cp /opt/cmconcl/scripts/ccrac.config \
ccrac.config.mycopy
d. Edit the file ccrac.config.mycopy to fit your environment. The following parameters need to be edited:
CCRAC_ENV - fully qualified Metrocluster environment file name.
This parameter is mandatory when CVM disk groups or CFS are used. This parameter cannot be declared when SLVM volume groups are used.
CCRAC_INSTANCE_PKGS - the names of the configured RAC instance packages accessing in parallel the database stored in the specified volume groups. This parameter is mandatory.
CCRAC_CLUSTER[0]=PriCluster1
CCRAC_ENV_LOG[0]=/tmp/db1_prep.log

CCRAC_ENV[1]=/etc/cmconcl/ccrac/db2/db2EnvFile_srdf.env
CCRAC_CVM_DGS[1]=racdg01 racdg02
CCRAC_INSTANCE_PKGS[1]=ccracPkg3 ccracPkg4
CCRAC_CLUSTER[1]=PriCluster2
CCRAC_ENV_LOG[1]=/tmp/db2_prep.log

CCRAC_ENV[2]=/etc/cmconcl/ccrac/db3/db3EnvFile_xpca.
RECOVERY_PACKAGE      ClusterB/instancepkg1'

RECOVERY_GROUP_NAME   instanceRG2
PRIMARY_PACKAGE       ClusterA/instancepkg2
RECOVERY_PACKAGE      ClusterB/instancepkg2'

Packages instancepkg1 and instancepkg2 are configured to run on the primary cluster “ClusterA”. Packages instancepkg1' and instancepkg2' are configured to be restarted or recovered on the recovery cluster “ClusterB” upon primary cluster failure.
Figure 2-13 ContinentalClusters Configuration Files in a Recovery Pair with RAC Support
[Figure: each node in the New York cluster holds the recovery package files (RACinstance1_bak.config, RACinstance1_bak.cntl, RACinstance2_bak.config), the Continentalclusters configuration file (cmconcl.config), the monitor package files (ccmonpkg.config, ccmonpkg.cntl), and the Continentalclusters RAC-specific file /etc/cmconcl/ccrac/ccrac.config.]
Serviceguard/Serviceguard Extension for RAC and Oracle Clusterware Configuration
The following configurations are required for Continentalclusters RAC instance recovery support in a cluster environment running Serviceguard/Serviceguard Extension for RAC and CRS (Oracle Cluster Software):
Initial Startup of Oracle RAC Instances in a Continentalclusters Environment
To ensure that the disk array is ready for access in shared mode by the Oracle RAC instances, it is recommended that you run the Continentalclusters tool /opt/cmconcl/bin/ccrac_mgmt.ksh to initially start up the configured instance packages.
<index> is the index used in the /etc/cmconcl/ccrac/ccrac.config file for the target set of Oracle RAC instance packages.
4. To stop all the RAC instance packages configured to run as primary packages on the local cluster:
# /opt/cmconcl/bin/ccrac_mgmt.ksh stop
To stop a specific set of RAC instance packages:
# /opt/cmconcl/bin/ccrac_mgmt.ksh stop <index>
Failover of Oracle RAC Instances to the Recovery Site
Upon a disaster that disables the primary cluster, start the Continentalclusters recovery process by running the following command:
# cmrecovercl
For the cluster environment running with Serviceguard and Oracle Clusterware, confirm that the Clusterware daemons and the required Oracle services, such as listener, GSD, ONS, and VIP, are started on all the nodes
If Continentalclusters Oracle RAC support is enabled (the /etc/cmconcl/ccrac/ccrac.config file exists), the following messages are displayed when the cmrecovercl command is invoked, and confirmation is required for the process to proceed.
WARNING: This command will take over for the primary cluster LACluster by starting the recovery package on the recovery cluster NYCluster.
recovery on secondary_cluster is necessary. Continuing with this command while the applications are running on the primary cluster may result in data corruption.
If you have configured the Oracle RAC instance package such that there is one instance for every package, the instance or recovery group can be recovered individually. If you have configured all instances as a single multi-node package (MNP), recovering the recovery group of this package starts all instances.
cfsdgadm deactivate <disk_group>
c. Deport the disk groups using the following command:
vxdg deport <disk_group>
The recovery cluster is now ready to fail back packages and applications to the primary cluster.
3. Synchronize the data between the two participating clusters. Make sure that the data integrity and the data currency are at the expected level at the primary site.
3 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP
The HP StorageWorks Disk Array XP Series allows you to configure data replication solutions to provide disaster tolerance for Serviceguard clusters over long distances. This chapter describes the Continuous Access XP software and the additional files that integrate the XP with Serviceguard clusters.
Files for Integrating XP Disk Arrays with Serviceguard Clusters
Metrocluster is a set of executable programs and an environment file that work in a Serviceguard cluster to automate failover to alternate nodes in case of a disaster in a metropolitan cluster. The Metrocluster/Continuous Access product contains the following files.
Metrocluster/Continuous Access must be installed on all nodes that will run a Serviceguard package whose data resides on an HP StorageWorks Disk Array XP Series and is replicated to a second XP using the Continuous Access XP facility.
Overview of Continuous Access XP Concepts
The HP StorageWorks Disk Array XP Series may be configured for data replication from one XP series unit to another. This type of physical data replication is a part of the Metrocluster/Continuous Access and Continentalclusters solutions.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Overview of Continuous Access XP Concepts Device Groups and Fence Levels A device group is the set of XP devices that are used by a given package. The device group is the basis on which PVOLs and SVOLs are created. The fence level of the device group is set when you define it. All devices defined in a given device group must be configured with the same fence level.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Overview of Continuous Access XP Concepts • The Continuous Access links fail. • The application continues to modify data. • The link is restored. • Resynchronization from PVOL to SVOL starts, but does not finish.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Overview of Continuous Access XP Concepts the side file, for later transmission to the remote XP disk array. When synchronous replication is used, the primary system cannot complete a transaction until a message is received acknowledging that data has been written to the remote site.
If all the Continuous Access links fail, the remaining data in the side file that has not been copied to the SVOL is tracked in the bitmap. The application continues to modify the data on the PVOL, and these changes are also tracked in the bitmap. The SVOL then contains a copy of the data only up to the point of the Continuous Access link failure.
• When making paired volumes, the Raid Manager registers a CTGID with the XP Series disk array automatically at paircreate time, and the device group in the configuration file is mapped to a CTGID. Attempts to create a CTGID beyond the highest available number are terminated with a return value of EX_ENOCTG.
Continuous Access Journal Overview
Continuous Access XP Journal provides asynchronous data replication between two HP XP12000 disk arrays. As depicted in Figure 3-3, Continuous Access Journal uses two main features, “disk-based journaling” and “pull-style replication”.
Figure 3-3 Journal Based Replication
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Overview of Continuous Access XP Concepts Continuous Access Journal performs remote copy operations for data volume pairs. Each Continuous Access Journal pair consists of primary data volumes (PVOL) and secondary data volumes (SVOL) which are located in different storage arrays. The Continuous Access Journal PVOL contains the original data, and the SVOL contains the duplicate data.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Overview of Continuous Access XP Concepts Pull-Based Replication In addition to disk-based journaling, Continuous Access Journal uses pull-style replication. The primary storage system does not dedicate resources to pushing data across the replication link.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Overview of Continuous Access XP Concepts loss of consistency. The recovery time may be extended a bit during temporary link failures or congestion, but the asynchronous replication process does not fail, and the catch-up process is simple and automatic. Data consistency is preserved.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Overview of Continuous Access XP Concepts depending on the “Use of Cache” parameter and/or amount of data in cache. If the “Use of Cache” is set to “Use”, journal data will be stored into the journal cache. If it is set to “No Use”, journal data will bypass the cache and move directly to the journal volumes.
To accommodate link failures, Continuous Access Journal retains the PAIR state when the Continuous Access links fail, as long as the journal volumes have enough space, whereas Continuous Access Asynchronous switches to the PSUE state.
Journal Group Requirement
Each data volume pair must be assigned to one and only one journal group.
Configuring XP12000 Continuous Access Journal
One journal group can contain multiple journal volumes. Each of the journal volumes can have different volume sizes and different RAID configurations.
Journal volumes can be registered in a journal group or deleted from a journal group. Journal volumes cannot be registered or deleted while data copying is in progress (that is, when one or more data volume pairs exist).
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Creating the Cluster Creating the Cluster Create the cluster or clusters according to the process described in the Managing Serviceguard user’s guide. In the case of a metropolitan cluster, create a single Serviceguard cluster with components on multiple sites. In the case of a continental cluster, create two distinct Serviceguard clusters on different sites.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Preparing the Cluster for Data Replication Preparing the Cluster for Data Replication This section assumes that you have already created one or more Serviceguard clusters for use in a disaster tolerant configuration. The following three sets of procedures will prepare Serviceguard clusters for use with Continuous Access XP data replication in a metropolitan or continental cluster.
For example:
horcm0 11000/udp #Raid Manager instance 0
For more detail, see the /opt/cmcluster/toolkit/SGCA/Samples/services.example file.
4. Use the ioscan command to determine which devices on the XP disk array have been configured as command devices.
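The /etc/services registration shown above can be sketched as follows. This is a hedged illustration: it applies the entry to a scratch copy of the file so the sketch can run safely anywhere; on a real cluster node you would edit /etc/services itself (on every node, with the same port number) before starting the Raid Manager instance.

```shell
# Register the Raid Manager instance service port (illustrative scratch copy).
cp /etc/services /tmp/services.demo 2>/dev/null || : > /tmp/services.demo
printf 'horcm0\t11000/udp\t#Raid Manager instance 0\n' >> /tmp/services.demo

# Confirm the entry is present before starting the instance.
grep '^horcm0' /tmp/services.demo
```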
# export HORCMPERM=/etc/horcmperm0.conf
If the Raid Manager protection facility is not used or is disabled, export the HORCMPERM environment variable as follows:
# export HORCMPERM=MGRNOINST
8. Start the Raid Manager instance by using horcmstart.sh.
# horcmstart.sh 0
9.
NOTE: There must also be alternate links for each device, and these alternate links must be on different buses inside the XP disk array. These alternate links, for example, may be CL2-E and CL2-F. Unless the devices have been previously paired, either on this or another host, they will show up as SMPL (simplex).
Figure 3-4 Disaster Tolerant Cluster
(Figure: a four-node disaster tolerant cluster. The local XP disk array holds the PVOLs with replicated data for package A; the remote XP disk array holds the corresponding SVOLs, with package B replicated in the opposite direction. Nodes 1 through 4 connect to the arrays over PV links, the arrays are joined by Continuous Access links, and the nodes communicate over a highly available network.)
As an exampl
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Preparing the Cluster for Data Replication When using the paircreate command to create PVOL/SVOL Continuous Access pairs, specify the -c 15 switch to ensure the fastest data copy from PVOL to SVOL.
# instance of Raid Manager before the changes will be recognized. This can be
# done using the following commands:
#
#    horcmshutdown.sh
#    horcmstart.sh
#
# After restarting the Raid Manager instance, you should confirm that there
# are no configuration errors reported by running the pairdisplay command
# with the "-c" option.
# single command device could prevent access to the device group. Each
# command device must have alternate links (PVLinks). The first command
# device is the primary command device. The second command device is a
# redundant command device and is used only upon failure of the primary
# command device.
pkgB    pkgB_d1    CL1-E    0    4
pkgC    pkgC_d1    CL1-E    0    5
pkgD    pkgD_d1    CL1-E    0    2

#/************************* HORCM_INST ************************************/
#
# This parameter is used to define the network address (IP address or host
# name) of the remote hosts which can provide the remote Raid Manager access
# for each of the device group secondary volumes.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Preparing the Cluster for Data Replication Continentalclusters packages. The device group name (dev_group) is user-defined and must be the same on each host in the continental cluster that accesses the XP disk array. The device group name (dev_group) must be unique within the cluster; it should be a name that is easily associated with the application name or Serviceguard package name.
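Tying these naming rules together, a minimal horcm.conf fragment for one package might look like the following. The device group, device names, port, LDEV addressing, remote host name, and service name are illustrative placeholders, not values from this manual:

```
HORCM_DEV
#dev_group   dev_name   port#   TargetID   LU#
pkgA         pkgA_d1    CL1-E   0          3

HORCM_INST
#dev_group   ip_address         service
pkgA         remote-node-name   horcm0
```

The dev_group name here would match the Serviceguard package it serves, and the HORCM_INST entry points at the host running the remote Raid Manager instance for the secondary volumes.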
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Preparing the Cluster for Data Replication # # See the Metrocluster and Raid Manager documentation for more information # on configuring this script.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Preparing the Cluster for Data Replication Defining Storage Units Both LVM and VERITAS VxVM storage can be used in disaster tolerant clusters. The following sections show how to set up each type: Creating and Exporting LVM Volume Groups using Continuous Access XP Use the following procedure to create and export volume groups: 1.
6. On the recovery cluster, import the VGs on all of the systems that might run the Serviceguard recovery package, and back up the LVM configuration.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Preparing the Cluster for Data Replication 4. Create the disk group to be used with the vxdg command only on the primary system. # vxdg init logdata /dev/dsk/c5t0d0 5. Verify the configuration. # vxdg list 6. Use the vxassist command to create the logical volume. # vxassist -g logdata make logfile 2048m 7. Verify the configuration. # vxprint -g logdata 8. Make the filesystem.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Preparing the Cluster for Data Replication # vxdg -tfC import logdata 5. Start the logical volume in the disk group. # vxvol -g logdata startall 6. Create a directory to mount the volume. # mkdir /logs 7. Mount the volume. # mount /dev/vx/dsk/logdata/logfile /logs 8. Check to make sure the file system is present, then unmount the file system. # umount /logs 9. Resynchronize the Continuous Access pair device.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Configuring Packages for Disaster Recovery Configuring Packages for Disaster Recovery When you have completed the following steps, packages will be able to fail over to an alternate node in another data center and still have access to the data that they need in order to operate.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Configuring Packages for Disaster Recovery If you are using a fence level of ASYNC, then the RUN_SCRIPT_TIMEOUT should be greater than the value of HORCTIMEOUT in the package environment file (see step 7g below). NOTE If you are using the EMS disk monitor as a package resource, you must not use NO_TIMEOUT. Otherwise, package shutdown will hang if there is no access from the host to the package disks.
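The ASYNC timeout rule above can be sanity-checked with simple shell arithmetic. The values below are hypothetical placeholders, not recommendations; substitute the HORCTIMEOUT from your package environment file and the RUN_SCRIPT_TIMEOUT from your package ASCII file:

```shell
# Hypothetical values, both in seconds.
HORCTIMEOUT=360          # from the package environment file
RUN_SCRIPT_TIMEOUT=600   # from the package ASCII configuration file

if [ "$RUN_SCRIPT_TIMEOUT" -gt "$HORCTIMEOUT" ]; then
    echo "OK: RUN_SCRIPT_TIMEOUT exceeds HORCTIMEOUT"
else
    echo "WARNING: for fence level ASYNC, increase RUN_SCRIPT_TIMEOUT"
fi
```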
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Configuring Packages for Disaster Recovery env. The following examples demonstrate how the environment file name should be chosen. Example 1: If the file name of the control script is pkg.cntl, the environment file name would be pkg_xpca.env. Example 2: If the file name of the control script is control_script.sh, the environment file name would be control_script_xpca.env. 7.
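The naming rule behind both examples (drop the control script's extension, append _xpca.env) can be expressed directly with POSIX parameter expansion; the helper function name is our own, not part of the product:

```shell
# Derive the Metrocluster environment file name from a control script name.
env_name() {
    script="$1"
    # ${script%.*} strips the shortest trailing ".extension".
    printf '%s_xpca.env\n' "${script%.*}"
}

env_name pkg.cntl             # pkg_xpca.env
env_name control_script.sh    # control_script_xpca.env
```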
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Configuring Packages for Disaster Recovery g. Uncomment the FENCE variable and set it to either ASYNC, NEVER, or DATA according to your business requirements or special Metrocluster requirements. This variable is used to compare with the actual fence level returned by the array. h.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Configuring Packages for Disaster Recovery pkgname_xpca.env Metrocluster/Continuous Access environment file pkgname.config Serviceguard package ASCII configuration file pkgname.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Metrocluster Solution with Continuous Access XP Completing and Running a Metrocluster Solution with Continuous Access XP No additional steps are required after cluster and package configuration to complete the setup of the metropolitan cluster.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Metrocluster Solution with Continuous Access XP This display shows that 79% of a current copy operation has completed. Synchronous fence levels (NEVER and DATA) show 100% in this column when the volumes are in a PAIR state.
# pairdisplay -g oradb -fe
Group  Seq#   LDEV#  P/S    Status  Fence   %   P-LDEV#  M  CTG  JID  AP  EM  E-Seq#  E-LDEV#
oradb  30053  64     P-VOL  PAIR    Never,  75  C8       -  1    1    2   -   0       -
oradb  30054  C8     S-VOL  PAIR    Never,  64  64       -  1    1    0   -   0       -

Viewing the Journal Volumes Information - Raid Manager using the “raidvchkscan” Command
The raidvchkscan command supports the option (-v jnl [unit#]
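When scripting health checks around this display, the pair state can be extracted from captured pairdisplay output with awk. The here-document below is a captured sample standing in for live output; a real check would pipe the pairdisplay command for your device group directly into the same awk program:

```shell
# Count entries of a device group that report the PAIR state.
# Fields in the sample: group, Seq#, LDEV#, P/S, Status, Fence, %.
pair_count=$(awk '$4 == "P-VOL" || $4 == "S-VOL" { if ($5 == "PAIR") n++ } END { print n+0 }' <<'EOF'
oradb 30053 64 P-VOL PAIR Never, 75
oradb 30054 C8 S-VOL PAIR Never, 100
EOF
)
echo "$pair_count"   # 2
```

If the count does not match the number of volumes in the group, the group needs attention (for example, a pairresync after a link failure).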
— P(S)JSF: this means “P(S)vol Journal Suspend Full”
— P(S)JSE: this means “P(S)vol Journal Suspend Error”, including link failure
• AP: shows the number of active paths on the initiator port in Continuous Access links.
• Q-Marker: Displays the sequence number in the journal group.
Figure 3-5 Q-Marker and Q-CNT
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Metrocluster Solution with Continuous Access XP • U(%): Displays the usage rate of the journal data. • D-SZ: Displays the capacity for the journal data on the journal group. • Seq#: Displays the serial number of the XP12000. • Num: Displays the number of LDEV (journal volumes) configured for the journal group. • LDEV#: Displays the first LDEV number of journal volumes.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Metrocluster Solution with Continuous Access XP Following is a partial list of failures that require running pairresync to restore disaster-tolerant data protection: • Failure of all Continuous Access links without restart of the application • Failure of all Continuous Access links with Fence Level “DATA” with restart of the application on a primary host • Failure of the entire second
Using the pairresync Command
The pairresync command can be used with special options after a failover in which the recovery site has started the application and has processed transaction data on the disk at the recovery site, but the disks on the primary site are intact.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Metrocluster Solution with Continuous Access XP Timing Considerations In a journal group, many journal volumes can be configured to hold a significant amount of the journal data (host-write data). The package startup time may increase significantly when a Metrocluster Continuous Access package fails over. Delay in package startup time will occur in these situations: 1.
In this case, either use FORCEFLAG to start up the package on the SVOL site, or fix the problem and resume data replication with the following procedure:
1. Split the device group pair completely (pairsplit -g -S).
2. Re-create a pair from the original PVOL as source (use the paircreate command).
3.
• At T1, the device pair is in PVOL-PAIR/SVOL-PAIR and the AP value is 0 at the SVOL site.
• At T2, a failover occurs; the package fails over from the PVOL site to the SVOL site. Metrocluster Continuous Access issues SVOL-Takeover, and the state becomes SVOL-PSUS(SSWS) and PVOL-PAIR.
• At T3, all Continuous Access links have been recovered.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Metrocluster Solution with Continuous Access XP The monitor, as a package service, periodically checks the status of the XP/Continuous Access device group that is configured for the package, and sends notification to the user via email, syslog, and console if there is a change in the status of the package’s device group.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Metrocluster Solution with Continuous Access XP XP/Continuous Access Device Group Monitor Operation Overview The XP/Continuous Access device group monitor runs as a package service. The user can configure the monitor's setting through the package's environment file.
• If you want to receive notification messages over email, uncomment the MON_NOTIFICATION_EMAIL variable and set it to a fully qualified email address. Multiple email addresses can be configured using a comma as the separator between addresses.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Metrocluster Solution with Continuous Access XP • send a notification on every third polling, if the state of the device group remains the same. • send the notifications to sysadmin1@hp.com and sysadmin2@hp.com. • log notifications to system log file, syslog. • display notifications to system console.
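The “every third polling” behavior above corresponds to a simple modulo check on a poll counter. The sketch below is an illustration of that cadence only; the variable names are ours, not the monitor's actual internals:

```shell
# Simulate nine polling cycles with notification on every third poll,
# assuming the device-group state stays the same throughout.
NOTIFICATION_FREQUENCY=3
poll=0
while [ "$poll" -lt 9 ]; do
    poll=$((poll + 1))
    if [ $((poll % NOTIFICATION_FREQUENCY)) -eq 0 ]; then
        echo "poll $poll: send notification"
    fi
done
```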
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Metrocluster Solution with Continuous Access XP Configure XP/Continuous Access Device Group Monitor as a Service of the Package Add the monitor as a service in the package's configuration file and control script file as follows: • In the package's configuration file, add the following lines: SERVICE_NAME pkgXdevgrpmon.
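As an illustration only, the service definition might be completed as follows for a hypothetical package pkgX. The service name is user-chosen, and the monitor command path below is a placeholder; take the actual command and arguments from the monitor documentation for your Metrocluster version:

```
# In the package configuration file (pkgX.config):
SERVICE_NAME                 pkgX_devgrpmon
SERVICE_FAIL_FAST_ENABLED    NO
SERVICE_HALT_TIMEOUT         300

# In the package control script (pkgX.cntl):
SERVICE_NAME[0]="pkgX_devgrpmon"
SERVICE_CMD[0]="/path/to/devgrp_monitor_command pkgX"   # placeholder path
SERVICE_RESTART[0]="-r 0"
```

The SERVICE_NAME in the configuration file and control script must match, so Serviceguard can associate the running monitor process with the package.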
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Metrocluster Solution with Continuous Access XP Troubleshooting the XP/Continuous Access Device Group Monitor The following is a guideline to help identify the cause of potential problems with the XP/Continuous Access device group monitor. • Problems with email notifications: XP/Continuous Access device group monitor uses SMTP to send out email notifications.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Continental Cluster Solution with Continuous Access XP Completing and Running a Continental Cluster Solution with Continuous Access XP The following section describes how to configure a continental cluster solution using Continuous Access XP, which requires the Metrocluster Continuous Access product.
Create a Serviceguard package configuration file in the primary cluster.
# cd /etc/cmcluster/
# cmmakepkg -p .ascii
Customize it as appropriate to your application. Be sure to include the pathname of the control script (/etc/cmcluster// .
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Continental Cluster Solution with Continuous Access XP a. If necessary, add the path where the Raid Manager software binaries have been installed to the PATH environment variable. If the software is in the usual location, /usr/bin, you can just uncomment the line in the script. b. Uncomment the behavioral configuration environment variables starting with AUTO_.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Continental Cluster Solution with Continuous Access XP 8. Distribute Metrocluster/Continuous Access configuration, environment and control script files to other nodes in the cluster by using ftp or rcp: # rcp -p /etc/cmcluster/pkgname/* \ other_node:/etc/cmcluster/pkgname See the example script Samples/ftpit to see how to semi-automate the copy using ftp.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Continental Cluster Solution with Continuous Access XP 12. Using standard Serviceguard commands (cmruncl, cmhaltcl, cmrunpkg, cmhaltpkg), test the primary cluster for cluster and package startup and package failover. 13. Any running package on the primary cluster that will have a counterpart on the recovery cluster must be halted at this time.
# mkdir /etc/cmcluster/
Create a Serviceguard package configuration file in the recovery cluster.
# cd /etc/cmcluster/
# cmmakepkg -p .ascii
Customize it as appropriate to your application. Make sure to include the pathname of the control script (/etc/cmcluster// .
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Continental Cluster Solution with Continuous Access XP 5. Add customer-defined run and halt commands in the appropriate places according to the needs of the application. See the Managing Serviceguard user’s guide for more information on these functions. 6. Copy the environment file template /opt/cmcluster/toolkit/SGCA/xpca.env to the package directory, naming it pkgname_xpca.env.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Continental Cluster Solution with Continuous Access XP g. Uncomment the FENCE variable and set it to either ASYNC, NEVER, or DATA according to your business requirements or special Metrocluster requirements. This variable is used to compare with the actual fence level returned by the array. h.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Continental Cluster Solution with Continuous Access XP other files Any other scripts you use to manage Serviceguard packages 11. Edit the file /etc/rc.config.d/raidmgr, specifying the Raid Manager instance to be used for Continentalclusters, and specify that the instance be started at boot time.
3. Edit the continental cluster security file /etc/opt/cmom/cmomhosts to allow or deny hosts read access by the monitor software.
4. On all nodes in both clusters, copy the monitor package files from /opt/cmconcl/scripts to /etc/cmcluster/ccmonpkg.
Switching to the Recovery Cluster in Case of Disaster
It is vital that the administrator verify that recovery is needed after receiving a cluster alert or alarm. Network failures may produce false alarms. After validating a failure, start the recovery process using the cmrecovercl [-f] command.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Continental Cluster Solution with Continuous Access XP Failback in Scenarios 1 and 2 After reception of the Continentalclusters alerts and alarm, the administrators at the recovery site follow the prescribed processes and recovery procedures to start the protected applications on the recovery cluster.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Continental Cluster Solution with Continuous Access XP # pairresync -g -c 15 -swaps This starts the resynchronization, which can take a long time if the entire primary disk array was lost or a short time if the primary array was intact at the time of failover. 2. When resynchronization is complete, halt the Continentalclusters recovery packages at the recovery site.
3. Since the paired volumes have a status of SMPL at both the primary and recovery sites, the XP views the two halves as unmirrored. From a system at the primary site, manually create the paired volume.
# paircreate -g -f -vr -c 15
See the XP Raid Manager user’s guide for more paircreate command options.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Continental Cluster Solution with Continuous Access XP • failure of the entire recovery Data Center for a given application package • failure of the recovery XP disk array for a given application package while the application is running on a primary host Following is a partial list of failures that require full resynchronization to restore disaster-tolerant data protection.
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access XP Completing and Running a Continental Cluster Solution with Continuous Access XP • /etc/cmcluster//.log • /etc/cmcluster/.
• The value of RUN_SCRIPT_TIMEOUT in the package ASCII file should be set to NO_TIMEOUT or to a value large enough to take into consideration the extra startup time due to getting status from the XP disk array. (See the previous paragraph for more information on the extra startup time.)
4 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with Continuous Access EVA
The HP StorageWorks Enterprise Virtual Array (EVA) allows you to configure data replication solutions to provide disaster tolerance for Serviceguard clusters over long distances. This chapter describes the Continuous Access EVA software and the additional files that integrate the EVA with Serviceguard clusters.
Files for Integrating the EVA with Serviceguard Clusters
Metrocluster consists of a script, program files, and an environment file that work in a Serviceguard metropolitan cluster to automate failover to alternate nodes in the case of a disaster. The Metrocluster Continuous Access EVA product contains the following files.
Table 4-1 Metrocluster Continuous Access EVA Template Files (Continued)
Name: /opt/cmcluster/toolkit/SGCAEVA/caeva.env
Description: The Metrocluster Continuous Access EVA environment file. This file must be customized for specific EVA DR groups and Serviceguard packages. Copies of this file must be customized for each separate Serviceguard package.
Overview of EVA and Continuous Access EVA Concepts

Continuous Access EVA provides remote data replication from primary EVA systems to remote EVA systems. Continuous Access EVA uses the remote-copy function of the Hierarchical Storage Virtualization (HSV) controller running Virtual Controller Software (VCS) to achieve host-independent data replication.

Copy Sets

Vdisks are user-defined storage allotments of virtual or logical data storage. A pairing relationship can be created to automatically replicate a logical disk to another logical disk. The generic term for this is a copy set.

DR Group Properties

Properties are defined for every DR group that is created. DR group properties are described below:

• Name: A unique name given to each DR group. HP recommends that the names of replicating DR groups at the source and destination be the same.

Log Disk

The DR group has storage allocated on demand called a log. The virtual log collects host write commands and data if access to the destination storage system is severed. When a connection is later re-established, the contents of the log are written to the destination Vdisk to synchronize it with the source Vdisk.

Managed Sets

A managed set is a collection of DR groups selected for the purpose of managing them. For example, a managed set can be created to manage all DR groups of a particular application that reside in separate storage arrays.

Metrocluster Continuous Access EVA software uses the WBEM API to communicate with SMI-S to automatically manage the DR groups that are used in the application packages.
Preparing a Serviceguard Cluster for Metrocluster Continuous Access EVA

When the following procedures are completed, an adoptive node will be able to access the data belonging to a package after it fails over.

Setting up the Storage Hardware

1.

Figure 4-1 Configuration of Virtual Disks and DR groups

For more detailed information on setting up Command View EVA for configuring, managing and monitoring your HP StorageWorks Enterprise Virtual Array Storage System, refer to the HP StorageWorks Command View EVA Getting Started Guide (part number: AA-RQZBE-TE).
hosts. The destination volume access mode needs to be changed to Read-only mode before the DR group can be used. The destination volumes need to be presented to their local hosts.

NOTE: In the Metrocluster Continuous Access EVA environment, the destination volume access mode must be set to read-only mode.

# /sbin/sssu "FILE "

4. After changing the access mode of the destination Vdisk, run the ioscan and insf commands on the remote clustered nodes to create the special device file names for the destination Vdisk on the remote EVA.

The first utility, smispasswd, is a Command Line Interface (CLI) that provides functions for defining the Management Server list and SMI-S username and password pairs. The second utility, evadiscovery, is also a CLI that provides functions for defining EVA storage cells and DR group information.
1. Create a configuration input file. (A template of this file can be found in /opt/cmcluster/toolkit/SGCAEVA/smiseva.conf.)

2. Copy the template file /opt/cmcluster/toolkit/SGCAEVA/smiseva.conf to the /etc/dtsconf/ directory.

# cp /opt/cmcluster/toolkit/SGCAEVA/smiseva.conf \
/etc/dtsconf/smiseva.conf

3.
An example of the smiseva.conf file is as follows:

##############################################################
#                                                            #
#       smiseva.conf CONFIGURATION FILE (template)           #
#          for use with the smispasswd utility               #
#          in the Metrocluster Continuous Access             #
#                    EVA Environment                         #
#                                                            #
# Note: This file MUST be edited before it can be used.      #
# ...                                                        #
# or tab(s). The order of fields is significant. The first   #
# field must be a hostname or IP address, the second field   #
# must be a user login name on the host. The third field     #
# must be 'y' or 'n' to use SSL connect. The last field      #
# must be the namespace of the SMI-S service.                #
##############################################################
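Based on the field order described in the template comments above, a single data entry might look like the following sketch. The IP address is taken from the prompts shown later in this section; the username, SSL flag, and namespace values are illustrative assumptions, not values from the original document:

```
# hostname/IP      username        SSL(y/n)    namespace
15.13.172.11       administrator   n           root/eva
```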
NOTE: The username and password stored in the mapping file are the same as the username and password used with the SSSU tool.

Enter password of 15.13.172.11: **********
Re-enter password of 15.13.172.11: **********
Enter password of 15.13.172.12: **********
Re-enter password of 15.13.172.
smispasswd -h <hostname> -n <namespace> -u <username> -s

Table 4-2 Individual Management Server Information Command Options

-h    This is either a DNS resolvable hostname or IP address of the Management Server
-n    This is the name space configured for the SMI-S CIMOM

When you issue the command with these options, the "Enter password:" prompt will ask you to input the password associated with the username. After you input the password, the "Re-enter password:" prompt will ask you to enter the same password again for verification.
Defining EVA Storage Cells and DR Groups

On the same node on which the Management Server list was created, define the EVA storage cell and DR group information to be used in the Metrocluster Continuous Access EVA environment, using the evadiscovery tool with the following steps:

1. Create a configuration input file.

# Metrocluster Continuous Access EVA configuration, this file is copied to
# all cluster nodes.
#
# Edit the file to include the appropriate data about the EVA
# storage systems and DR groups that will be used in your
# Metrocluster Continuous Access EVA environment.
#
# "DR Group - OracleDB1"      Enter a DR group name in double quotes.
#
# "5000-1FE1-5000-4081"       Enter first storage WWN in double quotes.
# "5000-1FE1-5000-4084"       Enter second storage WWN in double quotes.
#
# "DR Group - Package2"       Enter a DR group name in double quotes.
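Putting the template comments above together, the data portion of an evadiscovery input file might look like the following sketch. The WWNs and DR group names are the illustrative values from the template comments, and the exact ordering of entries is an assumption, not a validated configuration:

```
"5000-1FE1-5000-4081"
"5000-1FE1-5000-4084"
"DR Group - OracleDB1"
"DR Group - Package2"
```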
Verifying the storage systems and DR Groups ...
Generating the mapping data ...
Adding the mapping data to the file /etc/dtsconf/caeva.map ...
The mapping data is successfully generated.

The command generates the mapping data and stores it in /etc/dtsconf/caeva.map. The mapping file /etc/dtsconf/caeva.map

package. In this case the DR group's internal IDs are regenerated by the EVA system. If the name of any storage system or DR group is changed, update the external configuration file, run the evadiscovery utility again, and redistribute the map file /etc/dtsconf/caeva.map to all Metrocluster clustered nodes.
Configuring Volume Groups

This section describes the required steps to create a volume group for use in a Metrocluster Continuous Access EVA environment.

Identifying the Special Device File Name for a Vdisk in a DR Group using Secure Path V3.0D or V3.

From the output file, look for the special device file name that corresponds to the WWN identifier of the Vdisk in the DR group. Use the special device file while creating the volume group, which is described in the section “Creating Volume Groups using Source Volumes for Secure Path v3.0D, v3.0E, and v3.0F”.

For more detailed information on setting up Command View EVA for configuring, managing, and monitoring your HP StorageWorks Enterprise Virtual Array Storage System, refer to the HP StorageWorks Command View EVA Getting Started Guide (part number: AA-RQZBE-TE).

Identifying Special Device Files using Secure Path v3.
/dev/dsk/c9t0d2     Active
/dev/dsk/c15t0d2    Active
/dev/dsk/c21t0d2    Active
/dev/dsk/c4t0d2     Active
/dev/dsk/c10t0d2    Active
/dev/dsk/c16t0d2    Active
/dev/dsk/c22t0d2    Active

From the output display, identify the device file listing that corresponds with the WWN of the vdisk in the DR group.

In the following sample listing there are eight device files that correspond to different paths to the same vdisk. Use all the device files identified while creating a volume group, as described in the section “Configuring Volume Groups using PVLinks”.

=======================
mc-node1.cup.hp.com
Virtual Disk Name..
Creating Volume Groups using Source Volumes for Secure Path v3.0D, v3.0E, and v3.0F

Use the following procedure to create volume groups for source volumes and export them for access by other nodes.

NOTE: Create volume groups only for source storage on a locally connected EVA unit.

7. Create a backup of the volume group configuration; the cluster ID has already been written to the disks/LUNs.

# vgcfgbackup /dev/vgname

8. Use the vgexport command with the -p option to export the volume groups on the primary system without removing the HP-UX device files.
The following commands are an example of how a VG using PVLinks is created for the vdisk identified by WWN 6005-08b4-0010-203d-0000-6000-0017-0000:

# pvcreate -f /dev/dsk/c16t0d1
# vgcreate /dev/vgname /dev/dsk/c16t0d1
# vgextend /dev/vgname /dev/dsk/c17t0d1
# vgextend /dev/vgname ...

The remaining vgextend commands add each of the other alternate-path device files identified for the vdisk.
Importing Volume Groups on Nodes at the Same Site

Use the following procedure to import volume groups on cluster nodes located at the same site as the EVA on which you are doing the Logical Volume Manager configuration. The sample script mk2imports can be modified to automate these steps.
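As a rough sketch of what such an automation script does, the loop below only echoes the commands it would run. The node names, map file path, and volume group name are placeholders, and the actual mk2imports script shipped with the product should be consulted for the real procedure:

```shell
# Dry run: echo the commands that would distribute the vgexport map
# file and import the volume group on each peer node at the same site.
# NODES, VG, and MAPFILE are hypothetical placeholder values.
NODES="node2 node3"
VG="vgpkg1"
MAPFILE="/tmp/${VG}.map"

for node in $NODES; do
  # Copy the map file produced by "vgexport -p -s -m" to the node,
  # then import the VG there; drop "echo" to execute for real.
  echo "rcp ${MAPFILE} ${node}:${MAPFILE}"
  echo "remsh ${node} vgimport -s -m ${MAPFILE} /dev/${VG}"
done
```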
Importing Volume Groups on Nodes at the Remote Site

Use the following procedure to import volume groups on all cluster nodes located at the site of the remote EVA. The sample script mk2imports can be modified to automate these steps.

1. Define the volume groups on all nodes at that site that will run the Serviceguard package.
Figure 4-4 EVA Command View DR Group Properties
Building a Metrocluster Solution with Continuous Access EVA

Configuring Packages for Automatic Disaster Recovery

After completing the following steps, packages will be able to automatically fail over to an alternate node in another data center and still have access to the data they need in order to operate.

Set the value of RUN_SCRIPT_TIMEOUT in the package configuration file to NO_TIMEOUT or to a value large enough to take into consideration the extra startup time required to obtain status from the EVA.

NOTE: If using the EMS disk monitor as a package resource, do not use NO_TIMEOUT.
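In the package ASCII configuration file, this setting is a single line such as the following. The 600-second value is only an illustrative choice, not a recommendation from this document; size it against your own measured EVA status-query times:

```
RUN_SCRIPT_TIMEOUT      600
```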
replication technology (caeva) used. The extension of the file must be .env. The following examples demonstrate how the environment file name should be chosen.

Example 1: If the file name of the control script is pkg.cntl, the environment file name would be pkg_caeva.env.

Example 2: If the file name of the control script is control_script.sh, the environment file name would be control_script_caeva.env.
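The naming rule in the two examples amounts to stripping the control script's extension and appending _caeva.env, which can be sketched in shell as follows (the function name is hypothetical, introduced only for illustration):

```shell
# Derive the Metrocluster Continuous Access EVA environment file name
# from a package control script name: strip the final extension and
# append "_caeva.env".
env_file_for() {
  script="$1"
  echo "${script%.*}_caeva.env"
}

env_file_for pkg.cntl            # pkg_caeva.env
env_file_for control_script.sh   # control_script_caeva.env
```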
g. Set the DC1_SMIS_LIST variable to the list of Management Servers that reside in Data Center 1. Multiple names are defined using a comma as a separator between the names.

h. Set the DC1_HOST_LIST variable to the list of clustered nodes that reside in Data Center 1. Multiple names are defined using a comma as a separator between the names.

i.
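Taken together, steps g and h produce entries like the following in the package's environment file. The server and node names are placeholders, not values from the original document:

```
DC1_SMIS_LIST="smis-dc1a.example.com,smis-dc1b.example.com"
DC1_HOST_LIST="node1,node2"
```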
Using ftp may be preferable at your organization, since it does not require the use of a .rhosts file for root. Root access via .rhosts may create a security issue.

10. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname:

pkgname.cntl          Serviceguard package control script
pkgname_caeva.env
If the log disk is not full, when a Continuous Access connection is re-established, the contents of the log are written to the destination Vdisk to synchronize it with the source Vdisk. This process of writing the log contents, in the order that the writes occurred, is called merging.

# cmmodpkg -e pkgname

Planned maintenance is treated the same as a failure by the cluster. If you take a node down for maintenance, package failover and quorum calculation is based on the remaining nodes. Make sure that the nodes are taken down evenly at each site, and that enough nodes remain on-line to form a quorum if a failure occurs.
Completing and Running a Continental Cluster Solution with Continuous Access EVA

The following section describes how to configure a continental cluster solution using Continuous Access EVA, which requires the HP Metrocluster with Continuous Access EVA product.

Set the AUTO_RUN flag to NO. This ensures that the package will not start when the cluster starts. Only after the primary packages start should you use cmmodpkg to enable package switching on all primary packages.
Example 2: If the file name of the control script is control_script.sh, the environment file name would be control_script_caeva.env.

6. Edit the environment file pkgname_caeva.env as follows:

a. Set the CLUSTER_TYPE variable to CONTINENTAL

b.

i. Set the DC2_STORAGE_WORLD_WIDE_NAME variable to the world wide name of the EVA storage system that resides in Data Center 2. This WWN can be found on the front panel of the EVA controller, or from the Command View EVA UI.

j. Set the DC2_SMIS_LIST variable to the list of Management Servers that reside in Data Center 2.
10. Using standard Serviceguard commands (cmruncl, cmhaltcl, cmrunpkg, cmhaltpkg), test the primary cluster for cluster and package startup and package failover.

11. Any running package on the primary cluster that will have a counterpart on the recovery cluster must be halted at this time.

switching on a recovery package will be automatically set by the cmrecovercl command on the recovery cluster when it successfully starts the recovery package.

3. Create a package control script.

# cmmakepkg -s pkgname.cntl

Customize the control script as appropriate to your application using the guidelines in Managing Serviceguard.

removing any quotes around the file names. The operator may create the FORCEFLAG file in this directory. See Appendix B for an explanation of these variables.

c. Set the DT_APPLICATION_STARTUP_POLICY variable to one of two policies: Availability_Preferred, or Data_Currency_Preferred.

d.

l. Set the QUERY_TIME_OUT variable to the number of seconds to wait for a response from the SMI-S CIMOM in the Management Server. The default timeout is 300 seconds. The recommended minimum value is 20 seconds.

7.
Setting up the Continental Cluster Configuration

The steps below are the basic procedure for setting up the Continentalclusters configuration file and the monitoring packages on the two clusters. For complete details on creating and editing the configuration file, refer to Chapter 2, “Designing a Continental Cluster.”

1.
NOTE: The monitor package for a cluster checks the status of the other cluster and issues alerts and alarms, as defined in the Continentalclusters configuration file, based on the other cluster's status.

8. Check /var/adm/syslog/syslog.log for messages. Also check the ccmonpkg package log file.

9.
Failover to Recovery Site

After receiving the Continentalclusters alerts and alarms, the administrators at the recovery site follow the prescribed processes and recovery procedures to start the protected applications on the recovery cluster.

Failover Scenarios

The goal of HP Continentalclusters is to maximize system and application availability. However, even systems configured with Continentalclusters can experience hardware failures at the primary site or the recovery site, as well as failures of the hardware or networking connecting the two sites.
Failback to the Primary Site

In this scenario the disk array is repaired or a new EVA array is commissioned at the primary site.

As described in the above scenario, Continentalclusters can be reconfigured to provide monitoring and recovery for the application now running on its recovery cluster. This is done by switching the identities of the sites in the application's context.
5 Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF

The EMC Symmetrix Remote Data Facility (EMC SRDF) disk arrays allow configuration of physical data replication solutions to provide disaster tolerance for Serviceguard clusters over long distances. This chapter describes the EMC SRDF software and the additional files that integrate the EMC with Serviceguard clusters.

Files for Integrating Serviceguard with EMC SRDF

Metrocluster is a set of scripts and an environment file that work in a Serviceguard cluster to automate failover to alternate nodes in the case of a disaster in a metropolitan cluster.
facility. In the event of node failure, the integration of Metrocluster with EMC SRDF with the package will allow the application to fail over in the following ways:

• Among local host systems that are attached to the same EMC Symmetrix.

Overview of EMC and SRDF Concepts

EMC Symmetrix Remote Data Facility (SRDF) is a Symmetrix-based business continuance and disaster recovery solution. SRDF is a configuration of Symmetrix systems, the purpose of which is to maintain multiple, real-time copies of logical volume data in more than one location.
Figure 5-1 EMC R1 and R2 Definitions

(Figure: two Symmetrix arrays, one in Data Center A and one in Data Center B, connected by an SRDF link, with optional BCVs. The SRDF link may be bidirectional for different disk devices, and there may be multiple R1/R2 devices. Packages with primary nodes in Data Center A see the local Symmetrix as the R1 side and the Symmetrix in Data Center B as the R2 side.)
Preparing the Cluster for Data Replication

When the following procedures are completed, an adoptive node will be able to access the data belonging to a package after it fails over. Use the convenience scripts in /opt/cmcluster/toolkits/SGSRDF/Samples to automate some of the tasks in the following sections:

• mk3symgrps.
Issue the following command on each node after the hardware is installed.

# symcfg discover

This builds the CLI database on the node. Display what is in the EMC Solutions Enabler database.

Determining Symmetrix Device Names on Each Node

To correctly specify the device file names when creating Symmetrix device groups, be sure to map the HP-UX device files to the R1 and R2 Symmetrix devices. Use the following steps to gather the necessary information:

1.
Figure 5-3 Sample syminq Output from a Node on the R2 Side

Device Name          Type
/dev/rdsk/c4t0d0     R2
/dev/rdsk/c4t0d1     R2
/dev/rdsk/c4t0d2     R1
/dev/rdsk/c4t0d3     R1
/dev/rdsk/c4t1d0     BCV
/dev/rdsk/c4t1d1     BCV
/dev/rdsk/c4t1d0     GK
/dev/rdsk/c3t1d1     GK
/dev/rdsk/c3t1d0     BCV
/dev/rdsk/c3t1d1     BCV
/dev/rdsk/c3t3d0     R2
/dev/rdsk/c3t3d1     R2
/dev/rdsk/c3t3d2     R1

(The Product, Vendor ID, Rev, Ser Num, and Cap(KB) columns of the original output are not reproduced here.)
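Given output shaped like Figure 5-3, the device files of a particular type can be pulled out with a short filter. The sample lines below are abbreviated stand-ins for real syminq output, not a transcript of it:

```shell
# Filter a syminq-style listing for R1 devices only.
# The listing here is an inlined sample for illustration.
syminq_out="/dev/rdsk/c4t0d0 R2
/dev/rdsk/c4t0d2 R1
/dev/rdsk/c4t0d3 R1
/dev/rdsk/c4t1d0 BCV"

# Print only the device file names whose second column is R1.
echo "$syminq_out" | awk '$2 == "R1" { print $1 }'
```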
— The next three hexadecimal digits are the unique Symmetrix device number that is seen in the output of the status command:

# symrdf -g symdevgrpname query

This is used by the Metrocluster with Symmetrix SRDF control script and saved in the file /etc/cmcluster/package_name/symrdf.out. The contents of this file may be useful for debugging purposes.
019B 0017  R1:5  RW RW RW  S..
019C 0018  R1:5  RW RW RW  S..
019C 0019  R1:5  RW RW RW  S..

Figure 5-5

Figure 5-6 Sample symrdf list Output from R2 Side

                    Local Device View
STATUS        Sym  RDF       MODES       RDF  S T A T E S
Sym Dev  RDev  Ivn  Typ:G  SA  RA  LNK  Mode  Dom  ACp  R1 Tracks
000      000   OFF  R2:1   NR  WD  RW   SYN
001      001   OFF  R2:1   NR  WD  RW   S..
004      004   OFF  R2:1   RW  WD  RW
005      005   OFF  R2:1   RW  WD  RW
006      006   OFF  R1:2   RW  RW  RW
007      007   OFF  R1:2   RW  RW  RW
008      008   OFF  R2:1   RW  WD  RW
Table 5-2 Symmetrix ID, Device #, and Type Mapping for a 4-Node Cluster Connected to 2 Symmetrix Arrays

ID   Dev#   Type
95   005    R1
50   014    R2
95   00A    R2
50   012    R1
95   040    GK
50   041    GK
95   028    BCV

(The per-node /dev/rdsk device file name columns of the original table are not reproduced here.)
NOTE: The Symmetrix device number may be the same or different in each of the Symmetrix units for the same logical device. In other words, the device number for the logical device on the R1 side of the SRDF link may be different from the device number for the logical device on the R2 side of the SRDF link.
Building a Metrocluster Solution with EMC SRDF

Setting up 1 by 1 Configurations

The most common Symmetrix configuration used with Metrocluster with EMC SRDF is a 1 by 1 configuration in which there is a single Symmetrix frame at each data center. This section describes how to set up this configuration using EMC Solutions Enabler and HP-UX commands.

NOTE: The sample scripts mk3symgrps.nodename can be modified to automate these steps.

1. Use the symdg command, or modify the mk3symgrps.nodename script, to define an R1 and an R2 device group for each package.

# symdg create -type RDF1 devgroupname

Issue the above command on nodes attached to the R1 side.
The script must be customized for each system, including:

• The particular HP-UX device file names.

• The Symmetrix device group name (an arbitrary but unique name may be chosen for each group; the group defines all of the volume groups (VGs) which belong to a particular Serviceguard package).

• The keyword RDF1 or RDF2.
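As a sketch of the per-system customization, the dry-run loop below echoes the commands that would add each package device to its group. The device group name and device files are hypothetical placeholders, and the use of symld with "add pd" to add devices by physical device name is an assumption based on EMC Solutions Enabler usage, not a command confirmed by this document:

```shell
# Dry run: echo the SYMCLI commands that would populate a device group.
# DEVGRP and the device files are illustrative placeholders only.
DEVGRP="pkgA_dg"

for pd in /dev/rdsk/c4t0d2 /dev/rdsk/c4t0d3; do
  # Drop "echo" to execute in a live Solutions Enabler environment.
  echo "symld -g ${DEVGRP} add pd ${pd}"
done
```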
Building Disaster Tolerant Serviceguard Solutions Using Metrocluster with EMC SRDF Building a Metrocluster Solution with EMC SRDF Verifying the EMC Symmetrix Configuration When finished with all these steps, use the symrdf list command to get a listing of all devices and their states. Back up the EMC Solutions Enabler database on each node, so that these configuration steps do not have to be repeated if a failure corrupts the database.
Importing Volume Groups on Other Nodes

Use the following procedure to import volume groups. The sample script mk2imports can be modified to automate these steps:

1. Import the VGs on all of the other systems that might run the Serviceguard package, and back up the LVM configuration.
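The import step above can be sketched as follows. This is a dry-run illustration (not from the manual): the commands are echoed rather than executed, and the volume group name, map file path, group-file minor number, and disk device files are hypothetical placeholders to be replaced with your own values.

```shell
VG=vgoraA
MAPFILE=/tmp/vgoraA.map
MINOR=0x010000                          # hypothetical LVM group minor number
DISKS="/dev/dsk/c6t0d0 /dev/dsk/c9t0d0" # hypothetical primary + alternate paths

gen_import_cmds() {
  # Recreate the VG entry from the map file, then back up the LVM
  # configuration so it survives a node rebuild.
  echo "mkdir /dev/$VG"
  echo "mknod /dev/$VG/group c 64 $MINOR"
  echo "vgimport -m $MAPFILE /dev/$VG $DISKS"
  echo "vgcfgbackup /dev/$VG"
}

gen_import_cmds
```

Run the emitted commands on each secondary node that might run the package.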
Grouping the Symmetrix Devices at Each Data Center

The use of R1/R2 devices in M by N configurations of multiple Symmetrix frames is enabled by means of consistency groups. A consistency group is a set of Symmetrix RDF devices that are configured to act in unison to maintain the integrity of a database.
Figure 5-8 2 x 2 Node and Data Center Configuration with Consistency Groups (the figure shows Data Center A with node 1 and Data Center B with node 3 running pkg A and pkg C; when both SRDF links go down, the links are suspended by EMC PowerPath)
on all nodes, and the Symmetrix CLI database on each node has already been set up, as described in the section "Preparing the Cluster for Data Replication" on page 290.

CAUTION: M by N configurations cannot be used with R1/R2 swapping.

Figure 5-9 depicts a 2 by 2 configuration. Data in this figure are used in the example commands given in the following sections.
Creating Symmetrix Device Groups

For each node on the R1 side (node1 and node2), create the device groups as follows.

NOTE: It is necessary to create two device groups, since device groups do not span frames.

The following examples are based on the configuration shown in Figure 5-9.

1. Create device groups using the following commands on each node on the R1 side.
# symbcv -g dgoraA add dev 01B
# symbcv -g dgoraB add dev 052
# symbcv -g dgoraB add dev 053

6. To manage the BCV devices from the R1 side, it is necessary to associate the BCV devices with the device groups that are configured on the R1 side. Use the following commands on hosts directly connected to the R1 Symmetrix.
# symgate -sid 363 define dev 00B
# symgate -sid 021 -g dgoraA associate dev 002
# symgate -sid 363 -g dgoraB associate dev 00B

Creating the Consistency Groups

To configure consistency groups for use with Metrocluster with EMC SRDF, first create device groups and gatekeeper groups as described in the previous sections.
NOTE: This important step must be carried out on every node.

4. Establish the BCV devices in the secondary Symmetrix as a mirror of the standard device. From either node3 or node4:

# symmir -cg cgoradb -full est

Alternatively, from either node1 or node2:

# symmir -cg cgoradb -full est
4. Create the logical volumes (XXXX indicates the size in MB):

# lvcreate -L XXXX /dev/vgoraA
# lvcreate -L XXXX /dev/vgoraB

5. Install a VxFS file system on the logical volumes:

# newfs -F vxfs /dev/vgoraA/rlvol1
# newfs -F vxfs /dev/vgoraB/rlvol1

6. Create map files to permit exporting the volume groups to other systems.
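Steps 4 and 5 repeat for each volume group, so they can be generated in a loop. This dry-run sketch echoes the commands rather than executing them; SIZE_MB is a hypothetical size standing in for XXXX.

```shell
SIZE_MB=1024   # hypothetical logical volume size in MB

gen_lv_cmds() {
  for vg in vgoraA vgoraB; do
    echo "lvcreate -L $SIZE_MB /dev/$vg"   # create the logical volume
    echo "newfs -F vxfs /dev/$vg/rlvol1"   # lay down a VxFS file system
  done
}

gen_lv_cmds
```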
Creating VxVM Disk Groups using Metrocluster with EMC SRDF

If you are using VERITAS storage, use the following procedure to create disk groups. It is assumed that the VERITAS root disk group (rootdg) has already been created on the system where you are configuring the storage. The following section shows how to set up VERITAS disk groups. On one node, do the following:

1.
Validating VxVM Disk Groups using Metrocluster with EMC SRDF

The following section shows how to validate VERITAS disk groups. On one node, do the following:

1. Deport the disk group.

# vxdg deport logdata

2. Enable other cluster nodes to have access to the disk group.

# vxdctl enable

3. Split the SRDF link to enable R2 Read/Write permission.
In a Metrocluster/SRDF environment, VxVM commands should not be run against write-disabled disks, because VxVM may put these disks into an offline state. Subsequent activation of a VxVM disk group might then fail when the disks are again write-enabled, and requires a vxdisk scandisks to be executed prior to disk group activation.
Data Center A, and R2 volumes are at Data Center B. R1 volumes for pkg C and pkg D are at Data Center B, and R2 volumes are at Data Center A.
• Create the EMC Solutions Enabler database, and build Symmetrix device groups, consistency groups, and gatekeepers for each package. Export exclusive volume groups for each package as described in "Preparing the Cluster for Data Replication" on page 290. This must be done on each node that will potentially run the package.
NOTE: If using the EMS disk monitor as a package resource, do not use NO_TIMEOUT. Otherwise, package shutdown will hang if the host loses access to the package disks. This toolkit may increase package startup time by 5 minutes or more.
/etc/cmcluster/pkgname/pkgname_srdf.env

NOTE: If the package control script file name is not the package name, it is still necessary to follow the naming convention for the environment file: the file name of the package control script without its file extension, followed by an underscore and the type of data replication technology used (srdf). The extension is .env.
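The naming convention above can be expressed as a small helper. This is an illustrative sketch (the function name is not from the manual): it derives the environment file name from a control script path by stripping the directory and extension and appending "_srdf.env".

```shell
# Derive the Metrocluster SRDF environment file name from a package
# control script path, per the documented naming convention.
srdf_env_name() {
  ctl=$1              # e.g. /etc/cmcluster/pkgA/pkgA.cntl
  base=${ctl##*/}     # strip the directory part
  stem=${base%.*}     # strip the file extension
  echo "${stem}_srdf.env"
}

srdf_env_name /etc/cmcluster/pkgA/pkgA.cntl   # -> pkgA_srdf.env
```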
e. Uncomment the DEVICE_GROUP variable and set it to the EMC Symmetrix device group name for the local disk array, as given by the symdg list command. If you are using an M by N configuration, configure the DEVICE_GROUP variable with the name of the consistency group.

f. Uncomment the RETRY and RETRYTIME variables.
10. Verify that each node in the Serviceguard cluster has the following files in the directory /etc/cmcluster/pkgname:

pkgname.cntl       Serviceguard package control script
pkgname_srdf.env   Metrocluster EMC SRDF environment file
pkgname.ascii      Serviceguard package ASCII configuration file
pkgname.
2. Split the logical SRDF links for the package.

# Samples/pre.cmquery

3. Distribute the Metrocluster EMC SRDF configuration changes.

# cmapplyconf -P pkgconfig

4. Restore the logical SRDF links for the package.

# Samples/post.cmapply

5. Start the package with the appropriate Serviceguard command.

# cmmodpkg -e pkgname

No checking of the status of the SA/FA ports is done.
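Steps 2 through 5 above always run in the same order, so they can be consolidated into one helper. This is a dry-run sketch: the commands are echoed, not executed, and the configuration file and package names are placeholders taken from the example.

```shell
# Emit the reconfiguration sequence for one Metrocluster SRDF package:
# split the SRDF links, apply the configuration, restore the links,
# then enable the package.
apply_srdf_pkg_cmds() {
  cfg=$1   # package ASCII configuration file
  pkg=$2   # package name
  echo "Samples/pre.cmquery"      # split the logical SRDF links
  echo "cmapplyconf -P $cfg"      # distribute the configuration changes
  echo "Samples/post.cmapply"     # restore the logical SRDF links
  echo "cmmodpkg -e $pkg"         # start the package
}

apply_srdf_pkg_cmds pkgconfig pkgname
```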
site fails, and the application starts up on the R2 side. Since the resynchronization did not complete when there was a failure on the R1 side, the data on the R2 side is corrupt.

Using the BCV in Resynchronization

In the case described above, you use the business continuity volumes, which protect against a rolling disaster.
takes place while the resynchronization is in progress. This ensures that the package will not automatically start and operate on the inconsistent data in the event of a rolling disaster. As demonstrated above, the resynchronization is a manual process, initiated by an operator after the links are repaired.
R1/R2 Swapping using Metrocluster SRDF

The Metrocluster SRDF package can be configured to perform R1/R2 swapping automatically upon package failover. To enable R1/R2 swapping in the package, set the environment variable AUTOSWAPR2 in the _srdf.env file to 1 or 2.
Scenario 1: In this scenario, the package failover is due to a host failure or to planned maintenance downtime. The SRDF links and the Symmetrix frames are still up and running.
CAUTION: R1/R2 Swapping cannot be used in an M by N Configuration.
Some Further Points

The following are some EMC Symmetrix-specific requirements:

• R1 and R2 devices have been correctly defined and assigned to the appropriate nodes in the internal configuration that is downloaded by EMC support staff.
• R1 devices are locally protected (RAID 1 or RAID S); R2 devices are locally protected (RAID 1, RAID S, or BCV).
— Domino Mode is not enabled
— the SRDF links fail
— the application continues to modify the data
— the link is restored
— resynchronization from R1 to R2 starts, but does not finish
— the R1 side fails

Although the risk of this occurrence is extremely low, if the business cannot afford even a minor amount of risk, then it is necessary to enable Domino Mode to ensure that the data at the R2 side are always consistent.
The Symmetrix device group name must be the same on each host for both the R1 side and the R2 side. This group name is placed in the DEVICE_GROUP variable defined in the pkg.env file. Although the name of the device group must be the same on each node, the special device file names specified may be different on each node. Symmetrix logical device names MUST be default names of the form "DEVnnn" (for example, DEV001).
• This toolkit may increase package startup time by 5 minutes or more. Packages with many disk devices will take longer to start up than those with fewer devices, due to the time needed to get device status from the Symmetrix. Clusters with multiple packages that use devices on the Symmetrix will cause package startup time to increase when more than one package is starting at the same time.
Metrocluster with SRDF/Asynchronous Data Replication

The following sections present concepts, functionality, and requirements for configuring Metrocluster using SRDF/Asynchronous data replication. SRDF/Asynchronous delivers asynchronous data replication solutions featuring a consistent and restartable copy of the production data at the remote side.
• At the R2 site, there is a receive cycle (N-1), which is receiving data from the transmit cycle at R1. The apply cycle (N-2) at the remote site is marking all the tracks from a previous cycle as write-pending to the secondary devices (R2). The data is considered committed to the R2 side devices at cycle switch time.
Figure 5-12 SRDF/Asynchronous Basic Functionality
Requirements for using SRDF/Asynchronous in a Metrocluster Environment

The following describes the hardware and software requirements for setting up SRDF/Asynchronous in a Metrocluster environment:

Hardware Requirements

• EMC supports SRDF/Asynchronous on the Symmetrix DMX Series only.
Preparing the Cluster for SRDF/Asynchronous Data Replication

The following sections, "Metrocluster with SRDF/Asynchronous Data Replication" and "Configuring Metrocluster with EMC SRDF using SRDF/Asynchronous", describe architectures and configurations for preparing SRDF/Asynchronous data replication.
Figure 5-13 Metrocluster Topology using SRDF/Asynchronous
Data replication can utilize any extended SAN devices that support SRDF links, for example DWDM, Fibre Channel over Internet Protocol, and so on. However, since the network for a Serviceguard cluster heartbeat requires a "Dark Fiber" link, it is recommended to utilize the DWDM links for SRDF/Asynchronous data replication.
Configuring Metrocluster with EMC SRDF using SRDF/Asynchronous

The following sections, "Building a Device Group for SRDF/Asynchronous" and "Package Configuration using SRDF/Synchronous or SRDF/Asynchronous", describe the steps for building a device group and configuring a package in an SRDF/Asynchronous environment.
2. Create a device group (for example, AsynDG) of type RDF1 on the R1 side and type RDF2 on the R2 side.

On the R1 side:

# symdg create AsynDG -type RDF1

On the R2 side:

# symdg create AsynDG -type RDF2

3. Add all devices from the RDF (RA) group configuration to the device group for SRDF/Asynchronous operation.
7. Enable consistency protection to ensure data consistency on the R2 side for the SRDF/Asynchronous devices in the device group.

# symrdf -g AsynDG enable

8. If the SRDF pairs are not in a Consistent state at this point, initiate an establish command to synchronize the data on the R2 side from the R1 side.
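The device-group steps above (create the group, enable consistency protection, establish if needed) can be sketched as a dry-run helper. The commands are echoed, not executed; the group name AsynDG follows the example, and the final establish line is only needed when the pairs are not already Consistent.

```shell
# Emit the SRDF/Asynchronous device-group setup sequence for one group.
gen_asyn_dg_cmds() {
  dg=$1
  echo "symdg create $dg -type RDF1"   # on R1-side hosts (RDF2 on the R2 side)
  echo "symrdf -g $dg enable"          # enable consistency protection
  echo "symrdf -g $dg establish"       # synchronize R2 from R1 if not Consistent
}

gen_asyn_dg_cmds AsynDG
```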
1. Copy the template file that is shipped with the Metrocluster with EMC SRDF product from /opt/cmcluster/toolkit/SGSRDF/srdf.env to the package directory.

2. Customize the template file based on the requirements of your environment.
Package Failover using SRDF/Asynchronous

The EMC Solutions Enabler provides a control operation, checkpoint, to confirm that the data written in the current SRDF/Asynchronous cycle has been successfully committed to the R2 side. When a package fails over to the secondary site, Metrocluster with EMC SRDF ensures that the most current data is used when the SRDF link is still up.
Building a Continental Cluster Solution with EMC SRDF

The following section describes how to configure a continental cluster solution using EMC SRDF, which requires the Metrocluster with EMC SRDF product.

Setting up a Primary Package on the Primary Cluster

Use the procedures in this section to configure a primary package on the primary cluster.
# cp /opt/cmcluster/toolkit/SGSRDF/srdf.env \
/etc/cmcluster/pkgname/pkgname_srdf.env

6. Create a Serviceguard application package configuration file:

# cd /etc/cmcluster/
# cmmakepkg -p .conf

Customize it as appropriate to your application. Be sure to include the node names and the pathname of the control script (/etc/cmcluster//.
c. Uncomment the PKGDIR variable and set it to the full path name of the directory where the control script has been placed. This directory must be unique for each package and is used for status data files. For example, set PKGDIR to /etc/cmcluster/.

d.
11. Add customer-defined run and halt commands in the appropriate places, according to the needs of the application. See the Serviceguard manual for more information on these functions.

12. Distribute the EMC SRDF package configuration, environment, and control script files to the other nodes in the primary cluster by using ftp or rcp:

# rcp -p /etc/cmcluster//.
Setting up a Recovery Package on the Recovery Cluster

The installation of EMC SRDF, Serviceguard, and Continentalclusters software is exactly the same as in the previous section. The procedures below install and configure a recovery package on the recovery cluster.
Be sure to set AUTO_RUN to NO in the package ASCII file.

7. Edit the recovery package environment file _srdf.env as follows:

a. Add the path for the EMC Solutions Enabler software binaries.

b. Make sure that all AUTO* variables are uncommented.

c.
10. Split the SRDF logical links for the disks associated with the application package. See the script Samples/pre.cmquery for an example of how to automate this task. The script must be customized with the Symmetrix device group names.

11. Apply the Serviceguard configuration using the cmapplyconf command or SAM for the recovery cluster.

12. Test the cluster and packages.
1. Split the SRDF logical links for the disks associated with the application package. See the script Samples/pre.cmquery for an example of how to automate this task. The script must be customized with the Symmetrix device group names.

2. Generate the Continentalclusters configuration using the following command:

# cmqueryconcl -C cmconcl.config

3.
# cmapplyconcl -C cmconcl.config

9. Start the monitor package on both clusters. The monitor package for a cluster checks the status of the other cluster and issues alerts and alarms, as defined in the Continentalclusters configuration file, based on the other cluster's status.

10. Check /var/adm/syslog/syslog.log for messages. Also check the ccmonpkg package log file.

11.
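The Continentalclusters build sequence from the steps above can be sketched as a dry-run helper. The commands are echoed, not executed; the configuration file name follows the example (cmconcl.config), and starting the monitor package via cmmodpkg -e is an assumption, since the manual names the step but not the command.

```shell
# Emit the Continentalclusters configuration sequence: split the SRDF
# links, generate and apply the configuration, then start the monitor.
gen_concl_cmds() {
  cfg=$1
  echo "Samples/pre.cmquery"     # split the SRDF logical links first
  echo "cmqueryconcl -C $cfg"    # generate the configuration template
  echo "cmapplyconcl -C $cfg"    # apply the edited configuration
  echo "cmmodpkg -e ccmonpkg"    # start the monitor package (assumed command)
}

gen_concl_cmds cmconcl.config
```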
Failback Scenarios

There is no failback counterpart to the "pushbutton" failover from the primary cluster to the recovery cluster. Failback depends on the original nature of the failover, the state of the primary and secondary Symmetrix SRDF volumes (R1 and R2), and the condition of the primary cluster.
After power is restored to the primary site, the Symmetrix device groups may be in the Failed Over status. The procedure to move the application packages back to the primary site differs depending on the status of the device groups. The following procedure applies to the situation where the device groups have a status of "Failed Over":

1.
6. Verify that the device group is synchronized.

# symrdf list

7. Manually bring the package back if the package does not come up and the device group status is "failed over."

# symrdf -g pkgCCB_r1 failback

Execute an RDF 'Failback' operation for device group 'pkgCCB_r1' (y/[n]) ? y

An RDF 'Failback' operation execution is in progress for device group 'pkgCCB_r1'. Please wait...
The symrdf list command on ftsys1a (Symmetrix ID 000183500021) displays the local device view, listing each device pair's status, modes, and RDF pair state (for example, Synchronized or Invalid).
SRDF paired volumes. Since the systems at the primary site are accessible, but the Symmetrix is not, the control file will evaluate the paired volumes with a local status of "failed over".
Since the most current data will be at the remote (recovery) site, issue this command to synchronize from the remote site. Wait for the synchronization process to complete before progressing to the next step. Failure to wait for the synchronization to complete will result in the package failing to start in the next step.

6.
4. Verify the SRDF links.

# symrdf list

On the recovery cluster, do the following:

1. Start the recovery cluster.

# cmruncl -v

The recovery cluster comes up with ccmonpkg running; the application packages (bkpkgX) stay down.

2. Do not manually start application packages on the recovery cluster; doing so will cause data corruption.

3. Confirm the recovery cluster status.
CAUTION: Never enable package switching on both the primary package and the recovery package.

4. Halt the monitor package.

# cmhaltpkg ccmonpkg

5. Apply the new continental cluster configuration.

# cmapplyconcl -C

6. Restart the monitor package.
Designing a Disaster Tolerant Solution Using the Three Data Center Architecture

6 Designing a Disaster Tolerant Solution Using the Three Data Center Architecture

This chapter describes the Three Data Center architecture through the following topics:

• Overview of Three Data Center Concepts
• Overview of HP XP StorageWorks Three Data Center Architecture
• Designing a Disaster Tolerant Architecture Using Three Data Center with Continuous Access XP
• Configuring an XP Three Data Center Solution
Overview of Three Data Center Concepts

A Three Data Center solution integrates Serviceguard, Metrocluster Continuous Access XP, Continentalclusters, and HP StorageWorks XP 3DC Data Replication Architecture.
Figure 6-1 Three Data Center Solution Overview
The Three Data Center solution provides the following benefits:

• Maintains high performance. Using synchronous replication over a short distance in a Metrocluster environment provides the highest level of data currency and application availability without significant impact to application performance.

• Allows swift recovery.
Designing a Disaster Tolerant Architecture Using Three Data Center with Continuous Access XP

A Three Data Center configuration uses a disaster tolerant architecture made up of two data centers located locally in a Metrocluster and a third data center located remotely.
It is recommended to maintain a consistent copy of the volume at the remote site, using HP StorageWorks Business Copy XP (BC-XP). This is particularly useful in the case of a rolling disaster, which is a disaster that occurs before the cluster is able to recover from a non-disastrous failure.
Figure 6-2 shows a typical configuration of a disaster tolerant Three Data Center architecture.
Figure 6-2 Three Data Center Architecture
Overview of HP XP StorageWorks Three Data Center Architecture

The HP XP StorageWorks Three Data Center architecture enables data to be replicated over three data centers concurrently, using a combination of Continuous Access Synchronous and Continuous Access Journaling data replication.
Figure 6-3 XP Three Data Center Multi-Target Bi-Link Configuration Data Replication
Figure 6-4 3DC Multi-Hop Bi-Link Configuration Data Replication
Three Data Center Multi-Hop Bi-Link Configuration

In an XP 3DC Multi-Hop Bi-Link configuration, the data enters the system on one XP array, is replicated synchronously to the next XP array, and from there is replicated to the last XP array.
A mirror unit descriptor (MU#) is a special index number, available with all volumes, that provides an individual designator for each copy of the volume. The mirror unit descriptor is provided in the Raid Manager configuration files to indicate the nature of the copy.
Figure 6-5 Mirror Unit Descriptors
Figure 6-6 depicts typical Three Data Center pair configurations with MU# usage in Multi-Target and Multi-Hop topologies.

NOTE: The MU# h2 device group pair must be defined in the XP Three Data Center configuration, since it is used as a bridge for the remote site pair state query.
Configuring an XP Three Data Center Solution

After the hardware setup is completed for all three data centers, including the data replication links between data centers (in either a Multi-Hop Bi-Link or Multi-Target Bi-Link configuration), the next step is the software installation and configuration.
Create another Serviceguard cluster with components in the third data center, as described in the Managing Serviceguard user's guide. This cluster will act as the recovery cluster in the Continentalclusters environment.

Creating the Continental Cluster

Install Continentalclusters software on all nodes participating in the 3DC solution.
HP StorageWorks RAID Manager Configuration

XP RAID Manager host-based software is used to create and manage the device group pairs in a Three Data Center configuration.
• HORCM_CMD: enter the primary and alternate link device file names for both the primary and redundant command devices (for a total of four raw device file names).

6. If the Raid Manager protection facility is enabled, set the HORCPERM environment variable to the pathname of the HORCM permission file, then export the variable.
The raidscan command must be invoked separately for each host interface connection to the disk array. For example, if there are two Fibre Channel host adapters:

# raidscan -p CL1-A
# raidscan -p CL1-B

NOTE: There must also be alternate links for each device, and these must be on different buses inside the XP disk array.
In Multi-Target-Bi-Link configurations, two device groups represent real Continuous Access-Sync and Continuous Access-Journal pairs. The third is a "phantom" device group that can be used as a bridge to communicate with the far site.
Multi-Target Raid Manager Configuration

For a Multi-Target topology, DC1 is configured as the primary site of the application and is the source of the data replicated to DC2 and DC3, as shown in Figure 6-7.
Sample Raid Manager Configuration on DC1 NodeA (multi-target bi-link)

HORCM_MON
# ip_address   service   poll(10ms)   timeout(10ms)
NodeA          horcm0    1000         3000

HORCM_CMD
# dev_name           dev_name           dev_name
/dev/rdsk/c6t12d0    /dev/rdsk/c9t12d0

HORCM_DEV
# dev_group   dev_name   port#   TargetID
dg            dg_d0      CL3-E   6
dg_1          dg_1_d0    CL3-E   6

HORCM_INST
# d
# communicate with DC3 nodes
dg_p    NodeC.dc3.
Multi-Hop Raid Manager Configuration

Figure 6-8 depicts a Multi-Hop topology where DC1 is configured as the primary site and is the source of the data replicated to DC2 and DC3.
Sample Raid Manager Configuration on DC1 NodeA (multi-hop-bi-link)

HORCM_MON
# ip_address   service   poll(10ms)   timeout(10ms)
NodeA          horcm0    1000         3000

HORCM_CMD
# dev_name           dev_name
/dev/rdsk/c6t12d0    /dev/rdsk/c9t12d0

HORCM_DEV
# dev_group   dev_name   port#   TargetID
dg            dg_d0      CL3-E   6
# phantom device group
dg_p          dg_p_d0    CL3-E   6
# communicate with DC3 nodes
dg_1    NodeC.dc3.
This parameter describes the serial number of the XP Storage Array.

This parameter describes the LDEV number in an XP Storage Array, and is supported in three formats:
• Specifying CU:LDEV in hex, as used by the SVP or Web console. Example for LDEV# 260: 01:04
• Speci
NOTE: The Raid Manager configuration file must be different for each host, especially the HORCM_MON and HORCM_INST fields.

Creating Device Group Pairs

An application configured for an XP Three Data Center solution contains two device groups: a Continuous Access-Sync device group and a Continuous Access-Journal device group.
— paircreate -g dg_1 -vl -f async -c 15 -jp 2 -js 2

NOTE: Paired devices must be of compatible sizes and types. Only OPEN-V LUNs are supported for three data center configurations.

NOTE: There is no need to issue the paircreate command on phantom device groups.
where the VG name and minor number nn are unique for each volume group defined on the node.

2. Create the volume group on only one node in the primary data center (DC1). Use the vgcreate and vgextend commands, specifying the appropriate special device file names. See the sample script /opt/cmcluster/toolkit/SGCA/Samples/mk1VGs.
3.
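The group-file and volume-group steps above can be sketched as a dry-run shell function that prints the HP-UX LVM commands it would issue. The volume group name (vg_dg), minor number (0x01), and device file names are illustrative assumptions, not values from the manual.

```shell
# Emit (rather than execute) the LVM commands for steps 1-2 above.
# Usage: mkvg_cmds <vg_name> <minor_nn> <first_pv> [more_pvs...]
mkvg_cmds() {
  vg=$1; minor=$2; shift 2
  echo "mkdir /dev/$vg"
  echo "mknod /dev/$vg/group c 64 ${minor}0000"   # unique minor number per VG
  first=$1; shift
  echo "vgcreate /dev/$vg $first"
  for pv in "$@"; do
    echo "vgextend /dev/$vg $pv"                  # alternate paths / extra PVs
  done
}

# Hypothetical example: VG vg_dg, minor 0x01, two special device files
mkvg_cmds vg_dg 0x01 /dev/dsk/c3t0d0 /dev/dsk/c4t0d0
```

On a real DC1 node the printed commands would be run directly (or via the sample mk1VGs script) rather than echoed.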
2. Create the disk group with the vxdg command on only one node.
   # vxdg init logdata c5t0d0
3. Verify the configuration.
   # vxprint -g logdata
4. Use the vxassist command to create logical volumes.
   # vxassist -g logdata make logfile 2048m
5. Verify the configuration.
   # vxprint -g logdata
6. Make the filesystem.
Package Configuration in a Three Data Center Environment

This procedure must be repeated on all participating nodes for each Serviceguard package. Because there are two Serviceguard clusters, packages must be configured individually in each cluster.
5. Copy the environment file template /opt/cmcluster/toolkit/SGCA/xpca.env to the package directory, naming it _xpca.env:

# cp /opt/cmcluster/toolkit/SGCA/xpca.env \
  /etc/cmcluster/pkgname/_xpca.
c. Uncomment the PKGDIR variable and set it to the full path name of the directory where the control script has been placed. This directory, which is used for status data files, must be unique for each package. For example, set PKGDIR to /etc/cmcluster/_name, removing any quotes around the file names.
d.
• multi-target-bi-link: Represents a Multi-Target Three Data Center configuration with two Continuous Access links.
• multi-hop-bi-link: Represents a Multi-Hop Three Data Center configuration with two Continuous Access links.
• DC1_DC2_DEVICE_GROUP="dg"
• DC2_DC3_DEVICE_GROUP="dg_p"
• DC1_DC3_DEVICE_GROUP="dg_1"

For the Three Data Center Multi-Hop topology:
• DC2_DC3 must use Continuous Access-Journal
• DC1_DC3 uses a phantom device group

A typical definition of values would be as follows:
• DC1_DC2_DEVICE_GROUP="dg"
• DC2_DC3_DEVICE_GROUP="dg_1"
• DC1_DC3
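Put together, a multi-target-bi-link package environment file would carry topology and device group settings along these lines. This is a hypothetical excerpt (the package directory name is illustrative); the device group names follow this chapter's samples.

```conf
# Excerpt from /etc/cmcluster/pkgname/pkgname_xpca.env (multi-target-bi-link)
3DC_TOPOLOGY="multi-target-bi-link"
DC1_DC2_DEVICE_GROUP="dg"      # Continuous Access-Sync pair
DC2_DC3_DEVICE_GROUP="dg_p"    # phantom device group (bridge to the far site)
DC1_DC3_DEVICE_GROUP="dg_1"    # Continuous Access-Journal pair
```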
8. Distribute the Metrocluster/Continuous Access configuration, environment, and control script files to all other nodes in the three data centers.
9. Verify that each node in all data centers has the following files in the directory /etc/cmcluster/:
• .cntl — Serviceguard package control script.
• _xpca.
Timing Considerations

In a journal device group, many journal volumes can be configured to hold a significant amount of journal data (host-write data). As a result, package startup time may increase significantly when a package fails over. Delays in package startup occur in these situations:
1. Recovering from broken pair affinity.
Bandwidth for Continuous Access and Application Recovery Time

When a disaster event affecting the entire Metrocluster causes an application package to be manually failed over to the recovery site (the third data center), the Continentalclusters and storage software perform the following actions:
• Perform a takeover by issuing a command to the third data center XP array v
XP array to the SVOL site XP array. The HORCTIMEOUT environment variable in the package's environment file should be configured greater than or equal to this time value. The HORCTIMEOUT value is used by the RAID Manager takeover command to determine the maximum amount of time to allow for the takeover to complete.
Data Maintenance with the Failure of a Metrocluster Continuous Access XP Failover

The following section describes data maintenance in the event of a Swap Takeover in a Metrocluster Continuous Access XP environment.
For Multi-Target Topology:
1. Split the Continuous Access-Sync device group pair completely (pairsplit -g dg -S).
2. Re-create the Continuous Access-Sync pair with the original PVOL as the source (use the paircreate command).
3. Start up the package on its primary site.
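The three multi-target steps above can be sketched as a dry-run shell function. This is not an official HP procedure script: RM="echo" prints each RAID Manager/Serviceguard command instead of executing it, and "pkgname" and the paircreate options are illustrative (the fence level and -c value would follow your pair's configuration).

```shell
# Dry-run sketch of the multi-target recovery sequence.
# Set RM="" on a real DC1 node to actually execute the commands.
RM="echo"

multi_target_failback() {
  $RM pairsplit -g dg -S                    # 1. fully split the CA-Sync pair
  $RM paircreate -g dg -vl -f never -c 15   # 2. re-create with the original PVOL as source
  $RM cmrunpkg pkgname                      # 3. start the package at the primary site
}

multi_target_failback
```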
NOTE: You can specify "none" as the copy mode for initial copy operations. If the none mode is selected, full copy operations are not performed. Use the none mode only when you are sure that the data in the primary data volume is exactly the same as the data in the secondary data volume.
Failback Scenarios

This section describes the procedures for the following failback scenarios in a Three Data Center environment.
MULTI-HOP-BI-LINK (DC1 > DC2 > DC3) Data Recovery from DC3 to DC1

The following describes the process to restart a package back on DC1 after the package fails over to DC3:
1. Log on to any node at DC2 and perform the following:
   a. Check the pair status of the Sync device group:
      # pairvolchk -g dg -s
   b. If the local dg volume is in PVOL or SVOL-SSWS:
      # pairsplit -g dg
      Go to Step 2.
   Or
   c.
d. Swap the role of the Journal device group between DC2 and DC3:
   # horctakeover -g dg_1 -t 360
e. Wait for the Journal device group to attain the PAIR state:
   # pairevtwait -g dg_1 -t 300 -s pair
3. Resync the Sync device group to get the latest data from DC2 to DC1.
• If in Step 1 the dg pair was brought to the SMPL state:
   a. Create the DC1-DC2 Sync device group pair:
      # paircreate -g dg -f never/data -c 15 -vl
   b.
# horctakeover -g dg
5. Log on to any DC2 node and perform the following:
   a. Wait for the Journal device group to reach the PSUS state:
      # pairevtwait -g dg_1 -t 300 -s psus
   b. Resync the Journal device group:
      # pairresync -g dg_1
   c. Wait for the Journal device group to reach the PAIR state.
# pairvolchk -g dg_1 -s -c
b. If dg_1 (DC3) is in PVOL:
   # pairresync -g dg_1
   Or
   If dg_1 (DC3) is in SVOL-SSWS:
   # pairresync -g dg_1 -c 15 -swapp
c. Wait for the Journal device group to reach the PAIR state:
   # pairevtwait -g dg_1 -t 300 -s pair
d. Swap the Journal device group role between DC1 and DC3:
   # horctakeover -g dg_1 -t 360
e. Wait for the PAIR state to come up.
# pairresync -g dg -c 15 -swaps
a. Wait for the PAIR state to come up:
   # pairevtwait -g dg -t 300 -s pair

NOTE: Refer to the HP StorageWorks RAID Manager XP User's Guide for an explanation of the different command options.
Additional Reading

The following documents contain additional useful information:
• Managing Serviceguard, Twelfth Edition (B3936-90100)
• Understanding and Designing Serviceguard Disaster Tolerant Architectures (B7660-90018)
• Designing Disaster Tolerant HA Clusters Using Metrocluster and Continentalclusters (B7660-90019)

Use the following URL to access HP's High Availability web page:
• http://www.hp.
A Environment File Variables for Serviceguard Integration with Continuous Access XP

This appendix lists all environment file variables that have been modified or added for disaster tolerant Serviceguard solutions that employ Continuous Access XP.
to minimize the downtime of the application, with the trade-off of having to manually resynchronize the pairs while the application is running at the primary site. If the package has been configured for a three data center environment, this parameter is applicable only when the package is attempting to start up in either the primary (DC1) or secondary (DC2) data center.
Metrocluster/Continuous Access conservatively assumes that the data on the SVOL site may be non-current and uses the value of AUTO_NONCURDATA to determine whether the package is allowed to start up automatically. If the value is 1, Metrocluster/Continuous Access allows the package to start up; otherwise, the package is not started.
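The decision just described can be sketched as a small shell predicate. This is a simplified illustration, not the actual Metrocluster code: the package may start on possibly non-current SVOL data only if AUTO_NONCURDATA=1 or the operator has created the FORCEFLAG file in the package directory.

```shell
# Return 0 (start allowed) or 1 (require operator intervention).
# Arguments: AUTO_NONCURDATA value, package directory (PKGDIR).
may_start_noncurrent() {
  auto=$1; pkgdir=$2
  [ "$auto" = "1" ] && return 0          # automatic startup permitted
  [ -f "$pkgdir/FORCEFLAG" ] && return 0 # operator forced startup
  return 1                               # package will not be started
}

# Illustrative call (pkgdir is hypothetical):
may_start_noncurrent 1 /etc/cmcluster/pkg && echo "package may start"
```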
When the package device group fence level is set to NEVER or ASYNC, you are not guaranteed that the remote (SVOL) data site still contains current data. (The application can continue to write data to the device group on the PVOL site if the Continuous Access links or the SVOL site are down, and it is impossible for Metrocluster/Continuous Access to determine whether the data on the SVOL site is current.)
NOTE: This variable is also used for the combination of PVOL_PFUS and SVOL_PSUS. When either the side file or the journal volumes have reached the threshold timeout, the PVOL becomes PFUS. If there is a Continuous Access link failure, or some other hardware failure, and we fail over to the secondary site, the SVOL becomes PSUS(SSWS) but the PVOL remains PFUS.
latest data. When starting the package in this state on the PVOL side, you run the risk of losing any changed data on the PVOL.

Values:
0 — (Default) Do NOT start up the package at the primary site. Require user intervention, either choosing which side has the good data and resynchronizing the PVOL and SVOL, or forcing the package to start by creating the FORCEFLAG file.
1 — Start up the package after making the SVOL writable. The risk of using this option is that the SVOL data may actually be non-current, and data written to the PVOL side after the hardware failure may be lost. This parameter does not need to be set if a package is configured for a three data center environment, because three data center configurations do not support the asynchronous mode of data replication.
This parameter does not need to be set if a package is configured for a three data center environment, because three data center configurations do not support the asynchronous mode of data replication. Leave this parameter at its default value in all data centers.

AUTO_SVOLPSUS (Default = 0)
This parameter applies when the PVOL and SVOL both have the suspended (PSUS) state.
is supported only when the HP Metrocluster product is installed. A type of "continental" is supported only when the HP Continentalclusters product is installed. If the package is configured for three data centers (3DC), the value of this parameter must be set to "metro" for DC1 and DC2 nodes and to "continental" for DC3 nodes.

DEVICE_GROUP
The Raid Manager device group for this package.
Continuous Access link present between DC2 and DC3, but a phantom device group must be present between DC2 and DC3.

DC1_NODE_LIST
A comma-separated list of all node names in data center one (DC1). The node list should begin and end with quotation marks. For example, DC1_NODE_LIST="node1, node2". This parameter does not need to be set for a package configured for two data centers.
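Taken together, the parameters above would appear in a package environment file roughly as follows. This is a hypothetical excerpt for a DC1 node; the node names and device group name are illustrative, and only parameters documented in this appendix are used.

```conf
# Excerpt from a 3DC package environment file on a DC1 node
CLUSTER_TYPE="metro"            # "continental" on DC3 (recovery cluster) nodes
DEVICE_GROUP="dg"               # Raid Manager device group for this package
DC1_NODE_LIST="node1, node2"    # illustrative node names
```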
The Raid Manager device group between DC2 and DC3 for this package. This device group is defined in the /etc/horcm<#>.conf file. This is a phantom device group if 3DC_TOPOLOGY="multi-target-bi-link". This parameter does not need to be set for a package configured for two data centers.

DC1_DC3_DEVICE_GROUP
The Raid Manager device group between DC1 and DC3 for this package.
Continuous Access Journal is used for asynchronous data replication. The fence level async is used for a journal group pair.

NOTE: If the package is configured for three data centers (3DC), this parameter holds the fence level of the device group between DC1 and DC2. Because the device group between DC1 and DC2 is always synchronous, the fence level will be either "data" or "never".
In asynchronous mode, when there is a Continuous Access link failure, both the PVOL and SVOL sides change to the PSUE state. However, this change does not take place until the Continuous Access link timeout value, configured in the Service Processor (SVP), has been reached.
Future releases may allow a value of 1.

NOTE: Values:
0 — (Default) Single frame.
1 — Multiple frames.
If this parameter is set to 1, the device group must be created with the "data" fence level, and the FENCE parameter must be set to "data" in this script.

PKGDIR
Contains the full path name of the package directory.
The following table describes the AUTO_* variable settings and the package startup behavior supported with Metrocluster with Continuous Access XP version A.05.00 on HP-UX 11.0 and 11i or later systems.
Table A-2  AUTO_NONCURDATA (Continued)

Local State: SVOL-PAIR; MINAP=0
Remote State: PVOL-PAIR; MINAP>0, MINAP=0
Fence Level: (Continuous Access-Journal)
AUTO_NONCURDATA=0 (Default): Do not start, with exit 1.
AUTO_NONCURDATA=1 or FORCEFLAG=yes: Perform SVOL takeover, which changes the SVOL to PSUS(SSWS).
Table A-4  AUTO_PSUSSSSWS

Local State: PVOL_PSUS
Remote State: SVOL_PSUS(SSWS)
Fence Level: NEVER/DATA/ASYNC
AUTO_PSUSSSSWS=0 (Default): Do not start, with exit 1.
AUTO_PSUSSSSWS=1 or FORCEFLAG=yes: If pairresync -swapp works, the package starts up. If pairresync -swapp fails, the package does not start, with exit 1.
Table A-6  AUTO_SVOLPSUE

Local State    Remote State
SVOL_PSUS      PVOL_PSUS
SVOL_PSUS      EX_ENORMT
SVOL_PSUS      EX_CMDIOE

Fence Level: NEVER/DATA/ASYNC
AUTO_SVOLPSUE=0 (Default): Do not start, with exit 2.
AUTO_SVOLPSUE=1 or FORCEFLAG=yes: Perform an SVOL takeover to PSUS(SSWS). After the takeover succeeds, the package starts with a warning message about non-current data in the package's control log file.

Table A-7
Local State
This parameter defines the polling interval for the monitor service (if configured). If the parameter is not defined (commented out), the default value is 10 minutes. Otherwise, the value is set to the desired polling interval in minutes.

MON_NOTIFICATION_FREQUENCY (Default = 0)
This parameter controls the frequency of notification messages sent when the state of the device group remains the same.
the parameter is set to 1, the monitor sends console notifications. If the parameter is not defined (commented out), the default value is 0.

AUTO_RESYNC
This parameter defines the pre-defined resynchronization actions that the monitor can perform when the package is on the PVOL side and the monitor detects that the Continuous Access data replication link is down.
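A monitor-related excerpt of the environment file might look as follows. This is a hypothetical fragment using only the parameters documented above; the values shown are the documented defaults.

```conf
# Monitor settings (documented defaults shown)
MON_POLL_INTERVAL=10            # minutes between device group status polls
MON_NOTIFICATION_FREQUENCY=0    # repeat-notification frequency when state is unchanged
```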
B Environment File Variables for Metrocluster Continuous Access EVA

This appendix lists all environment file variables that have been modified or added for Metrocluster with Continuous Access EVA. It is recommended that you use the default settings for most of these variables, so exercise caution when modifying them:

CLUSTER_TYPE
This parameter defines the clustering environment in which the script is used.
Availability_Preferred: Choose this policy if you prefer application availability. The Metrocluster software allows the application to start as long as the data is consistent, even though it may not be current.

Data_Currency_Preferred: Choose this policy if you prefer that the application operate on consistent and current data.
DC1_SMA_LIST
A list of the SMA servers that reside in Data Center 1. Multiple names can be defined by using commas as separators.

DC1_HOST_LIST
A list of the clustered nodes that reside in Data Center 1. Multiple names can be defined by using commas as separators.

DC2_STORAGE_WORLD_WIDE_NAME
The world wide name of the EVA storage system that resides in Data Center 2. This storage system name is defined when the storage is initialized.
C Environment File Variables for Metrocluster with EMC SRDF

This appendix lists all Serviceguard control script variables that have been modified or added for Metrocluster with EMC SRDF.
This variable indicates that when the package is being started on an R1 host and the Symmetrix is being synchronized from the Symmetrix on the R2 side, the package will halt unless the operator creates the $PKGDIR/FORCEFLAG file. The package halts because application performance degrades while the resynchronization is in progress.
A value of 0 for this variable indicates that when the package is being started on an R2 host and at least one (but not all) SRDF links are down, the package will be started automatically. This is normally the case when the 'Partitioned+Suspended' RDF pair state exists. We cannot check the state of all Symmetrix volumes on the R1 side to validate conditions, but the Symmetrix on the R2 side should be in a 'normal' state.
the normal setting if you are not using consistency groups. A value of 1 indicates that you are using consistency groups. (Consistency groups are required for M-by-N configurations.) If CONSISTENCYGROUPS is set to 1, AUTOSWAPR2 cannot be set to 1. Ensure that you have the minimum requirements for consistency groups by referring to the Metrocluster release notes.
RETRY (Default = 5)
This is the number of times a SymCLI command is repeated before returning an error. Use the default value for the first package, and slightly larger numbers for additional packages, making sure that the total of RETRY * RETRYTIME is approximately 5 minutes.
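The sizing rule above can be checked with plain shell arithmetic. RETRYTIME=60 seconds is an assumed value here; the text fixes only the product RETRY * RETRYTIME at roughly 5 minutes.

```shell
# Sanity-check the retry window: RETRY * RETRYTIME should total ~5 minutes.
RETRY=5          # default number of SymCLI command retries
RETRYTIME=60     # seconds between retries (assumed value for illustration)
TOTAL=$((RETRY * RETRYTIME))
echo "total retry window: $TOTAL seconds"   # 300 seconds = 5 minutes
```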
D Configuration File Parameters for Continentalclusters

This appendix lists all Continentalclusters configuration file variables. See Chapter 2, "Designing a Continental Cluster," for suggestions on coding these parameters.
• unreachable — the cluster is unreachable
• down — the cluster is down, but nodes are responding
• error — an error is detected

The maximum length is 47 characters. When the MONITORING_CLUSTER detects a change in status, one or more notifications are sent, as defined by the NOTIFICATION parameter, at time intervals defined by the CLUSTER_ALERT and CLUSTER_ALARM parameters.
The parameter consists of a pair of names: the name of the cluster that receives the data to be replicated (usually the recovery cluster), as defined in the Serviceguard cluster configuration ASCII file, followed by a slash ("/"), followed by the name of the data replication receiver package, as defined in the Serviceguard package configuration ASCII file.
MONITORING_CLUSTER Name
The name of the cluster that polls the cluster named in the CLUSTER_EVENT and sends notification. Maximum length is 31 bytes.

NODE_NAME nodename
The unqualified node name as defined in the DNS name server configuration. Maximum size is 31 bytes.

NOTIFICATION Destination "Message"
A destination and message associated with a specific CLUSTER_ALERT or CLUSTER_ALARM.
• TEXTLOG Pathname — append the specified message to a specified text log file.
• UDP Nodename:Portnumber — send the specified message to a UDP port on the specified node.

Any number of notifications may be associated with a given alert or alarm.
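Using the destination forms listed above, NOTIFICATION lines in the Continentalclusters configuration file might look as follows. This is a hypothetical fragment: the log file pathname, node name, and port number are illustrative, and the exact file syntax should be checked against the product's template configuration file.

```conf
# Illustrative notification entries for one alert or alarm
NOTIFICATION TEXTLOG /var/adm/cmconcl/eventlog "Primary cluster unreachable"
NOTIFICATION UDP monitor1:5300 "Primary cluster unreachable"
```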
E Continentalclusters Command and Daemon Reference

This appendix lists all commands and daemons used with Continentalclusters. Manual pages are also available online.

cmapplyconcl [-v] [-C] filename
This command verifies the Continentalclusters configuration as specified in filename, creates or updates the binary, and distributes it to all nodes in the continental cluster.
This command verifies the Continentalclusters configuration specified in filename. It is not necessary to halt the Serviceguard cluster in order to run this command; however, the Continentalclusters monitor package must be halted. This command parses the ASCII_file to ensure proper syntax, checks parameter lengths, and validates object names such as the CLUSTER_NAME and NODE_NAME. Options are:

-C filename
The name of the ASCII configuration file.
configuration on the reachable nodes. If this option is used and some node has configuration files for a continental cluster with a different name, you are prompted to indicate whether to proceed with deleting the configuration on that node.

cmforceconcl ServiceguardPackageEnableCommand
This command is used to force a Continentalclusters package to start.
Declares an alternate location for the configuration file. The default is /etc/cmcluster/cmoncl.config.

cmreadlog -f input_file [output_file]
This command formats the content of Object Manager and other log files for easier reading. The command is used when reading the /var/opt/cmom/cmomd.log file and the /var/adm/cmconcl/sentryd.log file. Options are:

-f input_file
Specifies the name of the managed object file (MOF file) to be read.
are started by enabling package switching globally (cmmodpkg -e) for each package. This causes each package to be started on the first available node within the recovery cluster. The cmrecovercl command can only be run on a recovery cluster. The cmrecovercl command fails if there has not been sufficient time since the primary cluster became unreachable.
Glossary

A

application restart
Starting an application, usually on another node, after a failure. Applications can be restarted manually, which may be necessary if data must be restored before the application can run (for example, Business Recovery Services work like this). Applications can be restarted by an operator using a script, which can reduce human error.
C

campus cluster
A single cluster that is geographically dispersed within the confines of an area owned or leased by the organization, such that it has the right to run cables above or below ground between buildings in the campus. Campus clusters are usually spread out in different rooms in a single building, or in different adjacent or nearby buildings. See also extended distance cluster.
consistency group
A set of Symmetrix RDF devices that are configured to act in unison to maintain the integrity of a database. Consistency groups allow you to configure R1/R2 devices on multiple Symmetrix frames in Metrocluster with EMC SRDF.

continental cluster
A group of clusters that use routed networks and/or common carrier networks for data replication and cluster communication to support package failover between separate clusters in different data centers.
disaster
An event causing the failure of multiple components or entire data centers that renders unavailable all services at a single location; these include natural disasters such as earthquake, fire, or flood, acts of terrorism or sabotage, and large-scale power outages.
filesystem replication
The process of replicating filesystem changes from one node to another.

filesystem or the database. Complex transactions may result in the modification of many diverse physical blocks on the disk.

L

LUN (Logical Unit Number)
A SCSI term that refers to a logical disk device composed of one or more physical disk mechanisms, typically configured into a RAID level.
M

mission critical application  Hardware, software, processes, and support services that must meet the uptime requirements of an organization. Examples of mission critical applications that must be able to survive regional disasters include financial trading services, e-business operations, 911 phone service, and patient record databases.

O

off-line data replication
(for example, EMC's Symmetrix Remote Data Facility or the HP StorageWorks E Disk Array XP Series Continuous Access), or software-based, where data is replicated on multiple disks using dedicated software on the primary node (for example, MirrorDisk/UX).

planned downtime  An anticipated period of time when nodes are taken down for hardware maintenance, software maintenance (OS and application), backup, reorganization, upgrades (software or hardware), and similar tasks.
R

regional disaster  A disaster, such as an earthquake or hurricane, that affects a large region. Local, campus, and proximate metropolitan clusters are less likely to protect against regional disasters.

remote failover  Failover to a node at another data center or remote location.

resynchronization  The process of making the data between two sites consistent and current once systems are restored following a failure. Also called data resynchronization.
replication. Minimizes the chance of inconsistent or corrupt data in the event of a rolling disaster.

T

transaction processing monitor (TPM)  Software that allows you to modify an application to store in-flight transactions in an external location until the transaction has been committed to all possible copies of the database or filesystem, thus ensuring completion of all copied transactions.
Index

A
adding a node to Continentalclusters configuration, 122
adding a recovery group in Continentalclusters, 111, 123, 124
alerts, how used, 47
application recovery in a continental cluster, 43
applying the Continentalclusters configuration, 95
arbitrator nodes, 27, 28
AUTO_FENCEDATA_SPLIT in Metrocluster/CA, 413, 432
AUTO_NONCURDATA in Metrocluster/CA, 414
AUTO_PSUEPSUS in Metrocluster/CA, 416
AUTO_RUN (PKG_SWITCHING_ENABLED), setting to NO in a continental cluster, 67
AUTO_SMPLNORMT in Metrocluster/CA,
Continentalclusters command, 451
cmreadlog
  Continentalclusters command, 452
cmrecovercl, 105
  Continentalclusters command, 452
  how the command works in Continentalclusters, 109
command line
  cmrecovercl, 105
  symdg, 299
  symgate, 301
command line interface, EMC Symmetrix, 290
concepts in Continentalclusters, 40
configuring
  a three-data-center architecture, 24
  additional nodes in Continentalclusters, 122
  arbitrator nodes, 27
  configuring, 65
  Continentalclusters Recovery cluster hardware, 68
  Continentalclus
D
device groups, creating, 299
device names
  EMC Symmetrix logical devices, 300
  mapping, 293
  mapping Symmetrix to command line (symld), 300
  mapping Symmetrix to HP-UX, 296
device names, EMC Symmetrix, 292
DEVICE_GROUP in Metrocluster/CA, 421, 434
devices, gatekeeper, 301
disaster recovery
  automating with Metrocluster, 315
  configuring packages in Metrocluster/CA,
exporting volume groups, 258, 302
extended distance cluster, 22
using Continentalclusters, 104
disaster tolerance
  restoring to Continentalclusters
Metrocluster, 285
Metrocluster/CA
  maintenance, 192, 267
  overview, 228, 286
Metrocluster/CA, 153, 227
metropolitan cluster, 22
modifying Continentalclusters configuration file, 77
Monitor, 202
MONITOR_INTERVAL, Continentalclusters configuration file parameter, 445
MONITOR_PACKAGE_NAME, Continentalclusters configuration file parameter, 445
monitoring, 97
  receiving Continentalclusters notification, 104
  sample package configuration file for Continentalclusters, 76
monitoring definitions
  entering in Contine
PRIMARY_PACKAGE, Continentalclusters configuration file parameter, 447

Q
quorum, 28

R
RAC, 132, 134
Raid Manager
  creating configuration files in Metrocluster/CA, 240
  sample configuration file, 177
recovery cluster, 40, 68
recovery group
  adding, 111, 123, 124
  defining in Continentalclusters, 80
recovery procedure
  documenting, 99
  testing the recovery procedure, 100
RECOVERY_GROUP_NAME, Continentalclusters configuration file parameter, 447
RECOVERY_PACKAGE, Continentalclusters configuration file par
V
Veritas Cluster Volume Manager/Cluster File System, 132, 134
Veritas CVM/CFS, 134
volume groups
  creating, 257, 302
  importing and exporting, 258, 302

W
WAITTIME in Metrocluster/CA, 426, 430, 431
wide area cluster, 40
worksheet
  cluster configuration with Metrocluster/CA, 34
  Continentalclusters, 61
  power supply configuration, 61, 62, 63
worksheet for package configuration, Metrocluster/CA, 35
worksheet, Metrocluster, 34
worksheet, package, 35

X
XP series
  verify configuration, 251
XP series disk array with