HP Serviceguard Metrocluster with EMC SRDF for Linux B.01.00.
Legal Notices © Copyright 2013 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license. The information contained herein is subject to change without notice.
Contents
1 Introduction
    Overview of EMC SRDF
    Terms and concepts
    Types of configuration
6 Troubleshooting
    Troubleshooting Metrocluster
    Metrocluster log
    SYMAPI log
1 Introduction
EMC Symmetrix Remote Data Facility (EMC SRDF) disk arrays allow you to configure physical data replication solutions that provide disaster recovery for Serviceguard clusters over long distances.
Overview of EMC SRDF
EMC SRDF is a Symmetrix-based business continuance and disaster recovery solution. SRDF is a configuration of Symmetrix systems whose purpose is to maintain multiple, real-time copies of data in more than one location.
another Symmetrix system. Logical volumes (devices) are assigned to SRDF groups. For more information, see EMC® Symmetrix® Remote Data Facility (SRDF®) Product Guide at http://support.emc.com.
SRDF/Synchronous
The SRDF/Synchronous mode of operation ensures that every write by a host connected to a Symmetrix unit at the R1 site is replicated to the R2 site before the local Symmetrix unit at R1 sends an acknowledgement back to the host.
For information about EMC SRDF, see the EMC® Symmetrix® Remote Data Facility Product Guide, available on the EMC documentation website.
Overview of solution for Metrocluster with EMC SRDF for Linux
Overview of a Metrocluster configuration
A Metrocluster is configured with nodes at Site1 and Site2. When Site1 and Site2 form a Metrocluster, a third location is required where a Quorum Server or arbitrator nodes must be configured.
Figure 2 Overview of a Metrocluster configuration
[Figure: a Quorum Server, Nodes A through D, network switches, FC switches, DWDM links, an IP network, and disk arrays.]
Figure 2 (page 8) depicts an example of two applications distributed in a Metrocluster with EMC SRDF for Linux environment, balancing the server and replication load.
2 Configuring an application in a Metrocluster solution
Installing the necessary software
Before you begin any configuration, ensure that the following software is installed on all the nodes:
• Symmetrix EMC Solutions Enabler software, which allows the management of the Symmetrix disks from the node.
• If you are building an M by N configuration using RDF Enginuity Consistency Assist (RDF-ECA), you must install only Symmetrix EMC Solutions Enabler 7610 or later. You do not have to install any other software.
Device                       Product                     Device
---------------------------  --------------------------- ---------------------
Name       Type  Vendor      ID         Rev   Ser Num     Cap (KB)
---------------------------  --------------------------- ---------------------
/dev/sda   R1    EMC         SYMMETRIX  5875  27080A4000  1966080
/dev/sdb   R1    EMC         SYMMETRIX  5875  27080A5000  1966080
/dev/sdc   R1    EMC         SYMMETRIX  5875  27080A6000  1966080
/dev/sdd   R1    EMC         SYMMETRIX  5875  27080A7000  1966080
/dev/sde   R1    EMC         SY
/dev/sdf   R1    EMC
/dev/sdg   R1    EMC
/dev/sdh   R1    EMC
/dev/sdi   R1    EMC
/dev/sdj   R1    EMC
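As a rough illustration of reading this listing, the sketch below filters saved syminq-style text by the Type column. The function name list_devices_by_type is hypothetical, and the sample rows are copied from the first rows of the listing above rather than captured from a live system; real output may contain extra header lines or columns.

```shell
#!/bin/sh
# Hypothetical helper (not part of SymCLI): print the device names whose
# Type column matches $1. Reads syminq-style text on stdin and assumes
# field 1 is the Linux device name and field 2 is the type (R1, R2, GK, ...).
list_devices_by_type() {
    awk -v want="$1" '$1 ~ /^\/dev\// && $2 == want { print $1 }'
}

# Sample rows copied from the listing above (not live output).
sample='/dev/sda R1 EMC SYMMETRIX 5875 27080A4000 1966080
/dev/sdb R1 EMC SYMMETRIX 5875 27080A5000 1966080'

printf '%s\n' "$sample" | list_devices_by_type R1
```

On a node, the same filter could be fed from the saved output of syminq instead of the sample text; passing GK as the type would list gatekeeper candidates, assuming they are marked GK in the Type column.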
----------------------------------------------------------------------------
                      STATUS    MODES           RDF  S T A T E S
Sym  RDF              --------- ----- R1 Inv  R2 Inv  ----------------------
Dev  RDev Typ:G       SA RA LNK MDATE Tracks  Tracks  Dev RDev Pair
---- ---- --------    --------- ----- ------- ------- --- ---- -------------
034E 80B2 R2:2        RW WD RW  S..2.       0       0 WD  RW   Synchronized
034F 80B3 R2:2        RW WD RW  S..2.       0       0 WD  RW   Synchronized
0350 80B4 R2:2        RW WD RW  S..2.       0       0 WD  RW   Synchronized
0351 80B5 R2:2        RW WD RW  S..2.
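One way to check a listing like the one above is to confirm that every device row reports the Synchronized pair state. The sketch below is a minimal text filter under stated assumptions: the helper name all_pairs_synchronized is hypothetical, device rows are assumed to begin with a four-digit hexadecimal Symmetrix device number, and the pair state is assumed to be the last field. The sample rows are copied from the listing, not produced by a live command.

```shell
#!/bin/sh
# Hypothetical helper (not part of SymCLI): exit 0 only if at least one
# device row is present and every device row ends in "Synchronized".
all_pairs_synchronized() {
    awk '
        # Device rows start with a 4-hex-digit Symmetrix device number.
        $1 ~ /^[0-9A-F][0-9A-F][0-9A-F][0-9A-F]$/ {
            rows++
            if ($NF != "Synchronized") bad++
        }
        END { exit (rows > 0 && bad + 0 == 0) ? 0 : 1 }
    '
}

# Sample rows copied from the listing above (not live output).
sample='034E 80B2 R2:2 RW WD RW S..2. 0 0 WD RW Synchronized
034F 80B3 R2:2 RW WD RW S..2. 0 0 WD RW Synchronized'

if printf '%s\n' "$sample" | all_pairs_synchronized; then
    echo "all pairs synchronized"
else
    echo "at least one pair is not synchronized"
fi
```

Header and separator lines are ignored because they do not start with a device number, so the saved output of the listing command can be piped in unchanged.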
NOTE: The Symmetrix device number may be the same or different in each of the Symmetrix units for the same logical device. In other words, the device number for the logical device on the R1 side of the SRDF link may be different from the device number for the logical device on the R2 side of the SRDF link. When determining the configuration for the Symmetrix devices for a new installation, HP recommends using the same Symmetrix device number for both the R1 and R2 devices.
# symdg create -type RDF2
Run this command on nodes attached to the R2 side. The device group name must be the same on each node on the R1 and R2 sides.
2. Use the symld command to add all LUNs that comprise the volume group for that package on that host to a Symmetrix device group. All disks that belong to volume groups owned by an application package must be added to a single Symmetrix device group.
3. For SRDF/Asynchronous operation, add all devices in the RDF (RA) group to the device group. For example, if the RDF group displayed in the symrdf list output is group number 2, then all devices in this RDF group must be managed together within one device group for SRDF/Asynchronous operation.
# symld -g addall -rdfg 2
4.
5. Repeat steps 1 through 3 on every host that runs Serviceguard packages.
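The device group steps above can be previewed with a small dry-run script before anything is run against an array. This is a sketch under stated assumptions: the device group name pkgA_dg and the Symmetrix ID 000000000000 are hypothetical placeholders, the device numbers are examples only, and the script prints the commands instead of executing them.

```shell
#!/bin/sh
# Dry-run sketch: prints SymCLI commands, does not execute them.
SYM_ID="000000000000"   # hypothetical Symmetrix ID
DG_NAME="pkgA_dg"       # hypothetical name; must match on every node

# $1 = RDF1 (nodes on the R1 side) or RDF2 (nodes on the R2 side);
# remaining arguments are Symmetrix device numbers to add.
print_dg_commands() {
    dg_type="$1"
    shift
    echo "symdg create $DG_NAME -type $dg_type"
    for dev in "$@"; do
        echo "symld -sid $SYM_ID -g $DG_NAME add dev $dev"
    done
}

# SRDF/Asynchronous: add every device of one RDF group in a single step.
# $1 = RDF group number as shown in the symrdf list output.
print_async_addall() {
    echo "symld -g $DG_NAME addall -rdfg $1"
}

print_dg_commands RDF1 00C 00D
print_async_addall 2
```

Keeping the group name in one variable reflects the requirement that the same device group name is used on every node; only the -type value differs between the R1 and R2 sides.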
Configuring Gatekeeper devices
A gatekeeper device is a special LUN through which the host communicates with the Symmetrix storage. Gatekeeper devices are unique to a Symmetrix unit and are not replicated across the SRDF link. They are marked GK in the syminq output and are usually 2880 KB in size. Each Serviceguard package must use its own gatekeeper devices to prevent contention in the Symmetrix when commands run concurrently, for example when two or more packages start at the same time.
1.
Figure 4 Devices and Symmetrix Units in M by N configurations
[Figure: Nodes 1, 2, and 3, package Pkg B, and Symmetrix A, B, and C. Device labels, in extraction order:]
Gatekeeper: /dev/sdb (002)
Gatekeeper: /dev/sdb (010)
R1 Devices: /dev/sdm (00C), /dev/sdl (00D)
R2 Devices: /dev/sdc (018), /dev/sde (019)
BCV Devices: /dev/sdf (01A), /dev/sdh (018)
Gatekeeper: /dev/sda (0DB)
Gatekeeper: /dev/sdw (009)
R1 Devices: /dev/sdz (010), /dev/sdv (011)
R2 Devices: /dev/sda (050), /dev/sdd (051)
BCV Devices: /dev/sdg (052), /dev/sdq (053
Figure 5 Example of an M by N configuration - 2 by 1 configuration
[Figure: Nodes 1 through 6, arrays holding R1 volumes, R2 volumes, and BCVs, SRDF links, packages Pkg A and Pkg B, and a third location (arbitrators).]
Figure 6 shows a bidirectional 2 by 2 configuration with additional packages on node3 and node4, and R1 and R2 volumes at both data centers. In this configuration, R1 volumes and pkg A and pkg B are at Data Center A, and R2 volumes are at Data Center B.
the devices. If an I/O cannot be written to a remote Symmetrix because a remote device or an RDF link has failed, the data flow to the other Symmetrix is halted in less than one second. Once mirroring is resumed, any updates to the data are propagated with normal SRDF operation. Figure 7 shows how the use of consistency groups (depicted as dashed rectangles) ensures that the other two links are also suspended when there is a break in the links between two of the Symmetrix frames.
4. For each node on the R2 side (node3 and node4), assign the R2 devices to the device groups.
# symld -sid -g add dev 018
# symld -sid -g add dev 019
# symld -sid -g add dev 050
# symld -sid -g add dev 051
5. On each node on the R2 side (node3 and node4), associate the local BCV devices with the R2 device group.
# symgate -sid -g associate dev 002
# symgate -sid -g associate dev 00B
Creating the consistency groups
To configure consistency groups for Metrocluster with EMC SRDF for Linux, first create device groups and gatekeeper groups as described in “Configuring gatekeeper devices” (page 19). The following examples are based on the configuration shown in Figure 4. To create consistency groups for each package:
1.
NOTE: This important step must be carried out on every node.
4. Establish the BCV devices in the secondary Symmetrix as a mirror of the standard device.
3. For SRDF/Asynchronous MSC operation, add all devices from the RDF (RA) groups to the composite group. For example, the RDF groups 6 and 7 are added to the composite group for SRDF/Asynchronous MSC operation.
# symcg -cg -rdfg 6 addall pd
# symcg -cg -rdfg 7 addall pd
4.
5. Repeat steps 1 through 3 on each host that must run Serviceguard packages.
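When several RDF groups feed one composite group, the addall step can be generated in a loop so that no group is missed. The sketch below only prints the commands, in the same form as the examples above; the composite group name pkgA_cg is a hypothetical placeholder, and any further options that your Solutions Enabler version requires (for example a Symmetrix ID) are not shown.

```shell
#!/bin/sh
# Dry-run sketch: prints symcg commands, does not execute them.
CG_NAME="pkgA_cg"   # hypothetical composite group name

# Arguments: the RDF group numbers whose devices belong to the package.
print_cg_addall() {
    for rdfg in "$@"; do
        echo "symcg -cg $CG_NAME -rdfg $rdfg addall pd"
    done
}

print_cg_addall 6 7
```

Reviewing the printed commands on each host before running them is one way to confirm that every host that must run the package uses the same composite group name and the same set of RDF groups.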
Serviceguard variables are not defined on your system, then include the file /etc/cmcluster.conf in your login profile for the root user. For more information on these parameters, see the Understanding the Location of Serviceguard Files and Enabling Serviceguard Command Access sections in Managing Serviceguard for Linux A.11.20, available at http://www.hp.com/go/linux-serviceguard-docs.
Create the cluster or clusters according to the process described in the Managing Serviceguard for Linux A.11.
Configuring volume groups
Configuring LVM volume group
LVM storage can be used in disaster recovery clusters. The following procedure explains how to set up the LVM volume group. Before you create volume groups, you must enable activation protection for logical volume groups, which prevents a volume group from being activated by more than one node at the same time. For more information on enabling activation protection for logical volume groups, see Managing HP Serviceguard A.11.
# vgchange -a y
# vgcfgbackup
# vgchange -a n
# vgchange --deltag $(uname -n)
10. After backing up the LVM configuration, establish the SRDF links.
# symrdf -g establish -v
Installing and configuring an application
The disks where the application binaries and configuration files reside must not be replicated. Only the disks where application data resides must be replicated. The following section describes how to configure a package for the application.
NOTE: If external_pre_script is specified in a Metrocluster package configuration, it is executed after the Metrocluster module scripts during package startup. Metrocluster module scripts are always executed first during package startup.
5. Run the package on a node in the Serviceguard cluster.
# cmrunpkg -n
6. Enable global switching for the package.
◦ the application continues to modify the data.
◦ the link is restored.
◦ resynchronization from R1 to R2 starts, but does not finish.
◦ the R1 side fails.
Although the risk of such an occurrence is extremely low, if the business cannot afford even a minor risk, Domino Mode must be enabled to ensure that the data at the R2 side is always consistent.
• The value of RUN_SCRIPT_TIMEOUT in the package configuration file must be set to NO_TIMEOUT or to a value large enough to account for the extra startup time needed to get status from the Symmetrix.
Data replication can use any extended SAN devices that support SRDF links, for example DWDM, Fiber Channel over Internet Protocol, and so on.
3 Metrocluster features
Data replication storage failover preview
In an actual failure, packages are failed over to the standby site. As part of the package startup, the underlying storage is failed over based on the parameters defined in the Metrocluster package ASCII configuration file. The storage failover can fail for many reasons, which can be categorized as follows:
• Incorrect configuration or setup of Metrocluster and data replication environment.
Application Detach (LAD)) allows you to do this kind of maintenance while keeping the packages running. The packages are no longer monitored by Serviceguard, but the applications continue to run. Packages in this state are called detached packages. When you have done the necessary maintenance, you can restart the node or cluster, and normal monitoring will resume on the packages. For more information on the LAD feature, see Managing HP Serviceguard A.11.20.20 for Linux available at http://www.hp.
4 Understanding Failover/Failback scenarios
Table 5 (page 31) describes the package startup behavior in various failure scenarios, depending on the AUTO parameters and the presence of a FORCEFLAG file in the package directory.
Table 5 Package startup behavior in various failure scenarios
Failover/Failback
Failover to recovery site (R2) due to application failure on all the nodes at the primary site (R1).
Table 5 Package startup behavior in various failure scenarios (continued)
Columns: Failover/Failback; SRDF States; AUTO parameters; Metrocluster behaviour
the package directory and restart the package. To automate failover, set AUTOR1UIP to 0. However, it is better to wait for the update to complete before starting up the package.
Failover within the primary site (R1) when the recovery site (R2) or the SRDF link is down (sync or async).
Table 5 Package startup behavior in various failure scenarios (continued)
Columns: Failover/Failback; SRDF States; AUTO parameters; Metrocluster behaviour
SRDF links are split.
Failover to the recovery site (R2) when the SRDF links are in a mixed state (sync or async). (This can happen with consistency groups where one link is in Partitioned state and the other is in Suspended state.)
the package, create a FORCEFLAG in the package directory and restart the package. To automate failover, set AUTO_NO_REPLICATION to 1.
5 Administering Metrocluster
Adding a node to a Metrocluster
To add a node to Metrocluster with EMC SRDF for Linux:
1. Add the node to the cluster by editing the Serviceguard cluster configuration file and applying the configuration:
# cmapplyconf -C cluster.config
2. 3. 4. Configure the device groups or consistency groups used by the Metrocluster packages on the newly added node. For more information, see “Creating Symmetrix device groups” (page 12) or “Creating the consistency groups” (page 20).
Managing Business Continuity Volumes (BCV)
The use of BCVs is recommended with all implementations of Metrocluster with EMC SRDF for Linux, and it is required with M by N configurations, which employ consistency groups. These BCV devices provide a good copy of the data when it is necessary to recover from a rolling disaster, that is, a second failure that occurs while attempting to recover from the first failure.
R1/R2 swapping
This section describes how R1/R2 swapping can be done through the Metrocluster package and through manual procedures. Each of these methods swaps the SRDF personality of every device in a specified device group. When swapped, every source R1 device becomes a target R2 device, and every target R2 device becomes a source R1 device.
# symrdf -g establish
CAUTION: R1/R2 swapping cannot be used in an M by N configuration.
6 Troubleshooting
Troubleshooting Metrocluster
Analyse the Metrocluster and SYMAPI log files to understand the problem in the respective environment, and follow the recommended action based on the error or warning messages.
Metrocluster log
Regularly review the following files for messages, warnings, and recommended actions. Review these files after every system, data center, or application failure:
• View the system log at /var/log/messages.
7 Support and other resources Information to collect before contacting HP Ensure that the following information is available before you contact HP: • Software product name • Hardware product model number • Operating system type and version • Applicable error message • Third-party hardware or software • Technical support registration number (if applicable) How to contact HP Use the following methods to contact HP technical support: • In the United States, see the Customer Service / Contact HP Un
HP authorized resellers For the name of the nearest HP authorized reseller, see the following sources: • In the United States, see the HP U.S. service locator website: http://www.hp.com/service_locator • In other locations, see the Contact HP worldwide website: http://welcome.hp.com/country/us/en/wwcontact.html Documentation feedback HP welcomes your feedback. To make comments and suggestions about product documentation, send a message to: docsfeedback@hp.
IMPORTANT An alert that calls attention to essential information. NOTE An alert that contains additional or supplementary information. TIP An alert that provides helpful information.
A Checklist and worksheet for configuring Metrocluster with EMC SRDF for Linux
Disaster recovery checklist
Use this checklist to ensure that you have adhered to the disaster recovery architecture guidelines for a configuration with two main data centers and a third location.
Data centers A and B have the same number of nodes to maintain quorum in case an entire data center fails.
Arbitrator nodes or Quorum Server nodes are located in a separate location from either of the primary data centers (A or B).
Member Timeout: ____________________
Network Polling Interval: ____________________
AutoStart Delay: ____________________
Package configuration worksheet
Use this package configuration worksheet either in place of, or in addition to, the worksheet provided in the latest version of the Managing HP Serviceguard for Linux manual available at http://www.hp.com/go/linux-serviceguard-docs.
B Package attributes for Metrocluster with EMC SRDF for Linux This appendix lists all Serviceguard package attributes that are modified or added for Metrocluster with EMC SRDF for Linux. HP recommends that you use the default settings for most of these variables, so exercise caution when modifying them.
“async” for Asynchronous. If RDF_MODE is not defined, synchronous mode is assumed.
RETRY
Default: 60. This is the number of times a SymCLI command is repeated before returning an error. Use the default value for the first package, and slightly larger numbers for additional packages, ensuring that the total of RETRY*RETRY_INTERVAL is approximately five minutes.
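As a worked example of the RETRY*RETRY_INTERVAL guideline, the sketch below derives an interval for a chosen RETRY value so that the product stays near five minutes. It assumes that RETRY_INTERVAL is expressed in seconds, which is not stated in the text above; the function name and the second RETRY value (75) are illustrative only.

```shell
#!/bin/sh
# Assumption: RETRY_INTERVAL is measured in seconds.
TARGET_SECONDS=300   # approximately five minutes

# $1 = RETRY; prints an interval such that RETRY * interval is close to
# TARGET_SECONDS (integer division, so the product never exceeds it).
retry_interval_for() {
    echo $(( TARGET_SECONDS / $1 ))
}

# Default RETRY for the first package, a slightly larger value for another.
for retry in 60 75; do
    echo "RETRY=$retry RETRY_INTERVAL=$(retry_interval_for "$retry")"
done
```

Under that assumption, RETRY=60 pairs with an interval of 5 and RETRY=75 with an interval of 4, and both products equal 300.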
C Sample output of the cmdrprev command The following procedure shows you how to use the cmdrprev command to preview the data replication preparation for a package in an MC SRDF environment. 1. Verify that the Metrocluster environment file for the package pkga is present in the package directory on node . 2.
Glossary A arbitrator Nodes in a disaster recovery architecture that act as tie-breakers when all of the nodes in a data center go down at the same time. These nodes are full members of the Serviceguard cluster and must conform to the minimum requirements. The arbitrator must be located in a third data center to ensure that the failure of an entire data center does not bring the entire cluster down.
E, F ESCON Enterprise Storage Connect. A type of fiber-optic channel used for inter-frame communication between EMC Symmetrix frames using EMC SRDF or between HP StorageWorks E P9000 or XP series disk array units using Continuous Access P9000 or XP. failback Failing back from a backup node, which may or may not be remote, to the primary node that the application normally runs on. failover The transfer of control of an application or service from one node to another node after a failure.
RDF-ECA (RDF Enginuity Consistency Assist) A Solutions Enabler feature to provide consistency protection for SRDF/Synchronous devices. RDF-ECA is used for M by N Symmetrix configurations using Metrocluster with EMC SRDF for Linux. Recovery Cluster A cluster on which recovery of a package takes place following a failure on the cluster. resynchronization The process of making the data between two sites consistent and current after the systems are restored following a failure.