Building Disaster Recovery Serviceguard Solutions Using Metrocluster with Continuous Access EVA P6000 for Linux B.12.00.
Legal Notices © Copyright 2014 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license. The information contained herein is subject to change without notice.
Contents
1 Introduction
Overview of P6000/EVA and HP P6000 Continuous Access Concepts
Copy sets
Data replication Groups (DR Groups)
Write modes
5 Administering Metrocluster
Adding a node to Metrocluster
Maintaining EVA P6000 Continuous Access replication in Metrocluster
HP P6000 Continuous Access Link failure scenarios
Planned maintenance
1 Introduction
This document describes how to configure data replication solutions using HP P6000/EVA disk arrays to provide disaster recovery for Serviceguard clusters over long distances. It also gives an overview of the HP P6000 Continuous Access software and the additional files that integrate HP P6000/EVA disk arrays with Metrocluster.
The replication direction of a DR group is always from a source to a destination. In bidirectional replication, an array can have both source and destination virtual disks, which reside in separate DR groups. That is, one virtual disk cannot be both a source and a destination simultaneously. Bidirectional replication enables you to use both arrays for primary storage while each provides disaster protection for the other site.
DR Group write history log
The DR group write history log is a virtual disk that stores a DR group's host write data. The log is created when you create the DR group. Once the log is created, it cannot be moved. In synchronous mode or basic asynchronous mode, the DR group write history log stores data when replication to the destination DR group is stopped because the destination DR group is unavailable or suspended. This process is called logging.
shutdown of the system before the redundant system takes over. An unplanned failover occurs when a failure or outage occurs that may not allow an orderly transition of roles.
NOTE: Failover can take other forms:
• Controller failover: The process that occurs when one controller in a pair assumes the workload of a failed or redirected controller in the same array.
• Fabric or path failover: I/O operations transfer from one fabric or path to another.
For more information on remote data replication concepts and planning a remote replication solution, see the HP P6000 Continuous Access Implementation Guide available at http://www.hp.com/support/manuals -> storage -> Storage Software -> Storage Replication Software -> HP P6000 Continuous Access Software.
Overview of a Metrocluster with HP P6000 Continuous Access configuration
A Metrocluster is configured with the nodes at Site A and Site B.
Figure 1 Sample Configuration of Metrocluster with Continuous Access EVA P6000 for Linux
[Figure labels: Quorum Server; Sites A and B; Nodes 1 to 4; Site A Router and Site B Router; Synchronous / Enhanced Asynchronous; two EVA Disk Arrays; DC1 for App A, DC2 for App B, DC1 for App B, DC2 for App A]
Figure 1 depicts an example of two applications distributed in a Metrocluster with Continuous Access EVA P6000 for Linux environment balancing the server and replication load.
2 Configuring an application in a Metrocluster environment
Installing the necessary hardware and software
When the following procedures are complete, an adoptive node will be able to access the data belonging to a package after it fails over.
Setting up the storage hardware
1. Before you configure this product, you must correctly cable the P6000/EVA with redundant paths to each node in the cluster that will run packages accessing data on the array.
The following is a sample of the site definition in a Serviceguard cluster configuration file:
SITE_NAME san_francisco
SITE_NAME san_jose
NODE_NAME SFO_1
  SITE san_francisco
.....
NODE_NAME SFO_2
  SITE san_francisco
........
NODE_NAME SJC_1
  SITE san_jose
.......
NODE_NAME SJC_2
  SITE san_jose
........
Use the cmviewcl command to view the list of sites that are configured in the cluster and their associated nodes.
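The site-to-node mapping in a configuration file like the sample above can also be listed offline, before the configuration is applied. The following is a minimal sketch, not part of the product: the function name list_site_nodes is hypothetical, and it assumes each NODE_NAME entry is followed by its SITE entry before the next NODE_NAME, as in the sample.

```shell
#!/bin/sh
# Hypothetical helper: print "site: node" pairs from a Serviceguard
# cluster configuration file given as the first argument.
# Assumption: every NODE_NAME line is followed by its SITE line
# before the next NODE_NAME line appears.
list_site_nodes() {
    awk '
        $1 == "NODE_NAME"          { node = $2 }              # remember the current node
        $1 == "SITE" && node != "" { print $2 ": " node; node = "" }
    ' "$1"
}

# Example (the path is illustrative only):
#   list_site_nodes /path/to/cluster.config
```

SITE_NAME lines are ignored because only the exact keyword SITE is matched, so the output reflects node membership rather than the list of declared sites.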
Figure 2 P6000/EVA Management Console using HP P6000 Command View
For more information on setting up HP P6000 Command View for configuring, managing, and monitoring the HP P6000/EVA Storage System, see the HP P6000 Command View User Guide available at http://www.hp.com/support/manuals -> storage -> Storage Software -> Storage Device Management Software -> HP StorageWorks HP P6000 Command View Software.
3. After you customize the sssu_input file, run the SSSU command as follows to set the destination Vdisk to read-only mode.
# /sbin/sssu "FILE "
4. To create the special device file name for the Vdisk on P6000/EVA, after changing the access mode of the destination Vdisk, run the /usr/bin/rescan-scsi-bus.sh command to detect and activate the disks, and then run the lsscsi command to display the configured disks.
1. Create a configuration input file. A template of this file is available at the following location for Red Hat and SUSE. For an example of the smiseva.conf file, see “smiseva.conf file” (page 41). The smiseva.conf is available at:
$SGCONF/mccaeva/smiseva.conf
2. Copy the template file smiseva.conf to the desired location.
# cp $SGCONF/mccaeva/smiseva.conf
3.
Adding or updating Management Server information
To add or update individual Management Server login information to the map file, use the following command options shown in Table 2:
smispasswd -h -n -p -u -s
Table 2 Individual Management Server information
Command Options    Description
-h                 This is either a DNS resolvable hostname or IP address of the Management Server
-n                 This is the name space configured for the S
1. Create a configuration input file. This file will contain the names of storage pairs and DR groups. A template of the configuration input file is available at the following location for Red Hat and SUSE.
$SGCONF/mccaeva/mceva.conf
2. Copy the template file to the desired location.
$SGCONF/mccaeva/mceva.conf
3. For each pair of storage units, enter the World Wide Name (WWN).
NOTE: Run the evadiscovery tool after all the storage DR Groups are configured or when there is any change to the storage device. For example, the user removes and recreates a DR group that is used by an application package. In this case the DR Group's internal IDs are regenerated by the P6000/EVA system. If any name of storage systems or DR groups is changed, update the external configuration file, run the evadiscovery utility, and redistribute the map file caeva.map to all Metrocluster clustered nodes.
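Redistributing caeva.map after each evadiscovery run is easy to miss on one node. The following is a minimal dry-run sketch that only prints the copy commands for review. The function name print_map_copy_commands and the MAP_DIR variable are hypothetical; the default directory is the Red Hat location used elsewhere in this guide, and the node names are supplied by the caller.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print one scp command per node for
# redistributing caeva.map. Nothing is copied; review the output first.
# MAP_DIR defaults to the Red Hat path used in this guide; on SUSE the
# guide uses /opt/cmcluster/conf/mccaeva instead.
print_map_copy_commands() {
    map_dir=${MAP_DIR:-/usr/local/cmcluster/conf/mccaeva}
    for node in "$@"; do
        printf 'scp %s/caeva.map %s:%s/caeva.map\n' "$map_dir" "$node" "$map_dir"
    done
}

# Example (node names are illustrative only):
#   print_map_copy_commands SFO_2 SJC_1 SJC_2
```

Printing rather than executing keeps the sketch safe to try on a node that is not yet part of the cluster; the printed lines can be run manually once they look right.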
Figure 3 P6000 Command View for the WWN identifier
Configuring LVM volume group using Metrocluster with Continuous Access EVA P6000 for Linux
LVM storage can be used in Metrocluster. The following section shows how to set up an LVM volume group. Before you create volume groups, you can create partitions on the disks, and you must enable activation protection for logical volume groups, which prevents a volume group from being activated by more than one node at the same time.
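The procedure in this section tags a volume group with the local host name before activating it, backs up the configuration, and then deactivates and untags it. The following is a minimal dry-run sketch of that sequence. The function name print_vg_backup_sequence and the volume group argument are hypothetical, because the commands as printed in this guide do not show a volume group name; the sketch only prints the commands and changes nothing.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the tag, activate, back up,
# deactivate, untag sequence for one volume group.
# $1 is the volume group name (an assumption; supply your own).
print_vg_backup_sequence() {
    vg=$1
    tag=$(uname -n)   # the host name is used as the LVM tag
    printf 'vgchange --addtag %s %s\n' "$tag" "$vg"
    printf 'vgchange -a y %s\n' "$vg"
    printf 'vgcfgbackup %s\n' "$vg"
    printf 'vgchange -a n %s\n' "$vg"
    printf 'vgchange --deltag %s %s\n' "$tag" "$vg"
}

# Example (the volume group name is illustrative only):
#   print_vg_backup_sequence vgdata
```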
7. On the source disk site, run the following commands on all the other systems that might run the Serviceguard package. If required, take a backup of the LVM configuration.
# vgchange --addtag $(uname -n)
# vgchange -a y
# vgcfgbackup
# vgchange -a n
# vgchange --deltag $(uname -n)
8. To verify the Volume Group configuration on the target disk site.
•
9. To fail over the DR group:
a. Select the remote storage system from the HP P6000 Command View.
b.
Table 4 Temp.config file Attributes
Attributes                       Description
dts_pkg_dir                      This is the package directory for this Metrocluster modular package. The Metrocluster Environment file is generated for this package in this directory. This value must be unique for all packages.
DT_APPLICATION_STARTUP_POLICY    This is a parameter used to define a policy for starting an application. It can be set to Availability_Preferred or Data_Currency_Preferred policy.
Configure a cluster with sites to use either of these policies. For information on configuring the failover policy to site_preferred or site_preferred_manual, see “Site Aware Failover Configuration” (page 11).
3. Validate the package configuration file.
# cmcheckconf -P temp.config
4. Apply the package configuration file.
# cmapplyconf -P temp.
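The validate and apply steps above are naturally chained so that a configuration is never applied after a failed check. The following is a minimal sketch of that flow. The function name check_then_apply and the CHECK_CMD and APPLY_CMD variables are hypothetical; they default to the cmcheckconf and cmapplyconf commands used in this procedure and exist only so that the flow can be exercised on a machine without Serviceguard.

```shell
#!/bin/sh
# Hypothetical wrapper: validate a package configuration file and apply
# it only if validation succeeds.
# CHECK_CMD and APPLY_CMD are assumptions for illustration; by default
# they resolve to the commands used in this guide.
check_then_apply() {
    conf=$1
    if ${CHECK_CMD:-cmcheckconf} -P "$conf"; then
        ${APPLY_CMD:-cmapplyconf} -P "$conf"
    else
        echo "validation failed for $conf; not applying" >&2
        return 1
    fi
}

# Example:
#   check_then_apply temp.config
```

The wrapper returns the exit status of the apply command on success and 1 on a failed check, so it can be used directly in a larger script.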
Figure 4 Creating modular package
4. If the Metrocluster with Continuous Access EVA P6000 for Linux Toolkit is installed, you are prompted to configure a Metrocluster package. Select the dts/mccaeva module, and then click Next.
Figure 5 Selecting Metrocluster module
5. You are then prompted to include any other toolkit modules. If the application being configured has a Serviceguard toolkit, select the appropriate toolkit; otherwise, move to the next screen.
6. Enter the package name.
Figure 7 Selecting additional modules for the package
8. Review the node order in which the package will start, and modify other attributes, if needed. Click Next.
Figure 8 Configuring generic failover attributes
9. Configure the attributes for a Metrocluster package. All the mandatory attributes (marked with *) must be filled in accurately.
a. Select Application start up policy from the list.
b. Specify the DR Group name, and then enter values for Wait Time and Query Timeout, if required.
c.
Figure 9 Specifying the list of management servers for DC1 and DC2.
10. Enter the values for other modules selected in step 7.
11. After you enter the values for all the modules, review all the inputs given to the various attributes in the final screen. To validate the package configuration, click Check Configuration; otherwise, click Apply Configuration.
3 Metrocluster features
Data replication storage failover preview
In an actual failure, packages are failed over to the standby site. During package startup, the underlying storage is failed over based on the parameters defined in the Metrocluster package. The storage failover might fail under the following conditions:
• Incorrect configuration or setup of Metrocluster and data replication environment.
Live Application Detach
There may be circumstances in which you want to do maintenance that involves halting a node, or the entire cluster, without halting or failing over the affected packages. Such maintenance might consist of anything short of rebooting the node or nodes, but a likely case is networking changes that will disrupt the heartbeat. New command options in Serviceguard for Linux B.12.00.
4 Understanding failover/failback scenarios
Metrocluster package failover/failback scenarios
This section discusses the package start up behaviors in various failure scenarios depending on DT_APPLICATION_STARTUP_POLICY and replication mode. Table 6 describes the list of failover scenarios.
NOTE: The first time failover to a node at a remote site has to be done with the Management Server being active for the EVA array at the remote site.
Table 6 Replication Modes and Failover Scenarios (continued) Failover Scenario Replication Mode DT_APPLICATION_STARTUP_POLICY The replication link state is good, the role of the device group on this site is "destination" and the data Log Copy is in progress. Because the WAIT_TIME is set to xx minutes, the program will wait for completion of the log copy Resolution Package does not wait for the merge to complete. It starts up immediately. … The DR Group is in merging state. … The WAIT_TIME has expired.
Table 6 Replication Modes and Failover Scenarios (continued) Failover Scenario Replication Mode DT_APPLICATION_STARTUP_POLICY Resolution Vdisk is restored. HP recommends taking a snapshot/snapclone of the destination Vdisks before the copy starts so that a consistent copy is available for recovery. Remote failover when the link is manually suspended Synchronous DR Group does not fail over and the package does not start.
Table 6 Replication Modes and Failover Scenarios (continued) Failover Scenario Replication Mode DT_APPLICATION_STARTUP_POLICY Resolution group. The package is NOT allowed to start up. Remote failover Synchronous when the DR Enhanced Group is in Asynchronous RUNDOWN state and link is down N/A DR Group fails over and the package is started. DR Group is not failed over and To forcefully start the package is not started.
5 Administering Metrocluster
Adding a node to Metrocluster
To add a node to Metrocluster with Continuous Access EVA P6000 for Linux:
1. To add the node in a cluster, edit the Serviceguard cluster configuration file, and then apply the configuration:
# cmapplyconf -C cluster.config
2. Copy the caeva.map file to the new node.
For Red Hat:
# scp /usr/local/cmcluster/conf/mccaeva/caeva.map \
:/usr/local/cmcluster/conf/mccaeva/caeva.map
For SUSE:
# scp /opt/cmcluster/conf/mccaeva/caeva.
P6000 Continuous Access does not resynchronize the source and destination Vdisks upon link recovery. This helps in maintaining data consistency.
2. Take a local replication copy of the destination Vdisks using HP P6000 Business Copy software so that a consistent copy is available for recovery.
3. Change the Continuous Access link state to resume mode. This initiates the normalization upon Continuous Access link recovery.
Rolling upgrade
Metrocluster configurations follow the HP Serviceguard rolling upgrade procedure. The HP Serviceguard documentation includes rolling upgrade procedures to upgrade the Serviceguard version, operating environment, and other software. This Serviceguard procedure, along with recommendations, guidelines, and limitations, is applicable to Metrocluster versions. For more information on completing a rolling upgrade of HP Serviceguard, see the latest edition of Managing HP Serviceguard B.12.00.
6 Troubleshooting
Troubleshooting Metrocluster
Analyze the Metrocluster and SMI-S/Command View log files to understand the problem in the respective environment, and follow the recommended action based on the error or warning messages.
Metrocluster log
Periodically review the following files for messages, warnings, and recommended actions. HP recommends reviewing these files after each system, data center, or application failure:
• View the system log at /var/log/messages.
A Checklist and worksheet for configuring a Metrocluster with Continuous Access EVA P6000 for Linux
Disaster Recovery Checklist
Use this checklist to make sure you have adhered to the disaster tolerant architecture guidelines for two main data centers and a third location configuration.
Data centers A and B have the same number of nodes to maintain quorum in case an entire data center fails.
Network Polling Interval: ______________________________________________
AutoStart Delay: ______________________________________________________
Package Configuration Worksheet
Use this package configuration worksheet either in place of, or in addition to, the worksheet provided in the latest version of the Managing HP Serviceguard B.12.00.00 for Linux manual available at http://www.hp.com/go/linux-serviceguard-docs.
DC1 SMIS List: ______________________________________________________________
DC1 HOST List: _____________________________________________________________
DC2 Storage Array WWN: ___________________________________________________
DC2 SMIS List: ______________________________________________________________
DC2 HOST List: _____________________________________________________________
P6000/EVA Configuration Checklist
Use the following checklist to verify the Metrocluster with Continuous Access EVA P6000 for
B Package attributes for Metrocluster with Continuous Access EVA P6000 for Linux
This appendix lists all Package Attributes for this product. HP recommends that you use the default settings for most of these variables, so exercise caution when modifying them:
CLUSTER_TYPE
This parameter identifies the type of disaster recovery services cluster: Metrocluster or Continentalclusters.
DC1_STORAGE_WORLD_WIDE_NAME
The world wide name of the HP P6000/EVA storage system that resides in Data Center 1. This storage system name is defined when the storage is initialized.
DC1_SMIS_LIST
A list of the management servers that reside in Data Center 1. Multiple names can be defined by using commas as separators. If a connection to the first management server fails, attempts are made to connect to the subsequent management servers in their order of specification.
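The ordered fallback described for DC1_SMIS_LIST can be illustrated with a short sketch. This is not the Metrocluster implementation, which this guide does not show; the function name first_reachable and the probe command passed to it are hypothetical stand-ins for whatever connection check is appropriate in your environment.

```shell
#!/bin/sh
# Hypothetical illustration of ordered fallback over a comma-separated list.
# $1: comma-separated server list (as in DC1_SMIS_LIST)
# $2: name of a probe command or function that returns 0 if the server
#     given as its argument can be reached (an assumption for this sketch)
# Prints the first server that passes the probe; returns 1 if none does.
first_reachable() {
    list=$1
    probe=$2
    old_ifs=$IFS
    IFS=,
    set -- $list          # split the list on commas, preserving order
    IFS=$old_ifs
    for server in "$@"; do
        server=$(printf '%s' "$server" | tr -d ' ')   # tolerate "a, b"
        if "$probe" "$server"; then
            printf '%s\n' "$server"
            return 0
        fi
    done
    return 1
}

# Example (server names and probe are illustrative only):
#   first_reachable "smis-a,smis-b" my_probe
```

Because the list is walked strictly in order, the server named first is always preferred whenever it is reachable, which matches the behavior described above.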
C smiseva.conf file
#################################################################
#                                                               #
# smiseva.conf CONFIGURATION FILE (template)                    #
# for use with the smispasswd utility                           #
# in the Metrocluster CA EVA Environment                        #
#                                                               #
# Note: This file MUST be edited before it can be used.         #
# For complete details about SMI-S configuration for use        #
# with Metrocluster CA EVA, consult the manual "Designing       #
# Disaster Tolerant High Availability Clusters.
D mceva.conf file
##############################################################
## mceva.conf CONFIGURATION FILE (template) for use with    ##
## the evadiscovery utility in the Metrocluster Continuous  ##
## Access EVA Environment.                                  ##
## Version: A.01.00                                         ##
## Note: This file MUST be edited before it can be used.    ##
## For complete details about EVA configuration for use     ##
## with Metrocluster Continuous Access EVA, consult the     ##
## manual “Designing Disaster Tolerant High Availability    ##
## Clusters”.
## , storage name and DR group name.
##
## Note: All the storage and DR Group names should be
## enclosed in double quotes (""), otherwise the
## evadiscovery command will not detect them.
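The quoting note in the template matters when names are pasted from a formatted document, because word processors and PDF viewers often substitute typographic quotation marks for plain ones. The following is a minimal sketch of a pre-check, assuming that evadiscovery expects plain ASCII double quotes; this guide does not state how other quote characters are treated. The function name find_typographic_quotes is hypothetical.

```shell
#!/bin/sh
# Hypothetical pre-check: list the lines of a configuration file that
# contain typographic (curly) double quotes instead of plain ones.
# Prints "line-number:line" for each match and returns 0 if any are
# found, 1 if the file is clean (standard grep exit codes).
find_typographic_quotes() {
    grep -n -e '“' -e '”' "$1"
}

# Example (the path is illustrative only):
#   find_typographic_quotes /path/to/mceva.conf
```

Running the check before evadiscovery gives an early, line-numbered hint when a name would otherwise silently go undetected.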
E Identifying the devices to be used with packages
Identifying the devices created in P6000/EVA
After the WWN of the P6000/EVA virtual volume is obtained, find the WWN of the disk using the lsscsi or scsi_id commands. For example:
# lsscsi | grep HSV | grep disk | awk '{print $6}'
After the P6000/EVA disks are retrieved by the lsscsi command, run the scsi_id command to find the WWN of the P6000/EVA disk.
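The lsscsi pipeline above can be folded into a single awk filter that reads lsscsi output on standard input, which also makes it easy to try against saved output. This is a minimal sketch: the function name hsv_disk_devices is hypothetical, and, like the original pipeline, it assumes the device node is the sixth whitespace-separated field of each line.

```shell
#!/bin/sh
# Hypothetical helper: read lsscsi output on stdin and print the device
# node of every line that mentions HSV and whose type column is "disk".
# Assumption: the device node is field 6, as in the pipeline above.
hsv_disk_devices() {
    awk '/HSV/ && $2 == "disk" { print $6 }'
}

# Usage on a live system:
#   lsscsi | hsv_disk_devices
```

Matching the type column exactly, rather than the word disk anywhere on the line, avoids picking up a non-disk device whose model string happens to contain that word.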
Glossary
A, B
arbitrator
Nodes in a disaster tolerant architecture that act as tie-breakers in case all of the nodes in a data center go down at the same time. These nodes are full members of the Serviceguard cluster and must conform to the minimum requirements. The arbitrator must be located in a third data center to ensure that the failure of an entire data center does not bring the entire cluster down. See also quorum server.
disaster recovery
The process of restoring access to applications and data after a disaster. Disaster recovery can be manual, meaning human intervention is required, or it can be automated, requiring little or no human intervention.
disaster tolerant
The characteristic of being able to recover quickly from a disaster. Components of disaster tolerance include redundant hardware, data replication, geographic dispersion, partial or complete recovery automation, and well-defined recovery procedures.
S
split-brain syndrome
When a cluster reforms with equal numbers of nodes at each site, and each half of the cluster thinks it is the authority and starts up the same set of applications, and tries to modify the same data, resulting in data corruption. Serviceguard architecture prevents split-brain syndrome in all cases unless dual cluster locks are used.
sub-clusters
Sub-clusters are clusterwares that run above the Serviceguard cluster and comprise only the nodes in a Metrocluster site.