Building Disaster Recovery Serviceguard Solutions Using Metrocluster with Continuous Access EVA P6000 for Linux B.01.00.
Legal Notices © Copyright 1995-2012 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use, or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor’s standard commercial license. The information contained herein is subject to change without notice.
Contents

1 Introduction
    Overview of P6000/EVA and HP P6000 Continuous Access Concepts
    Copy sets
    Data Replication Groups (DR Groups)
    Write modes
    ...
5 Administering Metrocluster
    Adding a node to Metrocluster
    Maintaining EVA P6000 Continuous Access replication in Metrocluster
    HP P6000 Continuous Access Link failure scenarios
    Planned maintenance
    ...
1 Introduction

This document describes how to configure data replication solutions using HP P6000/EVA disk arrays to provide disaster recovery for Serviceguard clusters over long distances. It also gives an overview of the HP P6000 Continuous Access software and the additional files that integrate HP P6000/EVA disk arrays with Metrocluster.
The replication direction of a DR group is always from a source to a destination. In bidirectional replication, an array can have both source and destination virtual disks, which must reside in separate DR groups; that is, one virtual disk cannot be both a source and a destination simultaneously. Bidirectional replication enables you to use both arrays for primary storage while each provides disaster protection for the other site.
DR Group write history log The DR group write history log is a virtual disk that stores a DR group's host write data. The log is created when you create the DR group. Once the log is created, it cannot be moved. In synchronous mode or basic asynchronous mode, the DR group write history log stores data when replication to the destination DR group is stopped because the destination DR group is unavailable or suspended. This process is called logging.
shutdown of the system before the redundant system takes over. An unplanned failover occurs when a failure or outage prevents an orderly transition of roles.
NOTE: Failover can take other forms:
• Controller failover — The process that occurs when one controller in a pair assumes the workload of a failed or redirected controller in the same array.
• Fabric or path failover — I/O operations transfer from one fabric or path to another.
For more information on remote data replication concepts and planning a remote replication solution, see the HP P6000 Continuous Access Implementation Guide available at http://www.hp.com/support/manuals -> Storage -> Storage Software -> Storage Replication Software -> HP P6000 Continuous Access Software.

Overview of a Metrocluster with HP P6000 Continuous Access configuration

A Metrocluster is configured with the nodes at Site A and Site B.
Figure 1 Sample Configuration of Metrocluster with Continuous Access EVA P6000 for Linux

[Figure: a Quorum Server and four nodes (Node 1–Node 4) split across Site A and Site B, connected through site routers; the two EVA disk arrays replicate synchronously / enhanced asynchronously, with DC1/DC2 Vdisks for App A and App B distributed across both arrays.]

Figure 1 depicts an example of two applications distributed in a Metrocluster with Continuous Access EVA P6000 for Linux environment balancing the server and replication load.
2 Configuring an application in a Metrocluster environment

Installing the necessary hardware and software

When the following procedures are complete, an adoptive node will be able to access the data belonging to a package after it fails over.

Setting up the storage hardware
1. Before you configure this product, you must correctly cable the P6000/EVA with redundant paths to each node in the cluster that will run packages accessing data on the array.
The following is a sample of the site definition in a Serviceguard cluster configuration file:

SITE_NAME san_francisco
SITE_NAME san_jose
NODE_NAME SFO_1
SITE san_francisco
.....
NODE_NAME SFO_2
SITE san_francisco
........
NODE_NAME SJC_1
SITE san_jose
.......
NODE_NAME SJC_2
SITE san_jose
........

Use the cmviewcl command to view the list of sites that are configured in the cluster and their associated nodes.
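Before applying a configuration of this form, the NODE_NAME/SITE pairing can be sanity-checked with a small script. The sketch below is illustrative only (the here-doc sample mirrors the fragment above); it is not a replacement for cmviewcl.

```shell
# Sketch: extract NODE_NAME/SITE pairs from a Serviceguard cluster
# configuration fragment. The sample file mirrors the fragment above.
list_node_sites() {
    awk '$1 == "NODE_NAME" { node = $2 }
         $1 == "SITE"      { print node, $2 }' "$1"
}

cat > /tmp/cluster.config.sample <<'EOF'
SITE_NAME san_francisco
SITE_NAME san_jose
NODE_NAME SFO_1
SITE san_francisco
NODE_NAME SJC_1
SITE san_jose
EOF

list_node_sites /tmp/cluster.config.sample
# prints: SFO_1 san_francisco
#         SJC_1 san_jose
```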
at http://www.hp.com/support/manuals -> Storage -> Storage Software -> Storage Device Management Software -> HP StorageWorks HP P6000 Command View Software. Using the Command View (CV) web user interface, create Vdisks, create a DR group using the Vdisks, and present those Vdisks to the connected hosts. After the DR group is created, set the destination host access field to Read only using the Command View GUI.
This product provides two utility tools for providing information about the SMI-S service running on the Management Servers and the DR groups that will be used in the Metrocluster environment. These tools must be used to create or modify the map file caeva.map.

smispasswd

Metrocluster retrieves storage information from the SMI-S server for its startup or failover operations.
Re-enter password of : ********** All the Management Server information has been successfully generated. NOTE: The desired location is where the modified smiseva.conf file is located. For more information on configuring the username and password for SMI-S on the management server, see the HP P6000 Command View Installation Guide. After the passwords are entered, the configuration is written to the caeva.
1 CIMOM - Common Information Model Object Manager, a key component that routes information between providers and clients.

This command adds a new record if a matching record is not found in the mapping file. Otherwise, it only updates the record.
NOTE: The desired location is where you have placed the modified mceva.conf file. The command generates the mapping data and stores it in the caeva.map file. The mapping file caeva.map contains information about the Management Servers and about the HP P6000/EVA storage cells and DR groups.

Displaying information about storage devices

Use the evadiscovery command to display information about the storage systems and DR groups in your configuration.
NOTE: When the Cluster Device Special Files (cDSF) feature is used, the device special file name is the same on all nodes for a source and target Vdisk.

Identifying special device files

The following is sample output of the evainfo command in a Linux environment.
5. If required, deactivate the volume groups on the primary system and remove the tag.
# vgchange -a n
# vgchange --deltag $(uname -n)
NOTE: Use the vgchange --deltag command only if you are implementing volume-group activation protection. Remember that volume-group activation protection, if implemented, must be configured on every node.
6. Run the vgscan command on all the nodes to make the LVM configuration visible, and to create the LVM database.
# vgscan
7.
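Assuming volume-group activation protection is implemented with hostname tags, the tag-based activate/deactivate sequence from the steps above can be sketched as follows. The volume group name vg_pkg is hypothetical, and the DRY_RUN guard prints each vgchange command instead of executing it, so the sketch can be inspected safely on a system without the volume group.

```shell
# Sketch of the tag-based activation sequence. vg_pkg is a hypothetical
# volume group name; with DRY_RUN=1 the commands are printed, not run.
DRY_RUN=1
run() { [ "$DRY_RUN" = 1 ] && echo "$@" || "$@"; }

activate_vg() {                      # on the node taking over the package
    run vgchange --addtag "$(uname -n)" "$1"
    run vgchange -a y "$1"
}
deactivate_vg() {                    # on the node releasing the volume group
    run vgchange -a n "$1"
    run vgchange --deltag "$(uname -n)" "$1"
}

deactivate_vg vg_pkg
```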
# cmmakepkg -m dts/mccaeva -m tkit/oracle/oracle temp.config
NOTE: Metrocluster is usually used with applications such as Oracle, so the application toolkit module must also be included when Metrocluster is used in conjunction with an application. Make sure to specify the Metrocluster module before specifying the toolkit module.
2. Edit the following attributes in the temp.config file:
Table 4 Temp.
The site_preferred value means that when a Metrocluster package needs to fail over, it fails over to a node in the same site as the node on which it last ran. If no other configured node is available within the same site, the package fails over to a node on another site. The site_preferred_manual failover policy provides automatic failover of packages within a site and manual failover across sites. Configure a cluster with sites to use either of these policies.
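The site_preferred ordering described above can be illustrated with a small sketch: given candidates as node:site pairs and the site the package last ran in, same-site nodes are tried before any remote node. This is only an illustration of the policy, not Serviceguard's actual implementation; all node and site names are hypothetical.

```shell
# Illustrative sketch of site_preferred node selection (NOT the actual
# Serviceguard implementation). Candidates are "node:site" pairs in
# their configured order; $1 is the site the package last ran in.
pick_failover_node() {
    last_site=$1; shift
    for cand in "$@"; do             # first pass: prefer the same site
        [ "${cand#*:}" = "$last_site" ] && { echo "${cand%%:*}"; return; }
    done
    for cand in "$@"; do             # second pass: fall back to any site
        echo "${cand%%:*}"; return
    done
}

pick_failover_node san_francisco SJC_1:san_jose SFO_2:san_francisco
# prints: SFO_2  (the same-site node wins over the earlier remote node)
```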
Figure 4 Creating modular package
4. If the product Metrocluster with Continuous Access EVA P6000 for Linux Toolkit is installed, you will be prompted to configure a Metrocluster package. Select the dts/mccaeva module, and then click Next.
Figure 5 Selecting Metrocluster module
5. You will next be prompted to include any other toolkit modules. If the application being configured has a Serviceguard toolkit, select the appropriate toolkit; otherwise, move to the next screen.
6. Enter the package name.
7. Select additional modules if required by the application. For example, if the application uses LVM volume groups or VxVM disk groups, select the volume_group module. Click Next.
Figure 7 Selecting additional modules for the package
8. Review the node order in which the package will start, and modify other attributes, if needed. Click Next.
Figure 8 Configuring generic failover attributes
9. Configure the attributes for a Metrocluster package.
Figure 9 Specifying the list of management servers for DC1 and DC2
10. Enter the values for the other modules selected in step 7.
11. After you enter the values for all the modules, review all the inputs given to the various attributes in the final screen. To validate the package configuration, click Check Configuration; otherwise, click Apply Configuration.
3 Metrocluster features

Data replication storage failover preview

During an actual failure, packages fail over to the standby site. During package startup, the underlying storage is failed over based on the parameters defined in the Metrocluster package. The storage failover might fail under the following conditions:
• Incorrect configuration or setup of Metrocluster and the data replication environment.
Live Application Detach There may be circumstances in which you want to do maintenance that involves halting a node, or the entire cluster, without halting or failing over the affected packages. Such maintenance might consist of anything short of rebooting the node or nodes, but a likely case is networking changes that will disrupt the heartbeat. New command options in Serviceguard for Linux A.11.20.
4 Understanding failover/failback scenarios

Metrocluster package failover/failback scenarios

This section discusses package startup behavior in various failure scenarios, depending on DT_APPLICATION_STARTUP_POLICY and replication mode. Table 6 describes the list of failover scenarios.
Table 6 Replication Modes and Failover Scenarios (continued)
(Columns: Failover Scenario | Replication Mode | DT_APPLICATION_STARTUP_POLICY | Resolution)

… progress. Because the WAIT_TIME is set to xx minutes, the program will wait for completion of the log copy …
… complete. It starts up immediately.
… The DR Group is in merging state. … The WAIT_TIME has expired. Error - Failed to failover and swap the role of the device group. The package is NOT allowed to start up.
Table 6 Replication Modes and Failover Scenarios (continued)

… of the destination Vdisks before the copy starts, so that a consistent copy is available for recovery.

Failover Scenario: Remote failover when the link is manually suspended
Replication Mode: Synchronous
DT_APPLICATION_STARTUP_POLICY: The DR Group does not fail over and the package does not start. The behavior is not affected by the presence of the FORCEFLAG file.
Resolution: Resume the link, …
Table 6 Replication Modes and Failover Scenarios (continued)

Failover Scenario: Remote failover when the DR Group is in RUNDOWN state and link is down
Replication Mode: Synchronous / Enhanced Asynchronous
DT_APPLICATION_STARTUP_POLICY: N/A
Resolution: … The DR Group fails over and the package is started. … The DR Group is not failed over and the package is not started. To forcefully start …
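The WAIT_TIME behavior described in Table 6 — poll the DR group until the write-history-log merge completes or the timer expires, then either start the package or refuse to start it — can be sketched as a polling loop. In the sketch below, dr_group_state is a stub standing in for a real SMI-S query of the array (its state names and poll counts are illustrative only), and the poll budget stands in for WAIT_TIME.

```shell
# Sketch of WAIT_TIME polling. dr_group_state is a stub standing in for
# a real SMI-S query: it reports "merging" for two polls, then "normal".
STATE_FILE=/tmp/drg_polls.$$
dr_group_state() {
    n=$(cat "$STATE_FILE" 2>/dev/null || echo 0)
    n=$((n + 1))
    echo "$n" > "$STATE_FILE"
    if [ "$n" -le 2 ]; then echo merging; else echo normal; fi
}

wait_for_merge() {                   # $1 = poll budget (stands in for WAIT_TIME)
    tries=$1
    while [ "$tries" -gt 0 ]; do
        if [ "$(dr_group_state)" = "normal" ]; then
            echo "merge complete"
            return 0
        fi
        tries=$((tries - 1))         # a real loop would also sleep between polls
    done
    echo "WAIT_TIME expired: package not allowed to start"
    return 1
}

wait_for_merge 5
rm -f "$STATE_FILE"
```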
5 Administering Metrocluster

Adding a node to Metrocluster

To add a node to Metrocluster with Continuous Access EVA P6000 for Linux:
1. To add the node to the cluster, edit the Serviceguard cluster configuration file, and then apply the configuration:
# cmapplyconf -C cluster.config
2. Copy the caeva.map file to the new node.
For Red Hat:
# scp /usr/local/cmcluster/conf/mccaeva/caeva.map \
<new_node>:/usr/local/cmcluster/conf/mccaeva/caeva.map
For SUSE:
# scp /opt/cmcluster/conf/mccaeva/caeva.
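Step 2 above must be repeated for every node being added. A loop of the following shape keeps the copies consistent; the node names are hypothetical, and the DRY_RUN guard prints the scp commands instead of executing them (a real run would need reachable nodes and configured SSH access).

```shell
# Sketch: distribute caeva.map to newly added nodes. Node names are
# hypothetical; with DRY_RUN=1 the scp commands are printed, not run.
MAP=/usr/local/cmcluster/conf/mccaeva/caeva.map   # Red Hat path
DRY_RUN=1

distribute_map() {
    for node in "$@"; do
        if [ "$DRY_RUN" = 1 ]; then
            echo scp "$MAP" "$node:$MAP"
        else
            scp "$MAP" "$node:$MAP"
        fi
    done
}

distribute_map SJC_1 SJC_2
```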
P6000 Continuous Access does not resynchronize the source and destination Vdisks upon link recovery. This helps maintain data consistency.
2. Take a local replication copy of the destination Vdisks using HP P6000 Business Copy software so that a consistent copy is available for recovery.
3. Change the Continuous Access link state to resume mode. This initiates normalization upon Continuous Access link recovery.
Rolling upgrade Metrocluster configurations follow the HP Serviceguard rolling upgrade procedure. The HP Serviceguard documentation includes rolling upgrade procedures to upgrade the Serviceguard version, operating environment, and other software. This Serviceguard procedure, along with recommendations, guidelines, and limitations, is applicable to Metrocluster versions. For more information on completing a rolling upgrade of HP Serviceguard, see the latest edition of Managing HP Serviceguard A.11.20.
6 Troubleshooting

Troubleshooting Metrocluster

Analyze the Metrocluster and SMI-S/Command View log files to understand the problem in the respective environment, and follow the recommended action based on the error or warning messages.

Metrocluster log

Make sure you periodically review the following files for messages, warnings, and recommended actions. HP recommends reviewing these files after any system, data center, or application failure:
• View the system log at /var/log/messages.
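Routine review of /var/log/messages can be partially automated with a simple filter. The sketch below is generic; the sample log lines are illustrative only and are not actual Metrocluster messages.

```shell
# Sketch: surface warning/error lines from a system log. The sample
# lines below are illustrative, not actual Metrocluster messages.
scan_log() { grep -iE 'error|warn' "$1"; }

cat > /tmp/messages.sample <<'EOF'
Jan 10 10:00:01 node1 pkgA: package started
Jan 10 10:05:42 node1 pkgA: WARNING: replication link degraded
Jan 10 10:06:13 node1 pkgA: ERROR: storage failover failed
EOF

scan_log /tmp/messages.sample
# prints the WARNING and ERROR lines only
```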
A Checklist and worksheet for configuring a Metrocluster with Continuous Access EVA P6000 for Linux Disaster Recovery Checklist Use this checklist to make sure you have adhered to the disaster tolerant architecture guidelines for two main data centers and a third location configuration. Data centers A and B have the same number of nodes to maintain quorum in case an entire data center fails.
Network Polling Interval: ______________________________________________ AutoStart Delay: ______________________________________________________ Package Configuration Worksheet Use this package configuration worksheet either in place of, or in addition to the worksheet provided in the latest version of the Managing HP Serviceguard A.11.20.10 for Linux manual available at http://www.hp.com/go/linux-serviceguard-docs.
DC1 SMIS List: ______________________________________________________________
DC1 HOST List: _____________________________________________________________
DC2 Storage Array WWN: ___________________________________________________
DC2 SMIS List: ______________________________________________________________
DC2 HOST List: _____________________________________________________________

P6000/EVA Configuration Checklist

Use the following checklist to verify the Metrocluster with Continuous Access EVA P6000 for
B Package attributes for Metrocluster with Continuous Access EVA P6000 for Linux This appendix lists all Package Attributes for this product. HP recommends that you use the default settings for most of these variables, so exercise caution when modifying them: CLUSTER_TYPE This parameter identifies the type of disaster recovery services cluster: Metrocluster or Continentalclusters.
DC1_STORAGE_WORLD_WIDE_NAME The world wide name of the HP P6000/EVA storage system that resides in Data Center 1. This storage system name is defined when the storage is initialized. DC1_SMIS_LIST A list of the management servers that reside in Data Center 1. Multiple names can be defined by using commas as separators. If a connection to the first management server fails, attempts are made to connect to the subsequent management servers in their order of specification.
C smiseva.conf file

#################################################################
# smiseva.conf CONFIGURATION FILE (template)
# for use with the smispasswd utility
# in the Metrocluster CA EVA Environment
#
# Note: This file MUST be edited before it can be used.
# For complete details about SMI-S configuration for use
# with Metrocluster CA EVA, consult the manual "Designing
# Disaster Tolerant High Availability Clusters."
#################################################################
D mceva.conf file

##############################################################
## mceva.conf CONFIGURATION FILE (template) for use with
## the evadiscovery utility in the Metrocluster Continuous
## Access EVA Environment.
## Version: A.01.00
## Note: This file MUST be edited before it can be used.
## For complete details about EVA configuration for use
## with Metrocluster Continuous Access EVA, consult the
## manual "Designing Disaster Tolerant High Availability
## Clusters".
##############################################################
## … storage name and DR group name.
## Note: All the storage and DR Group names should be
## enclosed in double quotes (""), otherwise the
## evadiscovery command will not detect them.
E Identifying the devices to be used with packages

Identifying the devices created in P6000/EVA

After the WWN of the P6000/EVA virtual volume is obtained, find the WWN of the disk using the lsscsi or scsi_id commands. For example:
# lsscsi | grep HSV | grep disk | awk '{print $6}'
After the P6000/EVA disks are retrieved by the lsscsi command, run the scsi_id command to find the WWN of the P6000/EVA disk.
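The lsscsi pipeline above can be exercised against a captured listing before running it on live hardware. The sample lines below only imitate the general lsscsi column layout ([H:C:T:L] type vendor model rev device) and are illustrative; on a real system the device names and revisions will differ.

```shell
# Sketch: apply the awk step from above to sample lines that imitate
# the lsscsi column layout. The sixth whitespace-separated field is
# the device node, which is what the pipeline extracts.
cat > /tmp/lsscsi.sample <<'EOF'
[0:0:0:0]    disk    HP       HSV300   0005  /dev/sda
[0:0:0:1]    disk    HP       HSV300   0005  /dev/sdb
[1:0:0:0]    cd/dvd  SOMEVND  DVDROM   1.0   /dev/sr0
EOF

grep HSV /tmp/lsscsi.sample | grep disk | awk '{print $6}'
# prints: /dev/sda
#         /dev/sdb
```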
Glossary A, B arbitrator Nodes in a disaster tolerant architecture that act as tie-breakers in case all of the nodes in a data center go down at the same time. These nodes are full members of the Serviceguard cluster and must conform to the minimum requirements. The arbitrator must be located in a third data center to ensure that the failure of an entire data center does not bring the entire cluster down. See also quorum server.
disaster recovery The process of restoring access to applications and data after a disaster. Disaster recovery can be manual, meaning human intervention is required, or it can be automated, requiring little or no human intervention. disaster tolerant The characteristic of being able to recover quickly from a disaster. Components of disaster tolerance include redundant hardware, data replication, geographic dispersion, partial or complete recovery automation, and well-defined recovery procedures.
S
split-brain syndrome When a cluster reforms with equal numbers of nodes at each site, and each half of the cluster believes it is the authority, starts up the same set of applications, and tries to modify the same data, resulting in data corruption. Serviceguard architecture prevents split-brain syndrome in all cases unless dual cluster locks are used.
sub-clusters Sub-clusters are clusterware instances that run on top of the Serviceguard cluster and comprise only the nodes in a Metrocluster site.