Best Practices
Dell EMC SC Series: VMware Site Recovery Manager Best Practices
Abstract
This document offers best practices for automated disaster recovery of virtualized workloads using Dell EMC™ SC Series arrays, Replay Manager, array-based replication, and VMware® Site Recovery Manager™ with varying levels of consistency.
Revisions
Date            Description
August 2011     Initial release
November 2011   Updated for Storage Center OS 5.5.4
December 2011   Updated replication sections
March 2012      SRA version correction
July 2012       Added warning
October 2012    Updated diagrams
October 2012    Updated for SRM 5.1
April 2013      Updated for Storage Center OS 6.
Table of contents
Revisions
Acknowledgments
Table of contents
5.9 Using application- and data-consistent frozen snapshots with SRM
5.10 Custom recovery tasks
6 Site Recovery Manager configuration
6.1 Configuring the array managers
6.2 Creating array pairs
Executive summary
Data center consolidation by way of x86 virtualization is a trend that has gained tremendous momentum and offers many benefits. Although the physical nature of a server is transformed once it is virtualized, the necessity for data protection remains. Virtualization opens the door to new and flexible opportunities in data protection, data recovery, replication, and business continuity.
1 Introduction
This paper provides configuration examples, tips, recommended settings, and other storage guidelines to follow while integrating VMware Site Recovery Manager (SRM) with Dell EMC SC Series solutions. In addition to basic configuration, this document also answers frequently asked questions about VMware interactions with Site Recovery Manager. It is recommended to read the Site Recovery Manager documentation provided on vmware.com before beginning an SRM implementation.
2 Setup prerequisites
Verify system requirements prior to building or upgrading your environment. To view the current product support matrix, see the section, "Storage Replication Adapter for VMware SRM", in the Dell Storage Manager Administrator’s Guide.
2.1 Storage Replication Adapter
The SC Series Storage Replication Adapter (SRA) must be installed on each SRM server.
3 Site Recovery Manager architecture
This section presents the array-based replication architecture for single- and dual-protected sites as well as Live Volume stretched storage. The vSphere replication architecture is also included for comparison.
3.1 Array-based replication: single protected site
This configuration (shown in Figure 1) is generally used when the secondary site does not have any virtual machines that need to be protected by SRM.
3.2 Array-based replication: dual protected site
This configuration (shown in Figure 2) is generally used when both sites have virtual machines that need to be protected by SRM. Each site replicates its virtual machines to the opposing site where they can be recovered.
3.3 Array-based replication: Live Volume stretched storage
A Live Volume stretched storage configuration typically involves two active sites, where one site may be impacted by an unplanned outage. A DSM Data Collector should be available for each recovery site.
Architecture for sites with array-based replication and Live Volume stretched storage with a single Data Collector network reachable at a third site.
Note: See the Dell Storage Manager Release Notes for the latest information about stretched storage and SRM configuration.
3.4 vSphere replication: single protected site
vSphere replication can be used in addition to or in place of array-based replication. Two of the main advantages of vSphere replication over array-based replication are:
• A granular selection of individual powered-on VMs is replicated instead of entire datastores of VMs.
• vSphere datastore objects abstract the underlying storage vendor, model, protocol, and type.
3.5 vSphere replication: dual protected site
The architectural changes with vSphere replication are carried into the active/active site model. In each vSphere replication architecture diagram (Figure 4 and Figure 5), replication is handled by the vSphere hosts using the vSphere network stack. An array-based SRA is not present in this architecture. Note that these figures do not represent all the components of vSphere replication.
4 Dell Storage Manager configuration
This section provides best practices for configuring Dell Storage Manager (DSM).
4.1 Data Collector configuration
As illustrated in section 3, DSM is a critical piece of the SRM infrastructure because the Data Collector processes all of the calls from the SRA and relays them to the SC Series arrays to perform the workflow tasks.
Keep in mind that each Data Collector, whether primary or remote, maintains its own user access database. In a typical active/DR site configuration, a single Data Collector server is configured and deployed, managing both source and destination site arrays. A single set of credentials is needed to register that Data Collector server as an array manager for both sites.
Saving restore points should be completed prior to an SRM disaster recovery or planned migration recovery plan execution. Restore points should also be saved prior to a reprotect operation in SRM.
4.6 Validating restore points
The validate restore points process reconciles the list of saved restore points with the list of replication jobs and provides an opportunity to clean up restore points that may be orphaned or no longer needed. Select the Validate All option to validate restore points. Restore point validation is also performed automatically during a save restore points operation.
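The reconciliation that validation performs can be pictured as a simple comparison between two lists. The following Python sketch is a conceptual illustration only and is not how DSM implements the feature; the function name and the plain string identifiers are assumptions made for the example.

```python
def find_orphaned_restore_points(saved_restore_points, replication_jobs):
    """Return saved restore points that no longer match a replication job.

    Both arguments are iterables of hypothetical replication identifiers;
    the real DSM data model is not described in this paper.
    """
    active = set(replication_jobs)
    # Keep the saved order so the result is easy to review before cleanup.
    return [rp for rp in saved_restore_points if rp not in active]


# Hypothetical identifiers for illustration.
saved = ["repl-vol01", "repl-vol02", "repl-vol03"]
jobs = ["repl-vol01", "repl-vol03"]
print(find_orphaned_restore_points(saved, jobs))
```

Any entry returned by such a comparison is a candidate for cleanup, which is the decision the Validate All option presents to the administrator.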
making adjustments to accommodate such environments, see the VMware KB article, Modify Settings to Run Large Site Recovery Manager Environments.
• storage.commandTimeout – Min: 0 Default: 300
This option specifies the timeout allowed (in seconds) for running SRA commands in array-based-replication-related workflows. Increasing this value is typically required for larger environments or environments with Live Volume stretched storage.
The value can also be configured per cluster by editing srmMaxBootShutdownOps in vSphere DRS Advanced Options. This value will override a value specified in the vmware-dr.xml file.
• defaultMaxBootAndShutdownOpsPerHost – Default: off
This option specifies the maximum number of concurrent power-on operations performed by SRM at the host object level. Enable it by specifying a numerical value (such as 4) in the vmware-dr.xml file.
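As an illustration of that edit, the following Python sketch sets the option in an XML document held in memory. It assumes the option is a direct child of a top-level <config> element; confirm the actual layout of vmware-dr.xml against the VMware documentation for the installed SRM version, and back up the file before changing it. The function name and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

OPTION = "defaultMaxBootAndShutdownOpsPerHost"


def set_max_ops_per_host(xml_text, value):
    """Return xml_text with OPTION set to value under the root element.

    Assumes the root element is <config>; the option is created if absent.
    """
    if int(value) < 1:
        raise ValueError("value must be a positive integer")
    root = ET.fromstring(xml_text)
    node = root.find(OPTION)
    if node is None:
        node = ET.SubElement(root, OPTION)
    node.text = str(int(value))
    return ET.tostring(root, encoding="unicode")


# Minimal stand-in for the real file, for illustration only.
sample = "<config></config>"
print(set_max_ops_per_host(sample, 4))
```

Working on a string rather than the live file keeps the sketch safe to run anywhere; writing the result back to disk is left to the administrator.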
5 Configuring replication
SC Series replication, in coordination with Site Recovery Manager (SRM), can provide a robust and scalable disaster recovery solution. Since each replication method affects recovery differently, choosing the correct method to meet business requirements is important. A brief summary of the different options is provided in this section. 5.
abstraction layer to the replication to allow mapping of an abstracted volume derived across two SC Series arrays. SRM support for stretched storage with Live Volume was added in DSM 2016 R1 and further improved in DSM 2016 R3.11. Supported configurations are asynchronous replication or synchronous high availability replication with non-uniform storage mapping to hosts.
Consistency states of frozen snapshot replications during plan execution
1. Once a snapshot is taken of the source volume, the delta changes begin transferring to the destination immediately. The consistency state of the data within this snapshot is dependent on whether or not the application had the awareness to quiesce the data before the snapshot was taken.
2. During a recovery plan test, a new snapshot is taken of the destination volume.
5.6 Data consistency while replicating the active snapshot
Figure 7 and the steps that follow describe the consistency states of replications during plan execution while replicating the active snapshot.
Consistency states of active snapshot replications during plan execution
1. As writes are committed to the source volume, they are almost simultaneously transferred to the destination and stored in the active snapshot.
3. Once the SRM recovery snapshot has been taken, a view volume is created from that snapshot.
4. The view volume is then presented to the vSphere host (or hosts) at the DR site for SRM to begin a test execution of the recovery plan.
During an actual disaster recovery or planned migration execution, a view volume from a snapshot is not mounted to the remote vSphere hosts. Instead, the destination volume itself is mounted to the remote hosts. This change in behavior from SRM 4.
5.8 SRM selectable snapshot
SRM selectable snapshot is a feature that is built into Unisphere Central. Because multiple methods of replication are supported by SC Series storage, this feature determines whether the active snapshot or last frozen snapshot is used when VMware SRM initiates a failover or test failover.
5.9 Using application- and data-consistent frozen snapshots with SRM
There are a number of methods available for creating a frozen snapshot (replay) on SC Series storage. Once replicated, the snapshot may be used by SRM. While some methods of snapshot creation result in crash-consistent data contained within the snapshot, other methods may be employed that result in application or data consistency within the snapshot. For example, Replay Manager 7.
5.10 Custom recovery tasks
If the environment requires a custom recovery strategy, both Dell EMC storage and VMware have robust sets of PowerShell cmdlets to customize the recovery steps where needed. The Dell EMC storage cmdlets can control the snapshot selection, view volume creation, volume mappings, and even modify replications.
6 Site Recovery Manager configuration
This section provides best practices for configuring Site Recovery Manager.
6.1 Configuring the array managers
To allow SRM to manage SC Series storage, the SRA must be able to communicate with the DSM Data Collector. Array manager configuration can be performed from the Array Managers module. An array manager must be added for each site in the unified interface.
The protected site array managers and the recovery site array managers must both be configured for pairing. Depending on the architecture, either a single DSM Data Collector can be added for both sites, or a Data Collector and Remote Data Collector model can be deployed.
Single Data Collector: Specify the same single Data Collector for both the protected site array manager and the recovery site array manager.
2. Provide the local SC Series serial number (separate multiple local serial numbers with commas) and DSM connection parameters for the local array manager.
Note: To avoid stretched storage (Live Volume) compatibility and operational issues, it is a best practice to provide the SC Series System Serial Number in the Storage Center Filter field. The serial number for the local array is used with the local array manager.
3. Provide the remote SC Series serial number (separate multiple remote serial numbers with commas) and DSM connection parameters for the remote array manager.
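When several arrays sit behind one array manager, the comma-separated value for the serial number field can be assembled programmatically. The Python sketch below is one possible helper, assuming that SC Series serial numbers are purely numeric and that no space is expected after each comma; the function name and the serial numbers shown are hypothetical.

```python
def build_serial_filter(serial_numbers):
    """Join SC Series serial numbers into a comma-separated filter string.

    Assumes serial numbers are numeric; blanks and duplicates are dropped.
    """
    cleaned = []
    for serial in serial_numbers:
        text = str(serial).strip()
        if not text:
            continue
        if not text.isdigit():
            raise ValueError("unexpected serial number: %r" % text)
        if text not in cleaned:
            cleaned.append(text)
    return ",".join(cleaned)


# Hypothetical serial numbers for two local arrays.
print(build_serial_filter([12345, " 12346 ", "12345"]))
```

Rejecting non-numeric input early helps catch a pasted array name or hostname before it reaches the array manager dialog.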
6.2 Creating array pairs
Once an array manager has been added to each of the two sites in SRM, the arrays must be paired so that replicated volumes can be discovered by SRM as eligible devices. In older versions of SRM, pairing was an action that was performed after the initial installation of SRM. However, as of SRM 5.8, pairing can be performed as part of the process of adding array managers to sites.
Select Discover Devices as shown on the following screen to invoke an SRA query to the DSM Data Collector to obtain the newest array-based replicated device information.
6.4 Creating protection groups
If not completed already, create a small VMFS datastore at the disaster recovery site as a placeholder for VM configuration files.
which provides automated sub-LUN tiering for virtual machines without interfering with SRM protection groups.
6.5 Creating recovery plans
When testing or running recovery plans, SRM does not have integrated mechanisms to determine whether the replication volumes are fully synced before the storage is prepared for recovery. In other words, there may be in-flight data still being replicated to the secondary site, which can influence the outcome of the recovery.
For example, integrate an SC Series REST API or PowerShell script into the recovery plan to take current snapshots of all the volumes and make sure the most recent data has been replicated (see appendix A for examples). When the recovery plan executes, it pauses. However, recovery plan execution will not pause at a command on an SRM server step.
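One way to prepare such a step is to build the snapshot requests for every replicated volume before any of them are sent. The Python sketch below only constructs the requests and sends nothing. It reuses the CreateReplay endpoint and the Description and ExpireTime payload fields shown in appendix A.1, while the function name, the base URL, and the instance IDs are hypothetical.

```python
import json


def build_replay_requests(base_url, volumes, expire_minutes=60):
    """Return (url, body) pairs for one CreateReplay call per volume.

    volumes maps a volume name to its instanceId, as in appendix A.1.
    Nothing is sent; the caller posts each pair with its own session.
    """
    requests_to_send = []
    for name, instance_id in sorted(volumes.items()):
        rest = "StorageCenter/ScVolume/%s/CreateReplay" % instance_id
        url = "%s/%s" % (base_url.rstrip("/"), rest)
        payload = {
            "Description": "Replay of %s" % name,
            "ExpireTime": str(expire_minutes),  # in minutes, 0 = never expire
        }
        requests_to_send.append((url, json.dumps(payload)))
    return requests_to_send


# Hypothetical DSM address and volume instance IDs.
volumes = {"Volume_Name_x": "101.1", "Volume_Name_y": "101.2"}
for url, body in build_replay_requests("https://dsm.example.com/api/rest/", volumes):
    print(url, body)
```

Separating request construction from transmission makes it possible to review the full list of volumes that a recovery plan step will snapshot before the script is attached to the plan.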
7 Testing a recovery plan
Testing the recovery plan is not disruptive to the storage replications, production volumes, and VMs because test recoveries use SC Series view volumes created from snapshots (replays). This means that when testing a recovery plan, any tests, changes, or updates can be performed on the recovered virtual machines because they will be discarded when the test recovery plan cleanup takes place.
Acknowledge the safety precaution message to execute a live plan. Review the success of the recovery plan after completion.
8 Reprotect and failback
After virtual machines are migrated from one site to another using either the disaster recovery or planned migration features in SRM, they are in an active running state on the network at the alternate site. However, they are vulnerable to a site failure with no SRM protection. Previous versions of SRM required a manual reprotection of the virtual machines at the recovery site.
9 Conclusion
VMware vSphere, Site Recovery Manager, and Dell EMC SC Series arrays combine to provide a highly available business platform for automated disaster recovery with the best possible RTO and RPO, as well as planned migrations for your virtualized data center.
A Example scripts
A.1 REST API script: TakeSnapshot.py
This is an example REST API script that can be folded into an SRM recovery plan. It leverages the Dell RESTful API to take a snapshot (replay) of the source replication system volume to make sure that the most current snapshot is replicated to the DR site.
# # main.
# login to DSM instance
payload = {}
REST = '/ApiConnection/Login'
completeURL = '%s%s' % (baseURL, REST if REST[0] != '/' else REST[1:])
print(connection.post(completeURL,
                      data=json.dumps(payload, ensure_ascii=False).encode('utf-8'),
                      headers=header,
                      verify=verify_cert))

# capture API connection instanceId
payload = {}
REST = '/ApiConnection/ApiConnection'
completeURL = '%s%s' % (baseURL, REST if REST[0] != '/' else REST[1:])
json_data = connection.
# create a replay object from Volume_Name_x volume object
payload = {}
payload['Description'] = 'Replay of Volume_Name_x'
payload['ExpireTime'] = '60'  # in minutes, 0 = Never Expire
REST = '/StorageCenter/ScVolume/%s/CreateReplay' % volList['Volume_Name_x']['instanceId']
completeURL = '%s%s' % (baseURL, REST if REST[0] != '/' else REST[1:])
json_data = connection.post(completeURL, data=json.dumps(payload, ensure_ascii=False).
A.3 SC Series command set PowerShell script: TakeSnapshot.ps1
This script leverages the SC Series command set to take a snapshot of the source replication system volume in an effort to make sure that the most current snapshot is replicated to the DR site.
$SCHostname = "sc12.techsol.
# Get the volume
$Volume = Get-DellScVolume -Connection $Connection `
    -ScName $ScName `
    -VolumeFolderPath $VolumeFolderPath `
    -Name $VolumeName

# Create a Snapshot that will expire in 1 day
$OneDayInMinutes = 1 * 24 * 60 # 1 day * 24 hours/day * 60 minutes/hour
$Snapshot = New-DellScVolumeReplay -Connection $Connection `
    -Instance $Volume `
    -Description $SnapshotDescription `
    -ExpireTime $OneDayInMinutes

This PowerShell script will connect to a DSM which is managing an SC Series system named
B Additional resources
B.1 Technical support and resources
Dell.com/support is focused on meeting customer needs with proven services and support. Storage technical documents and videos provide expertise that helps to ensure customer success on Dell EMC storage platforms.
B.2 VMware support
For VMware support, see the following resources:
• VMware.