Site Recovery Manager Administration Guide vCenter Site Recovery Manager 4.
You can find the most up-to-date technical documentation on the VMware Web site at: http://www.vmware.com/support/

The VMware Web site also provides the latest product updates. If you have comments about this documentation, submit your feedback to: docfeedback@vmware.com

Copyright © 2008, 2009 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws.
Contents

About This Book
1 Administering VMware vCenter Site Recovery Manager
   Protected Sites and Recovery Sites
   Array-Based Replication
   About Protection Groups and Recovery Plans
   Understanding Recovery and Test Recovery
   Operational Limits of Site Recovery Manager
   About Failback
   SRM and vCenter
   About the Site Recovery Manager Database
   SRM Licensing
   SRM Authentication
   How SRM Uses Network Ports
   Site Recovery Manager Roles and Permissions
2 Installing and Updating Site
   Limitations on Recovery of Snapshots and Linked Clones
   Create a Recovery Plan
   Edit a Recovery Plan
   Remove a Recovery Plan
4 Test Recovery, Recovery, and Failback
   Test a Recovery Plan
   Pause, Resume, or Cancel a Test
   Run a Recovery Plan
   Configuring and Executing Failback
   Review and Execute Post-Failover Cleanup Tasks
   Reconfigure Replication
   Reconfigure SRM to Enable Failback to the Protected Site
   Restore the Original Configuration
About This Book

VMware vCenter Site Recovery Manager (SRM) is an extension to VMware vCenter that enables integration with array-based replication, discovery and management of replicated datastores, and automated migration of inventory from one vCenter to another.
Technical Support and Education Resources

The following technical support resources are available to you. To access the current version of this book and other books, go to http://www.vmware.com/support/pubs.

Online and Telephone Support

To use online support to submit technical support requests, view your product and contract information, and register your products, go to http://www.vmware.com/support.
Chapter 1 Administering VMware vCenter Site Recovery Manager

VMware vCenter Site Recovery Manager is a business continuity and disaster recovery solution that helps you plan, test, and execute a scheduled migration or emergency failover of vCenter inventory from one site to another.
Site Pairing

The protected and recovery sites must be paired before you can use SRM. Site pairing includes three main steps:

1 Exchange of authentication information between the two sites.
2 Discovery of the replicated storage arrays that support the protected site, and discovery of peer arrays at the recovery site.
3 Discovery of the replicated devices supported by the arrays, and mapping of these devices to datastores that support virtual machines.
You cannot designate a third site as a recovery site for one that is already paired with another site. If you want to use SRM to provide business continuity and disaster recovery services at a recovery site, you must configure that site as a protected site that uses its own array managers to replicate data to the other member of the site pair.
About Protection Groups and Recovery Plans

A protection group is a collection of virtual machines and templates that use the same replicated datastore or datastore group. A recovery plan specifies how the virtual machines in a protection group are recovered. When the replicated devices that support a datastore group fail over, that operation affects all of the virtual machines and templates that use the datastores in the group.
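The relationship between datastore groups and the virtual machines that depend on them can be pictured with a short Python sketch. The inventory names, the datastore group labels, and the function name are hypothetical and are not part of SRM; the sketch only illustrates that a failover of one datastore group touches every virtual machine and template that uses it.

```python
from collections import defaultdict


def vms_by_datastore_group(vm_to_group):
    """Map each datastore group to the sorted list of VMs that use it."""
    groups = defaultdict(list)
    for vm, group in vm_to_group.items():
        groups[group].append(vm)
    return {group: sorted(vms) for group, vms in groups.items()}


# Hypothetical inventory: VM or template name -> datastore group label.
inventory = {"web01": "dsg-A", "db01": "dsg-A", "tmpl-win": "dsg-B"}

# Everything listed here would be affected if "dsg-A" failed over.
affected = vms_by_datastore_group(inventory)["dsg-A"]
```

A grouping like this can help when deciding which virtual machines should share a datastore group, because they are protected and recovered together.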
machines. If you need to override inventory mappings for a few members of a protection group, use the vSphere Client to connect to the recovery site and edit the network settings of the placeholders or move them to a different folder or resource pool. If a member of a protection group loses its protection, its placeholder is removed from the recovery site until the protection has been restored.
How SRM Interacts with DPM and DRS During Recovery

Distributed Power Management (DPM) is a VMware facility that manages power consumption by ESX hosts. Distributed Resource Scheduler (DRS) is a VMware facility that manages the assignment of virtual machines to ESX hosts. When DPM and DRS are enabled on a recovery site cluster, SRM temporarily disables DPM for the cluster and ensures that all hosts in it are powered on before recovery begins.
A typical failback has two phases. In the first phase, the protected and recovery sites switch roles, and the virtual machines are migrated from the recovery site to the protected site under the control of a recovery plan. In the second phase, the relationship of the protected and recovery sites is restored, so that future failovers migrate the protected virtual machines from the protected site to the recovery site.
About the Site Recovery Manager Database

The SRM server requires its own database, which it uses to store recovery plans, inventory information, and similar data. The SRM database is a critical part of any SRM installation. The database must be initialized and a database connection created before you can install SRM.
Credential-Based Authentication

If you are using credential-based authentication, SRM stores a user name and password that you specify during installation, and then uses those credentials when connecting to vCenter or another SRM server. SRM also creates a special-purpose certificate for its own use. This certificate includes additional information that you supply during installation.
How SRM Uses Network Ports

SRM servers use several network ports to communicate with each other, with client plug-ins, and with vCenter. If any of these ports are in use by other applications or are blocked on your network, you must reconfigure SRM to use different ones. Table 1-3 lists the default network ports that SRM uses for intrasite (between hosts at a single site) and intersite (between hosts at the protected and recovery sites) communications.
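Before installation, it can be useful to check whether a port is already taken on the intended SRM server host. The following Python sketch is one way to do that with the standard library. The function names are hypothetical, and no port numbers are assumed here; take the values to check from Table 1-3. The sketch only detects local conflicts and cannot tell whether a firewall blocks a port elsewhere on the network.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission error.
            return False
    return True


def busy_ports(ports, host="127.0.0.1"):
    """Return the subset of ports that cannot be bound on this host."""
    return [port for port in ports if not port_is_free(port, host)]
```

Calling `busy_ports` with the list of default ports gives the ports for which SRM would need a different setting on this host.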
- Recovery SRM Administrator: Configure arrays and create protection profiles.
- Recovery Virtual Machine Administrator: Create virtual machines at the recovery site and add them to the resource pool. Also grants the ability to reconfigure and customize the recovery virtual machines when a recovery plan is run.
Table 1-4.
Chapter 2 Installing and Updating Site Recovery Manager

You must install an SRM server at the protected site and also at the recovery site. After the SRM servers are installed, you can download the client plug-in from either server to any vSphere Client. You use the SRM client plug-in to configure and manage SRM at each site.

Prerequisites

SRM requires the support of a vCenter server at each site. The SRM installer must be able to connect with this server during installation.
The SRM database at each site holds information about virtual machine configurations, protection groups, and recovery plans. SRM cannot use the vCenter database because it has different database schema requirements, though you can use the vCenter database server to create and support the SRM database. Each SRM site requires its own instance of the SRM database. The database must exist before SRM can be installed.
DB2 Server Configuration

A DB2 Server configuration must meet specific requirements to support SRM. DB2 Server has the following configuration requirements when used as the SRM database:

- When creating the database instance, specify utf-8 encoding.
- Because DB2 uses Windows authentication, you must specify the database owner as a domain account.
- vCenter Server Username: Enter the user name of an administrator of the specified vCenter server.
- vCenter Server Password: Enter the password for the specified user name.

When you click Next, the installer contacts the specified vCenter server and validates the information you supplied.

8 On the Certificate Type Selection page, select an authentication method.
What to do next

You can now install SRAs at each site. See “Install the Storage Replication Adapters,” on page 23.

Install the Storage Replication Adapters

A Storage Replication Adapter (SRA) is a program provided by an array vendor that enables SRM to work with a specific kind of array. You must install an appropriate SRA on the SRM server hosts at the protected and recovery sites.
Prerequisites

Before you begin the update, back up your current SRM database. The update wizard requires you to verify that the database is backed up, and pauses until you confirm that it is.

Procedure

1 Log in to the server host on which you are installing SRM. Log in as a local administrator.
2 Download the SRM installation file to a folder on the host, or open a folder on the network that contains this file.
6 Click Install.
7 When the installation completes, click Finish. If the installation replaced any open files, you are prompted to shut down and restart Windows.

Revert to a Previous Release

To revert to a previous release, uninstall the current SRM server release from the protected and recovery sites, uninstall the SRM plug-in, and restore the SRM database from the backup you made before you updated the SRM server.
4 Click Repair on the Program Maintenance Options page.
5 On the VMware vCenter Server page, enter the following information:
- vCenter Server Username: Enter the user name of an administrator of the specified vCenter server.
- vCenter Server Password: Enter the password for the specified user name.
You cannot use the installer's repair mode to change the vCenter server address or port.
Chapter 3 Configuring the Protected and Recovery Sites

After you have installed SRM at the protected and recovery sites, you must connect the two sites to create a site pair, configure the array managers at each site, and configure SRM at each site. You use the SRM client plug-in to administer SRM. Site pairing requires vSphere administrative privileges at both sites.

Prerequisites

Before you can connect the protected and recovery sites, you must:

1 Install an SRM server at each site.
Procedure

1 Open a vSphere Client and connect to the vCenter server at the site that you want to designate as the protected site. Log in as a vSphere administrator.
NOTE The recovery site must be the replication target of arrays managed by the SRA at the protected site.
2 On the vSphere Client Home page, click the Site Recovery icon.
3 In the Protection Setup area of the Summary window, navigate to the Connection line and click Configure.
4 In the navigation pane of the Advanced Settings window, click Licensing.
5 Enter the SRM license key in the Licensing.LicenseKey text box.
The first time you open the Licensing page, the evaluation key is displayed in the Licensing.LicenseKey text box.
6 Click OK to save your changes and close the Advanced Settings window.
7 Repeat the process to install a license key at the recovery site.
6 Type a name for the array in the Display Name field of the Add Array Manager window. Use any descriptive name that makes it easy for you to identify the storage associated with this array manager.
7 Fill in the remaining fields of the Add Array Manager window. These fields are created by the SRA. For more information about how to fill them in, see the documentation provided by your SRA vendor.
3 In the Recovery Setup area of the Summary window, navigate to the Recovery Plans line and click Repair Array Managers.
4 On the Recovery Site Array Managers page, click the Add, Remove, or Edit button to change the array manager information for the recovery site.

Rescan Arrays to Detect Configuration Changes

SRM checks arrays for changes to device configurations every 24 hours. However, if needed, you can force an array rescan at any time.
3 In the Protection Setup area of the Summary window, navigate to the Inventory Mappings line and click Configure.
The Inventory Mappings page displays a tree of resources at the protected site and a corresponding tree of resources at the recovery site. For any protected site resource that does not have an inventory mapping, the corresponding item in the recovery site tree is listed as None Selected.
Procedure

1 Open a vSphere Client and connect to the vCenter server at the protected site. Log in as a vSphere administrator.
2 On the vSphere Client Home page, click the Site Recovery icon.
3 In the Site Recovery tree view, navigate to the protection group that includes the virtual machine that you want to configure.
4 On the Virtual Machines page, right-click a virtual machine and click Configure Protection.
6 On the Datastore for Placeholder VMs page, select a datastore group from the list.
The datastores listed on this page exist only at the recovery site. None of them are replicated from the protected site. The datastore that you select is used to hold the files that constitute the placeholder virtual machines. These files are not large, so any datastore that is accessible to the recovery site host and cluster can be an appropriate choice.
Limitations on Recovery of Snapshots and Linked Clones

Array-based replication supports recovering VMware Virtual Consolidated Backup (VCB) snapshots, but it does not support recovering other types of snapshots or virtual machines configured as linked clones. SRM cannot reliably recover virtual machine snapshots that are not created by VCB.
6 On the Response Times page, specify how long you want the recovery plan to wait for a response from a virtual machine after various recovery plan events, and then click Next.

Change Network Settings
If the virtual machine does not acquire the expected IP address within the specified interval after a recovery step that changes network settings, an error is reported and the recovery plan proceeds to the next virtual machine.
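The behavior described for the Change Network Settings response time can be pictured as a bounded polling loop. The Python sketch below is an illustration only, not SRM's implementation: the function names, the `get_ip` callables, and the polling interval are all hypothetical. It shows the two properties the text states: each virtual machine gets a fixed interval to report the expected address, and a time-out is recorded as an error without stopping the remaining virtual machines.

```python
import time


def wait_for_ip(get_ip, expected_ip, timeout_sec, poll_sec=1.0):
    """Poll get_ip() until it returns expected_ip or timeout_sec elapses."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if get_ip() == expected_ip:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_sec)


def check_all(vms, timeout_sec, poll_sec=1.0):
    """Check every VM in order; return the names of those that timed out.

    vms is a list of (name, get_ip, expected_ip) tuples. A time-out is
    recorded and the loop moves on to the next virtual machine.
    """
    errors = []
    for name, get_ip, expected_ip in vms:
        if not wait_for_ip(get_ip, expected_ip, timeout_sec, poll_sec):
            errors.append(name)
    return errors
```

A longer response time tolerates slow guests at the cost of a longer overall run when a virtual machine never reports the expected address.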
Remove a Recovery Plan

You can remove a recovery plan if you no longer need it.

Procedure

1 Open a vSphere Client and connect to the vCenter server at the recovery site. Log in as a vSphere administrator.
2 On the vSphere Client Home page, click the Site Recovery icon.
3 In the Recovery Setup area of the Summary window, navigate to the Recovery Plans line, right-click the plan that you want to remove, and select Remove Recovery Plan.
Chapter 4 Test Recovery, Recovery, and Failback

After you have configured SRM at the protected and recovery sites, you can test your recovery plan without affecting services at either site. You can also run a recovery plan and, if necessary, configure the two sites for failback so that you can restore services at the protected site.

SRM makes it easy to test a recovery plan. The test does not disrupt replication or any ongoing activities at the protected site.
5 Click the Recovery Steps tab to monitor the progress of the test and respond to messages.
The Recovery Steps tab displays the progress of individual steps. The Recent Tasks area reports the progress of the overall plan.
NOTE If the SRM server loses contact with the recovery site vCenter while a recovery plan is being tested or run, the recovery plan fails and displays the message Error: The session is not authenticated.
5 Review the information in the confirmation prompt, and when you are ready to proceed, select I understand that this process cannot be undone and click Run Recovery Plan.
6 To monitor the progress of the recovery and respond to messages, click the Recovery Steps tab.
The Recovery Steps tab displays the progress of individual steps. The Recent Tasks area reports the progress of the overall plan.
3 Reconfigure SRM to Enable Failback to the Protected Site on page 43
Before you can run a failback, you must create the protection groups and recovery plans required to migrate protected inventory from the recovery site back to the protected site.
4 Restore the Original Configuration on page 43
After a failback is complete, you can restore the original configuration so that the protected and recovery sites resume the roles they had before the failover.
Reconfigure SRM to Enable Failback to the Protected Site

Before you can run a failback, you must create the protection groups and recovery plans required to migrate protected inventory from the recovery site back to the protected site. After you have prepared both sites for failback, reconfigured array replication, and replicated the source devices at the recovery site to their targets at the protected site, you can create the environment needed for failback.
3 Configure the array managers (see “Configure Array Managers,” on page 29).
4 Configure the inventory mappings (see “Configure Inventory Mappings,” on page 31).
5 Create the protection groups (see “Create Protection Groups,” on page 33).
6 Create the recovery plans (see “Create a Recovery Plan,” on page 35).
7 Test the recovery plan (see “Test a Recovery Plan,” on page 39).
Chapter 5 Customizing Site Recovery Manager

In its default configuration, SRM enables a number of simple recovery scenarios. Advanced users can customize SRM to support a broader range of site recovery requirements. The default protection and recovery capabilities of SRM can be appropriate for sites that have simple configurations or recovery objectives.
5 To apply the selected role to all child objects of the selected inventory object, select Propagate to Child Objects.
6 To select the user or group for the role, click the Add button.
7 Identify the user or group.
   a From the Domain drop-down menu, select the domain where the user or group is located.
   b Either enter a name in the Search text box or select a name from the Name list.
   c Click Add and then click OK when finished.
Virtual machines in all other priority groups are recovered serially per ESX host to enable a group of machines that spans several hosts to recover in parallel. During this type of recovery, machines on a specific ESX host are recovered in the order specified by the list, but the recovery order of the entire list is subject to the assignment of virtual machines to hosts.
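The ordering rule above can be made concrete with a small Python sketch. It is an illustration under assumed names, not SRM code: `per_host_queues`, the virtual machine names, and the host names are all hypothetical. The sketch splits one priority-ordered list into a queue per ESX host. Each queue keeps the list order, while separate queues could be worked through in parallel, which is why the overall order depends on host assignment.

```python
def per_host_queues(ordered_vms, vm_host):
    """Split a priority-ordered VM list into one ordered queue per ESX host."""
    queues = {}
    for vm in ordered_vms:
        queues.setdefault(vm_host[vm], []).append(vm)
    return queues


# Hypothetical recovery list, in the order shown in the recovery plan.
ordered = ["vm1", "vm2", "vm3", "vm4"]
placement = {"vm1": "esxA", "vm2": "esxB", "vm3": "esxA", "vm4": "esxB"}

queues = per_host_queues(ordered, placement)
```

In this example, vm3 waits for vm1 because both are on esxA, but vm2 on esxB does not wait for vm1, so list position alone does not guarantee that one virtual machine is recovered before another.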
3 Power on the virtual machine and verify that VMware Tools reports an OS heartbeat within the specified period.
4 Run any post-power-on command or message steps.
NOTE Post-power-on command steps provide an application-specific way to verify that a recovered virtual machine has all the capabilities that you expect.
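As one example of an application-specific check, a post-power-on command step could confirm that a service on the recovered virtual machine accepts TCP connections. The Python sketch below shows the idea. The function names are hypothetical, the host and port are supplied by the caller, and the use of 0 and 1 as result codes is an assumption, because this section does not describe how SRM interprets the result of a command step.

```python
import socket


def service_is_up(host, port, timeout_sec=5.0):
    """Return True if a TCP connection to host:port succeeds in time."""
    try:
        with socket.create_connection((host, port), timeout=timeout_sec):
            return True
    except OSError:
        # Covers refused connections, time-outs, and name lookup failures.
        return False


def exit_status(host, port, timeout_sec=5.0):
    """Map the check to a conventional process exit code (assumed: 0 = ok)."""
    return 0 if service_is_up(host, port, timeout_sec) else 1
```

A check like this verifies only that the port answers; a deeper test, such as issuing an application request, would follow the same pattern.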
Table 5-1. Environment Variables Available to All Command Steps

Name                 Value                                               Example
VMware_RecoveryName  Name of the recovery plan that is executing         "Plan A"
VMware_RecoveryMode  Recovery mode                                       "test" or "recovery"
VMware_VC_Host       Host name of the vCenter host at the recovery site  "vc_hostname.example.
Specify Virtual Machine Recovery Priority

By default, all virtual machines in a new recovery plan are members of the normal priority group. Members of this group are recovered in the order that they were created on the protected datastore. You can move a virtual machine to a different priority group or to a different priority within a group.

Procedure

1 Open the Recovery Steps page for the plan, as described in “Customize Recovery Plan Steps,” on page 49.
Add Commands to a Recovery Plan

You can customize a recovery plan to include commands that are executed on the SRM server host at the recovery site when the plan is tested or run. You can add command steps to any part of a recovery plan. When you create a command step to add to a recovery plan, make sure that it takes into account the environment in which it must run. For more information, see “Guidelines for Writing Command Steps,” on page 48.
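A command step can adapt to its environment by reading the variables listed in Table 5-1. The Python sketch below shows one way to do that. It assumes that a Python interpreter is available on the SRM server host, which this guide does not state, and the function names are hypothetical. Only the three variable names and the "test" and "recovery" mode values come from Table 5-1.

```python
import os


def recovery_context(env=None):
    """Collect the Table 5-1 variables from env (defaults to os.environ)."""
    env = os.environ if env is None else env
    return {
        "plan": env.get("VMware_RecoveryName", ""),
        "mode": env.get("VMware_RecoveryMode", ""),
        "vc_host": env.get("VMware_VC_Host", ""),
    }


def is_test_run(env=None):
    """Return True when the plan is being tested rather than run."""
    return recovery_context(env)["mode"] == "test"
```

A script can use `is_test_run()` to skip actions, such as updating production DNS records, that should happen only during an actual recovery.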
The customizations you specify are saved as properties of the placeholder virtual machine and then applied to the recovered virtual machine when a recovery plan is run or tested.
NOTE If you remove the protection of a virtual machine, all recovery customizations are lost.
In an SRM recovery plan that defines three placeholder virtual machines, the generated file might look like this:

VM ID,VM Name,Adapter ID,MAC Address,DNS Domain,Net BIOS,Primary WINS,Secondary WINS,IP Address,Subnet Mask,Gateway(s),DNS Server(s),DNS Suffix(es)
shdw1,srm1,0,,,,,,,,,,
shdw2,srm2,0,,,,,,,,,,
shdw3,srm3,0,,,,,,,,,,

The file consists of a header row that defines the meaning of each column, and a single row for each placeholder virtual machine found
- To define properties for a specific adapter on a placeholder virtual machine, create a new row that contains that virtual machine’s ID in the VM ID column and the adapter ID (the virtual PCI slot in which the adapter is installed on the placeholder virtual machine) in the Adapter ID column, then specify values for the other columns.
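Editing the generated file by hand is error-prone because most columns are empty, so a short script can help. The Python sketch below appends an adapter-specific row using the standard csv module. The function name, the sample adapter ID, and the sample addresses are hypothetical; the column names are the ones in the generated header row. The sketch leaves every column that you do not supply empty, including VM Name, because this section does not say which columns a new row must repeat.

```python
import csv
import io

HEADER = ("VM ID,VM Name,Adapter ID,MAC Address,DNS Domain,Net BIOS,"
          "Primary WINS,Secondary WINS,IP Address,Subnet Mask,Gateway(s),"
          "DNS Server(s),DNS Suffix(es)")


def add_adapter_row(csv_text, vm_id, adapter_id, values):
    """Return csv_text with one extra row for an adapter of vm_id.

    values maps column names from the header row to the settings to apply.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    fields = list(reader.fieldnames)
    rows = list(reader)
    unknown = set(values) - set(fields)
    if unknown:
        raise KeyError(f"unknown columns: {sorted(unknown)}")
    new_row = dict.fromkeys(fields, "")
    new_row.update(values)
    new_row["VM ID"] = vm_id
    new_row["Adapter ID"] = str(adapter_id)
    rows.append(new_row)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical usage with one generated row and illustrative addresses.
generated = HEADER + "\nshdw1,srm1,0,,,,,,,,,,\n"
updated = add_adapter_row(
    generated, "shdw1", 1,
    {"IP Address": "192.0.2.10", "Subnet Mask": "255.255.255.0"},
)
```

Rejecting unknown column names catches typing mistakes before the file is handed back to the customization tool.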
Procedure

1 Open a vSphere Client and connect to the vCenter server at the protected site. Log in as a vSphere administrator.
2 On the vSphere Client Home page, click the Site Recovery icon.
3 In the Site Recovery tree view, navigate to the protection group that includes the virtual machine that you want to configure.
4 On the Virtual Machines page, right-click a virtual machine and click Configure Protection.
Repair Placeholder Virtual Machines After a Failed Test Recovery

If the vCenter Server at the recovery site becomes inaccessible during a test recovery, some virtual machines in a protection group might lose their protection configuration. Virtual machines in this state have a status of Needs Repair. You can repair these virtual machines to restore protection.
4 Right-click an alarm and click Edit Settings.
5 In the Edit Settings dialog box, click the Actions tab. In the Actions window, click Add to add an action.
The default action for every event is Send a notification e-mail. To change this action, click it and select a different action from the drop-down box. For more information about actions, see the vCenter help.
Change Recovery Site Settings

Use the Advanced Settings Recovery page to adjust default values for time-outs that occur when you test or run a recovery plan. Several kinds of time-outs can occur during the execution of recovery plan steps. These time-outs cause the plan to pause for a specified interval to give the step time to complete.

- Command line timeout – By default, SRM allows 300 seconds for a command step to complete.
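When you develop a command step, it can help to rehearse it under the same time limit before adding it to a plan. The Python sketch below runs a command with a time-out by using the standard subprocess module. The function name is hypothetical and the code does not reproduce how SRM enforces its limit; only the 300-second default comes from the text above.

```python
import subprocess
import sys


def run_command_step(args, timeout_sec=300):
    """Run a command; return its exit code, or None if it timed out."""
    try:
        return subprocess.run(args, timeout=timeout_sec).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run stops the child process before raising.
        return None


# Hypothetical rehearsal: a trivial command that finishes immediately.
status = run_command_step([sys.executable, "-c", "pass"], timeout_sec=30)
```

If a script regularly comes close to the limit in rehearsal, either shorten the script or raise the command line timeout on the Advanced Settings Recovery page.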
- To change the interval that SRM waits for each HBA rescan to complete, enter a new value in the SanProvider.hostRescanTimeoutSec text box.
- To change the interval between datastore group computations, enter a new value in the SanProvider.minLunGroupComputationInterval text box.
4 Click OK to save your changes and close the Advanced Settings window.
- To change the interval between Remote Site Down alarms, enter a new value in the remoteSiteStatus.panicRepeatDelay field.
- To change the number of remote site status checks to try before declaring the check a failure, enter a new value in the remoteSiteStatus.warningDelay field.
4 Click OK to save your changes and close the Advanced Settings window.
Create a Nonreplicated Virtual Disk for Paging File Storage

You can avoid replication of a virtual machine's Windows paging file by creating a virtual disk on a nonreplicated datastore, configuring Windows to create its paging file on that disk, and configuring a nonreplicated copy of that disk at the recovery site. In the default configuration, Windows creates its paging file on the system disk (typically C:).
d Power off and then power on the virtual machine so that it writes its paging file to the new location on the cloned disk.
At this point, the protected virtual machine is writing its paging file to a disk on a nonreplicated datastore at the protected site. Until you specify a recovery site location for this disk, the virtual machine does not have a valid protection configuration.
Chapter 6 Troubleshooting SRM

If you have problems with storage replication, site pairing, or guest customization, you can try to troubleshoot the problem. To help identify the cause, you might need to collect SRM server or client log files to review or send to VMware Support. Errors encountered during SRM operations are displayed in error dialogs or shown in the Recent Tasks window. Most errors also generate an entry in an SRM log file.
3 In the Protection Setup area of the SRM Summary window, navigate to the Array Managers line and click Configure.
4 In the Configure Array Managers wizard, click Next on the Protected Site Array Managers page and then click Next on the Recovery Site Array Managers page.
The Review Replicated Datastores page should now display each replicated datastore that contains at least one virtual machine.
Cause

This error usually occurs when a virtual machine has been recently created but its files have not yet been replicated to the recovery site. For instance, you have created a virtual machine at the protected site, added it to a protection group, and then tested or run a recovery plan that includes the new virtual machine. If the virtual machine files have not yet been replicated to the recovery site, the recovery plan cannot recover the virtual machine.
Collecting SRM Log Files

SRM creates several log files that contain information that can help VMware Support diagnose problems. You can use the SRM log collector to simplify log file collection. The SRM server and client generate separate sets of log files. The SRM server log files contain information about the server configuration and messages related to server operations.