HP Matrix Operating Environment 7.3 Recovery Management User Guide

Abstract

The HP Matrix Operating Environment 7.3 Recovery Management User Guide contains information on installing, configuring, testing, and troubleshooting HP Matrix Operating Environment recovery management (Matrix recovery management).
© Copyright 2013 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. The information contained herein is subject to change without notice.
Contents
1 Matrix recovery management Overview (page 5)
2 Installing and configuring Matrix recovery management (page 8)
   Installation and configuration overview (page 8)
   Installation and configuration prerequisites (page 8)
   Installing and licensing Matrix recovery management
5 Issues, limitations, and suggested actions (page 49)
   Limitations (page 49)
      Hyper-V support limitation for bidirectional configuration (page 49)
      No automatic synchronization of configuration between sites
1 Matrix recovery management Overview

Matrix recovery management is a component of the HP Matrix Operating Environment that provides disaster recovery protection for logical servers and for Matrix infrastructure orchestration services. Logical servers and Matrix infrastructure orchestration services (IO services) that are included in a Matrix recovery management configuration are referred to as DR Protected logical servers and IO services.
Figure 1 Recovery Group Sets

Features and benefits of Matrix 7.3 recovery management
• Provides an automated failover mechanism for DR Protected logical servers, DR Protected IO services, and associated storage.
• Provides a disaster recovery solution for logical servers and IO services managed by the HP Matrix Operating Environment.
NOTE: Supports DR Protection of IO services with virtual server groups only.
• Includes Recovery Group startup order settings that let you determine which Recovery Groups are recovered first during a site failover.
• Includes a Copy feature that makes it easy to create multiple Storage Replication Groups with the same configuration parameters.
• Supports up to 2,500 disaster recovery-protected logical servers in a two-site configuration for VC-hosted and VMware VMs.
2 Installing and configuring Matrix recovery management

This chapter contains sections on Matrix recovery management installation prerequisites, networking setup, storage setup, logical server setup, Matrix recovery management configuration, export and import operations, and DR Protection for IO services.
IMPORTANT: If you intend to create DR Protected IO services, see “DR protection for IO services” (page 22) before starting the Matrix recovery management installation and configuration process.
Installing and licensing Matrix recovery management
1. Install the HP Matrix Operating Environment and dependent software on the Central Management Server (CMS) at the Local Site and the Remote Site.
2. Discover the managed infrastructure at each site from HP Systems Insight Manager.
3. Apply the license for Matrix recovery management using the HP Insight managed system setup wizard. For more information, refer to the HP Matrix Operating Environment 7.
• When running on HP Virtual Connect hosted physical targets, the Portable Images Network Tool (PINT) must be used to prepare the server image to execute on targets with different network interface configurations and MAC addresses. To use PINT, the Local and Remote Sites must be on the same network, and the OS image must be a Linux version that is supported by Matrix recovery management.
NOTE:
• Each Recovery Group has a single Storage Replication Group that is used by the logical servers in that Recovery Group only. All boot and data LUNs used by these logical servers must be included in the same Storage Replication Group.
• A Storage Replication Group is a set of storage LUNs on a particular disk array that are replicated with write order preserved.
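The two rules in the note above (one Storage Replication Group per Recovery Group, used by that group only, and all boot and data LUNs of its logical servers inside that group) lend themselves to a mechanical check. The following Python sketch is illustrative only: the `RecoveryGroup` class, the `check_recovery_groups` function, and the dictionary layout are assumptions made for this example and are not part of Matrix recovery management.

```python
from dataclasses import dataclass, field


@dataclass
class RecoveryGroup:
    # Hypothetical model of a Recovery Group, for illustration only.
    name: str
    storage_replication_group: str                    # the group's single SRG
    server_luns: dict = field(default_factory=dict)   # logical server -> LUN ids


def check_recovery_groups(groups, srg_luns):
    """Return a list of rule violations (empty when the layout is consistent).

    srg_luns maps a Storage Replication Group name to the LUN ids it contains.
    """
    problems = []
    owners = {}
    for group in groups:
        srg = group.storage_replication_group
        # Rule 1: an SRG is used by one Recovery Group only.
        if srg in owners:
            problems.append(f"{srg} is shared by {owners[srg]} and {group.name}")
        else:
            owners[srg] = group.name
        # Rule 2: every boot and data LUN of each server is in that SRG.
        for server, luns in group.server_luns.items():
            missing = set(luns) - set(srg_luns.get(srg, ()))
            if missing:
                problems.append(f"{server}: LUNs {sorted(missing)} are not in {srg}")
    return problems
```

An empty result means the planned layout satisfies both rules; any returned string names the Recovery Group or logical server that needs attention before configuration.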
vDisks when the failed site recovers. If a failure occurs in the middle of this full copy operation, the data on the new destination vDisks could be corrupted. To protect the new destination vDisks, enable the HP P6000 Command View auto-suspend setting, which prevents an automatic full copy operation from occurring, and back up the data on the vDisks before you manually run a full copy operation.
1. Discover the Storage Management Server with the changed password.
2. Go to the Matrix recovery management user interface Storage Management Servers tab.
3. Select the Storage Management Server that has the changed password, and click Edit.
4. Select the Refresh SIM Password box and click Save.
are invoked. The User Defined storage adapter specification for nonintegrated storage defines commands to: • Validate Storage Management Server information when Storage Management Servers for nonintegrated storage are configured using the Matrix recovery management GUI. • Validate Storage Replication Group information when Storage Replication Groups that use nonintegrated storage are configured using the Matrix recovery management GUI.
Command-line arguments
• For validatesms.cmd: sms_name= sms_username=
• For validatesrg.
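The storage adapter commands receive their inputs as name=value arguments such as sms_name= and sms_username=. As a minimal sketch of that parsing step, written in Python for illustration only (the adapter commands themselves are .cmd scripts, and the helper name `parse_adapter_args` and the example values are hypothetical), the arguments could be split as follows:

```python
def parse_adapter_args(argv):
    """Split name=value command-line arguments into a dict.

    Hypothetical helper for illustration; not part of Matrix recovery management.
    Only the first '=' separates the name from the value, so values may
    themselves contain '=' characters.
    """
    parsed = {}
    for arg in argv:
        name, sep, value = arg.partition("=")
        if not sep or not name:
            raise ValueError(f"expected name=value, got {arg!r}")
        parsed[name] = value
    return parsed


if __name__ == "__main__":
    # Example values are invented for illustration.
    print(parse_adapter_args(["sms_name=sms1.example.net", "sms_username=admin"]))
```

Rejecting malformed arguments early makes it easier for a validation command to report a clear failure instead of acting on partial input.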
If your implementation of the storage adapter commands requires passwords to manage storage replication, the storage adapter command implementation must handle passwords securely. You are responsible for encrypting passwords when saving them and decrypting them when retrieving them.

Setting up Local Site logical servers
The following conditions must be met before you can configure a logical server for DR protection.
1. You must ensure that the logical server is associated with SAN-based storage.
2.
• The names of recovery logical server storage entries must be the same as the names of the logical server storage entries on the Local Site.
• The names of recovery logical server storage pool entries must be the same as the names of the logical server storage pool entries on the Local Site.
4. Refresh Virtual Machine resources using the Logical Servers Refresh operation in the Tools menu of the Visualization tab in Matrix OE visualization.
Figure 2 Matrix recovery management home screen

Matrix recovery management user interface tabs
• Home
Information on the most recent Matrix recovery management Job is displayed at the top of the Matrix recovery management Home screen, including the Latest Job status:, the Job Id:, the Start Time: (if the job is in progress), or the End Time: (if the job has completed).
configuration process and to help ensure that the two sites have synchronized configurations, you can export the Matrix recovery management configuration information to a file at the Local Site, move that file to the Remote Site, and then import the Matrix recovery management configuration information at the Remote Site.
8. Deactivate the recovery logical servers and disable Maintenance mode at the Remote Site. For more information, see “Testing Recovery Groups” (page 27).
9. Fail back the Storage Replication Groups to the Local Site, then activate the Local Site logical servers. If there are VM-hosted logical servers, use the VMware Virtual Center or the Microsoft Hyper-V Management Console to rescan and refresh virtual machine resources.
NOTE: To manage HP 3PAR remote copy, the encrypted password file for both the Local Site and Remote Site InServ storage servers must be available on the CMS at each site, and the name of the password file must be the same on the CMS at each site.
• Storage Replication Group information associated with activated Recovery Groups in the exportconfig file is imported if the importing site has no Storage Replication Group configuration.
NOTE:
• Recovery Groups can be imported only one at a time. You must repeat the import...→Select import file... procedure for each Recovery Group that you import.
• If a Recovery Group in the exportconfig file has the same name as a Recovery Group at the importing site, it is not imported.
NOTE:
• When you use Matrix recovery management to DR protect VMware ESX-based IO services that have been deployed from an IO template using an ICVirt template or a VM template on the vCenter server at the Local Site, an ICVirt or VM template with the same name must be available at the Remote Site before you perform a Matrix recovery management import operation to import the site configuration.
NOTE: If one datastore is specified in volume.dr.list, the DR Protected IO services are provisioned on the datastore specified. If multiple datastores are specified in volume.dr.list, the DR Protected IO services are provisioned on the datastore in volume.dr.list that is both available for that server pool and also has the most free disk space. If multiple datastores are specified in volume.dr.list, and the IO template specifies one of the datastores in volume.dr.
federated.siteA. = federated.siteB.
For example, if the CMS name at the Local Site is hostA.test.net and the CMS name at the Remote Site is hostB.test.net, the mapping line is:
federated.siteA.hostA.test.net = federated.siteB.hostB.test.net
NOTE: Use this mapping whenever federated.io is set to true, even if federation is not used. For more information, see the HP Matrix Operating Environment 7.
NOTE: Matrix recovery management does not perform DNS updates or update the IP configuration of logical servers associated with IO services during a failover operation. Your Network Administrator is responsible for making the necessary modifications to ensure that network services are available if you configure a logical server to use a different IP address or subnet at each site in the Matrix recovery management configuration.
3 Testing and failover operations

This chapter describes Recovery Group testing, planned failovers, and unplanned failovers using the Matrix recovery management Activate and Deactivate operations.

Testing Recovery Groups
There are two ways to test Recovery Groups:
• Using Maintenance Mode to test individual Recovery Groups.
4. Place the Recovery Group into Maintenance Mode at the Remote Site using the Enable Maintenance Mode button in the Matrix recovery management Recovery Groups tab. For logical servers, use the Logical Servers Activate operation in the Tools menu of the Visualization tab in Matrix OE visualization to activate the logical servers in the Recovery Group at the Remote Site. Depending on the type of logical server, the activation may be on VC blades, VM hosts, or both.
1. Shut down the applications and operating system on each Matrix recovery management DR Protected logical server and each server associated with DR Protected IO services.
2. Click the Deactivate... button. The Deactivate Recovery Groups at the Local Site window appears. For more information about the Recovery Groups contained in a Recovery Group Set, select the Recovery Group Set and click View Recovery Group.
3. Select each Recovery Group Set that you want to activate, or click the check box on the left side of the banner of the Activate Recovery Groups at the Local Site window to select all of the Recovery Group Sets for activation.
4. Click Activate Recovery Groups to start the activation operation. A window appears asking if it is OK to proceed. Click OK. The Jobs tab opens, where you can monitor the progress of the activation Job.
• Start Order
• Power-Up Delay
3. Select each Recovery Group Set that you want to activate at the recovery site. The objective is to activate at the recovery site all Recovery Group Sets that were previously activated at the site where the site-wide event occurred.
4. Click Activate Recovery Groups to start the activation operation. A window appears asking if it is OK to proceed. Click OK. The Jobs tab opens, where you can monitor the progress of the activation Job.
4 Dynamic workload movement with CloudSystem Matrix

This chapter explains how you can configure cross-technology logical servers that can be managed with Matrix recovery management. The HP Matrix Operating Environment facilitates the fluid movement of workloads between dissimilar servers within a site and across sites. Workloads can be moved between physical servers and virtual machines and between dissimilar physical servers.
Capabilities and limitations
Using the tools and procedures described in this chapter, you can:
• Configure and manage a logical server that can perform physical-to-virtual cross-technology movements within the datacenter.
• Configure and manage a DR Protected logical server that can be failed over across data centers in a cross-technology movement.
Figure 4 Same LUN number across physical and virtual targets • The target WWN values used to present the Logical Unit must be the same across virtual and physical targets. NOTE: The recovery logical server that provides DR protection at the Remote Site has its own set of target WWN/LUN values that differ from the target WWN/LUN values for the logical server at the Local Site.
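Within a site, the requirement that a Logical Unit be presented with the same target WWN and the same LUN number to both physical and virtual targets can be expressed as a comparison of two sets of (target WWN, LUN number) pairs. The sketch below is a hedged illustration in Python: the function name, the input format, and the WWN normalization (dropping colons and ignoring case) are assumptions for this example, and the values would have to be collected from your own storage tooling. It does not apply across sites, because the recovery logical server has its own target WWN/LUN values.

```python
def presentation_mismatches(physical, virtual):
    """Compare (target WWN, LUN number) pairs seen by the two target types.

    Both arguments are iterables of (target_wwn, lun_number) tuples.
    Returns the pairs that appear on only one side.
    """
    def normalize(pairs):
        # Assumption: WWNs may be written with or without colons, in any case.
        return {(wwn.replace(":", "").lower(), int(lun)) for wwn, lun in pairs}

    phys, virt = normalize(physical), normalize(virtual)
    return {
        "physical_only": sorted(phys - virt),
        "virtual_only": sorted(virt - phys),
    }
```

Two empty lists indicate that both target types see the same presentations; any entry identifies a WWN/LUN pair to correct before attempting a cross-technology move.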
Figure 5 ESX host network name

Figure 6 Virtual Connect Enterprise Manager network name

• When moving a logical server between physical and virtual servers within a site, the following server IDs are not preserved:
◦ Network MAC addresses
◦ Server/Initiator WWNs (On a virtual machine, the storage adapter is a virtual SCSI controller.
both types of servers. The recovery site can have a physical/virtual combination also, or have only virtual machine hosts. Supported platforms The procedures for enabling movement between physical and virtual servers described in this chapter apply to physical servers, hypervisors, and workload operating systems supported by Matrix recovery management. For more information, see the HP Insight Management 7.3 Update 1 Support Matrix at Enterprise Information Library.
i. Copy the executable cp011231.exe to the physical server where the image is currently running.
ii. Run cp011231.exe to install PINT and start the PINT service.
For more information, see “Configuring and managing portable OS images” (page 39).
2. Create a portability group that includes all potential physical and VM host targets. This step sets up the portability group that defines the list of potential targets for the logical server.
NOTE: When the logical server is first moved to a virtual machine, you may want to add additional tools to the server, for example, VMware tools. In the HP Matrix Operating Environment, the VM configuration created does not include a virtual CD/DVD drive. You can use the VM management console to modify the VM configuration to include a virtual CD/DVD drive. 5. Configure inter-site movement between physical and virtual targets (disaster recovery use case).
NOTE: The Matrix recovery management Site configuration can be set up to preferentially activate the logical server on a physical server at one site and a VM host at the other site. For more information, see “Setting a failover target type preference” (page 47).
The command-line interface for PISA is described below. The options are mutually exclusive. PISA runs on supported versions of Windows only, and it requires that the user be a member of the Administrator user group.

Usage: hppisa
   -h, -?, -help     Show this information
   -e, -enable       Enable the LSI driver
   -d, -disable      Disable the LSI driver

After these changes are made, the OS image can be moved back and forth between physical servers and virtual machines.
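Because the hppisa options are mutually exclusive, a script that calls the tool should pass exactly one of them. The following Python sketch shows one way to enforce that. The wrapper functions are hypothetical and are not shipped with PISA; the sketch also assumes that hppisa is on the PATH and that the script runs on a supported Windows system under an account in the Administrator user group.

```python
import subprocess

# Flags taken from the usage text above; the short forms are used here.
_HPPISA_FLAGS = {"help": "-h", "enable": "-e", "disable": "-d"}


def build_hppisa_command(action, executable="hppisa"):
    """Return the argument list for exactly one hppisa action."""
    if action not in _HPPISA_FLAGS:
        raise ValueError(f"action must be one of {sorted(_HPPISA_FLAGS)}")
    return [executable, _HPPISA_FLAGS[action]]


def run_hppisa(action):
    """Run hppisa with a single option.

    Assumption: hppisa is on PATH and the caller has administrator rights.
    Raises subprocess.CalledProcessError if the tool exits with an error.
    """
    return subprocess.run(build_hppisa_command(action), check=True)
```

For example, `run_hppisa("enable")` would invoke `hppisa -e`. Accepting a single action name, rather than several boolean switches, makes it impossible to request enable and disable in the same call.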
The HP Matrix Operating Environment provides default portability groups depending on the resources found within your data center. The Default portability groups include:
• ESX: All ESX Hypervisors
• HYPERV: All Hyper-V Hypervisors
• Each Virtual Connect Domain Group: Each VCDG has its own Default portability group.
You can also create User Defined portability groups that extend the portability of a logical server to unlike technologies.
Figure 10 Selecting group members and targets

Provide a name and optional description for the portability group. The name will be used for defining logical servers. The set of Group Types is selected automatically based on the targets inserted into the portability group. Valid combinations of targets include:
• A single Virtual Connect Domain Group (VCDG)
• A set of ESX Hypervisors
• A set of Hyper-V Hypervisors
• A set consisting of a single VCDG plus a set of ESX Hypervisors.
Figure 11 Selecting a portability group To view the portability group for any logical server, click the View movable logical server details icon in Matrix OE visualization as illustrated in Figure 12 (page 43). Figure 12 View movable logical server details icon The details for this logical server are displayed as illustrated in Figure 13 (page 43).
Logical servers can be made portable through techniques described in “Portability groups” (page 40). NOTE: You must determine whether the provisioned operating system within a logical server performs as desired on a variety of platforms. If a logical server has never been active on a platform type, the HP Matrix Operating Environment shows a warning for each target of that type in the Target Selection page during moves and activations. You must determine whether the target is valid.
When defining storage for a portable logical server, you must select SAN Storage Entry. For flexibility and movement between underlying technology types, storage must be presented to the WWNs tied to the Virtual Connect server profile, and storage must also be presented to any ESX VM hosts that are potential targets for the logical server.
Targets for a logical server are selected from that logical server's portability group. The portability group members are then further filtered based on resource availability, including CPU and memory resources as well as network and SAN connectivity. NOTE: Networks in Virtual Connect must be named identically to their corresponding networks (port groups) on ESX Hypervisors. Differences in names prevent the Unlike Move operation from identifying networks with similar connectivity.
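Because Virtual Connect networks must be named identically to the corresponding ESX port groups, comparing the two name lists before a move can reveal mismatches early. The Python sketch below illustrates such a check. The function name and the idea of supplying both lists by hand are assumptions for this example; it does not query Virtual Connect or ESX, and it treats "identically" as an exact, case-sensitive match.

```python
def unmatched_networks(vc_networks, esx_port_groups):
    """Return network names present on only one side.

    Comparison is exact and case-sensitive, so 'Backup' and 'backup'
    are reported as a mismatch.
    """
    vc, esx = set(vc_networks), set(esx_port_groups)
    return {"vc_only": sorted(vc - esx), "esx_only": sorted(esx - vc)}
```

An entry in either list points to a network whose name differs between the two technologies and that may therefore not be recognized as having similar connectivity.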
Moving between blade types For logical servers with target attributes, the logical server management software can identify more possible targets when moving or activating a server. As with all cross-technology logical servers, you must ensure that the logical server can function appropriately on various platforms. If a particular target is proven to be unsuitable, it is easy to remove that type of target to more accurately describe the logical server's portability.
You must specify the target type preferred for all sites on the CMS at each site: • If you specify Virtual as the target type preferred for a site, all cross-technology logical servers whose Recovery Groups prefer that site are activated on VM hosts during an Activate operation at that site. A physical server is chosen only if no VM hosts are available.
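The preference rule described above for Virtual (use a VM host when one is available, otherwise fall back to a physical server) can be sketched as a small selection function. This Python example is illustrative only and is not the product's implementation. The function name and the boolean inputs are hypothetical, and the behavior shown for a Physical preference simply mirrors the Virtual rule as an assumption.

```python
def choose_target_type(preferred, vm_hosts_available, physical_available):
    """Pick 'virtual' or 'physical' for a cross-technology logical server.

    Returns None when no target of either type is available.
    """
    if preferred not in ("virtual", "physical"):
        raise ValueError("preferred must be 'virtual' or 'physical'")
    available = {"virtual": vm_hosts_available, "physical": physical_available}
    fallback = "physical" if preferred == "virtual" else "virtual"
    if available[preferred]:
        return preferred
    # Assumption: a Physical preference falls back to VM hosts in the same
    # way that the documented Virtual preference falls back to physical.
    if available[fallback]:
        return fallback
    return None
```

With a Virtual preference, `choose_target_type("virtual", False, True)` returns `"physical"`, matching the statement that a physical server is chosen only if no VM hosts are available.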
5 Issues, limitations, and suggested actions

This chapter lists issues and limitations for this release, categorized as follows:
Limitations: Limitations of the implemented functions and features of this release
Major issues: Issues that may significantly affect functionality and usability in this release
Minor issues: Issues that may be noticeable but do not have a significant impact on functionality or usability

Limitations
• Only IO services that include virtual servers and on-premise (not cloud) res
ESX configuration setting required for VMFS datastores of Matrix recovery management managed logical servers to be visible at Remote Site Under the following conditions, Matrix recovery management requires a specific ESX configuration setting to retain the signature of a VMFS datastore so it will be visible at the Remote Site: • You have asymmetric HP P6000 Continuous Access Software array models at the Local and Remote Site.
recommends as a best practice that you keep LUN numbers the same for corresponding disks across sites. Suggested actions Assess the impact of these discrepancies on any licensing arrangements in use for the operating system and applications running on DR Protected logical servers.
6 Troubleshooting

This chapter provides troubleshooting information in the following categories:
• “Configuration troubleshooting” (page 52)
• “Configuration error messages” (page 54)
• “Warning messages” (page 57)
• “Matrix recovery management Job troubleshooting” (page 58)
• “Failover error messages” (page 61)
• “Matrix recovery management log files” (page 62)
• “DR Protected IO services troubleshooting” (page 62)

Configuration troubleshooting
To troubleshoot Matrix recovery management confi
• Unable to add or edit HP P6000 Storage Replication Group
Possible causes include:
◦ Matrix recovery management is unable to obtain Storage Replication Group information from Command View servers to validate the Storage Replication Group information provided by the user.
• Unable to add or edit HP P9000 Storage Replication Group
Possible causes include:
◦ The Storage Replication Group is not configured to be managed by the RAID manager instances.
• No configuration operation can be run
Possible causes include:
◦ An Activate, Deactivate, or Import operation is in progress.
◦ Another configuration operation may be in progress.
• Unable to import Storage Management Servers as part of an import operation
Possible causes include:
◦ The Storage Management Server was not discovered in the HP Matrix Operating Environment user interface.
Error message: Cannot verify the host name specified.
Cause: The hostname specified for the CMS at either the Local Site or the Remote Site cannot be resolved in DNS.
Action: Verify that a valid DNS entry with a fully qualified domain name exists for each CMS.

Error message: Cannot create/edit the site information.
Cause: The hostname specified for the CMS does not include a fully qualified domain name associated with the local CMS.
then go to Storage→Storage Software→Storage Device Management Software→HP P6000 Command View Software. 2. Confirm that the port number specified during Storage Management Server configuration in Matrix recovery management is the same as the WBEM port number configured on the HP P6000 Command View server (for example, 5989). For more information, see the “CIMOM” server configuration section in the HP P6000 Command View Software Installation Guide. 3.
Error message: Unable to run Matrix recovery management operations because Matrix recovery management Job is in progress or another Matrix recovery management configuration operation is in progress.
Cause: If an Activate or Deactivate Job is in progress, no configuration operation is allowed. If a Matrix recovery management configuration operation is in progress, no other Matrix recovery management configuration operations are allowed.
Warning message: Warning: Matrix recovery management is quiesced. No new operations will be allowed.
Cause: Matrix recovery management has been quiesced. All configuration buttons (Create, Edit, Delete, and so on) are disabled.
Action: Wait for Matrix recovery management to be unquiesced.

Warning message: Warning: Unable to remove CLX credentials for (these server credentials may not exist in CLX).
an Activate Job. It has an Entity of type site and an Operation of type activate. You will also notice the Failed icon in the Status column indicating that Job 3288 has failed. Figure 21 Jobs screen For a failed Job, click the check box next to the Job Id to get detailed information about the associated Sub Jobs. A site Job contains a Sub Job for each Recovery Group. Similarly, each Recovery Group has Sub Jobs for its Storage Replication Group and logical server, respectively.
Figure 23 Restarting a failed job NOTE: Restarting the Job retries only Sub Jobs that previously failed; servers associated with completed Jobs or Sub Jobs are not impacted. IMPORTANT: If correcting the problem that caused the Job to fail included reconfiguration of logical servers, before you restart the Job, go to the Recovery Groups tab and delete the Recovery Groups that contain the reconfigured logical servers.
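The Job hierarchy described above (a site Job with a Sub Job per Recovery Group, each with Sub Jobs for its Storage Replication Group and logical server) and the restart behavior (only previously failed Sub Jobs are retried) can be modeled as a small tree. The Python sketch below is a conceptual illustration only. The `Job` class, its field names, and the status strings are hypothetical and do not reflect the product's internal data model.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    # Hypothetical model of a Job or Sub Job, for illustration only.
    entity: str                 # e.g. "site", "recovery group", "logical server"
    name: str
    status: str = "completed"   # "completed" or "failed" in this sketch
    sub_jobs: list = field(default_factory=list)


def jobs_to_retry(job):
    """Return the lowest-level Sub Jobs that failed.

    Completed Sub Jobs are skipped, so the servers associated with them
    are not touched by a restart.
    """
    if not job.sub_jobs:
        return [job] if job.status == "failed" else []
    retry = []
    for sub in job.sub_jobs:
        retry.extend(jobs_to_retry(sub))
    return retry
```

Walking the tree this way shows why a restart is safe for work that already succeeded: only the failed leaves are collected for another attempt.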
• Matrix recovery management job failed because of unlocatable logical server in Matrix OE logical server management.
Possible causes include:
◦ A logical server managed by Matrix recovery management was removed from Matrix OE logical server management before it was unmanaged in Matrix recovery management.
• Matrix recovery management job failed because an operation failed in Matrix OE logical server management for the logical server.
Matrix recovery management log files
There are several log files available with detailed information that you can view to help identify the sources of Matrix recovery management failover or failback problems:
• For errors that occur during the initial Matrix recovery management configuration steps, view the mxdomainmgr(0).log file located in the logs directory where HP Systems Insight Manager is installed on the system.
• For errors that occur during a failover, check the lsdt.
DR Protected IO services configuration troubleshooting
In addition to the configuration issues addressed in this User Guide that are common to both logical servers and IO services, the following configuration issues apply to IO services only:
• Failed to get a list of IO services that can be included in a recovery group
Possible causes:
◦ Matrix infrastructure orchestration Windows service is not running.
◦ There are no IO services that are DR protection enabled.
IO services configuration error messages
Error message: Unable to get the IO service.
Cause: The IO service does not exist or Matrix recovery management failed to get the IO service information from the Matrix infrastructure.
Action: Check the Matrix recovery management and IO log files for more information on the failure. If the IO service does not exist in IO, it is possible that the IO service was removed. If the IO service exists, restart IO and retry the operation.
DR Protected IO services failover troubleshooting
In addition to the failover issues addressed in this User Guide that are common to both logical servers and IO services, the following failover issues apply to IO services only:
• Failed to activate IO service in a Recovery Group
Possible causes:
◦ Storage resources are not available.
◦ The IO service is in an invalid state for activation.
◦ The IO service does not exist.
◦ The Matrix infrastructure orchestration Windows service is not running.
the issue. After the reboot is complete, check the cluster disks that are visible in the Microsoft Failover Cluster Manager UI and restart the Matrix RM activate operation. • Uninstall the ghosted pseudo devices that are left behind as a result of storage failover to remote site, followed by a rescan for storage. These pseudo devices can be seen in Device Manager’s disk drives (ensure that the hidden devices are seen only when View option is selected.
7 Support and other resources

Information to collect before contacting HP
Be sure to have the following information available before you contact HP:
• Software product name
• Hardware product model number
• Operating system type and version
• Applicable error message
• Third-party hardware or software
• Technical support registration number (if applicable)

How to contact HP
Use the following methods to contact HP technical support:
• In the United States, see the Customer Service / Contact HP U
With this service, Insight Management customers benefit from expedited problem resolution as well as proactive notification and delivery of software updates. For more information about this service, see the following website: http://www.hp.com/services/insight. Registration for this service takes place following online redemption of the license certificate.
Matrix recovery management documentation For more information on Matrix recovery management, see the following sources: • HP Insight Management 7.3 Update 1 Support Matrix Provides Matrix recovery management support information along with other HP Insight hardware, software, and firmware support information. Available at Enterprise Information Library. • HP Matrix Operating Environment 7.
WARNING: An alert that calls attention to important information that, if not understood or followed, results in personal injury.
CAUTION: An alert that calls attention to important information that, if not understood or followed, results in data loss, data corruption, or damage to hardware or software.
IMPORTANT: An alert that calls attention to essential information.
NOTE: An alert that contains additional or supplementary information.
TIP: An alert that provides helpful information.
A Recover the CSV from online pending state

If you perform an activate operation at the remote site without taking the CSV offline at the local site, you might see the following symptoms at the local site:
• No storage information is available when navigating to the storage view in the Windows Failover Cluster Management tool.
• The CSV is in the online pending state.
To recover from these symptoms, perform the following steps:
1.
B Documentation feedback

HP is committed to providing documentation that meets your needs. To help us improve the documentation, send any errors, suggestions, or comments to Documentation Feedback (docsfeedback@hp.com). Include the document title and part number, version number, or the URL when submitting your feedback.
Glossary

bidirectional failover
A Matrix recovery management feature that allows Recovery Group Sets to be activated or deactivated at either the Local Site or the Remote Site. At any point in time there can be activated and deactivated Recovery Group Sets at both sites. In the event of a disaster, or to accommodate site maintenance, all of the Recovery Group Sets in the Matrix recovery management configuration can be deactivated at one site, and activated at the other site.
rehearsal, the Recovery Group and its corresponding logical servers and IO services can be brought back under the control of Matrix recovery management. Matrix infrastructure orchestration services Matrix infrastructure orchestration services (IO services) quickly provision infrastructure to automatically activate physical and virtual servers, storage, and networking from pools of shared resources. More information on Matrix infrastructure orchestration is available at http:// www.hp.
Recovery Group Set
A set of Recovery Groups that share the same Preferred and Secondary sites. Recovery Groups cannot be activated or deactivated individually. Instead, all Recovery Groups that share the same Preferred and Secondary site must be activated or deactivated as a set. Recovery Group Sets can be selected for activation or deactivation at the Local Site.

Recovery Group Start Order
An optional number that specifies the order in which a Recovery Group is to be started during a site failover.
VM-hosted logical server
A logical server running on a virtual machine under the control of a hypervisor.