HP Matrix Operating Environment 7.1 Recovery Management User Guide

Abstract

The HP Matrix Operating Environment 7.1 Recovery Management User Guide contains information on installation, configuration, testing, and troubleshooting HP Matrix Operating Environment recovery management (Matrix recovery management).
© Copyright 2012 Hewlett-Packard Development Company, L.P. Confidential computer software. Valid license from HP required for possession, use or copying. Consistent with FAR 12.211 and 12.212, Commercial Computer Software, Computer Software Documentation, and Technical Data for Commercial Items are licensed to the U.S. Government under vendor's standard commercial license. Warranty The information contained herein is subject to change without notice.
Contents
1 Matrix recovery management Overview (page 5)
2 Installing and configuring Matrix recovery management (page 8)
    Installation and configuration overview (page 8)
    Installation and configuration prerequisites (page 8)
    Installing and licensing Matrix recovery management
5 Issues, limitations, and suggested actions (page 48)
    Limitations (page 48)
    Hyper-V support limitation for bidirectional configuration (page 48)
    No automatic synchronization of configuration between sites
1 Matrix recovery management Overview

Matrix recovery management is a component of the HP Matrix Operating Environment that provides disaster recovery protection for logical servers and for Matrix infrastructure orchestration services. Logical servers and Matrix infrastructure orchestration services (IO services) that are included in a Matrix recovery management configuration are referred to as DR Protected logical servers and IO services.
Figure 1 Recovery Group Sets Features and benefits of Matrix 7.1 recovery management • Provides an automated failover mechanism for DR Protected logical servers, DR Protected IO services, and associated storage. • Provides a disaster recovery solution for logical servers and IO services managed by the HP Matrix Operating Environment. NOTE: Supports DR Protection of IO services with virtual server groups only.
• Includes Recovery Group startup order settings that let you determine which Recovery Groups are recovered first during a site failover. • Includes a Copy feature that makes it easy to create multiple Storage Replication Groups with the same configuration parameters. By reading this HP Matrix Operating Environment 7.1 Recovery Management User Guide, you will gain a better understanding of Matrix recovery management concepts and configuration testing.
2 Installing and configuring Matrix recovery management This chapter contains sections on Matrix recovery management installation prerequisites, networking setup, storage setup, logical server setup, Matrix recovery management configuration, export and import operations, and DR Protection for IO services. IMPORTANT: If you intend to create DR Protected IO services, see “DR protection for IO services” (page 22) before starting the Matrix recovery management installation and configuration process.
Uninstalling Matrix recovery management

Use the Windows Add/Remove Programs feature as follows:
1. Select recovery management, then click Remove.
2. Wait until the Matrix recovery management product no longer appears in the list.

Setting up Networking

It is assumed that networking links are present between the Local Site and the Remote Site.
For information on PINT, see the Portable Images Network Tool (PINT) Linux readme version 1.0.0 available at http://h20000.www2.hp.com/bc/docs/support/SupportManual/c01726723/c01726723.pdf?jumpid=reg_R1002_USEN. • If the Local Site and corresponding Remote Site managed servers share a common subnet, you must ensure that there is no conflict between MAC addresses assigned by HP Virtual Connect Enterprise Manager (VCEM).
NOTE: In the same way that conflict in the configuration of MAC addresses at the Local and Remote Sites is avoided in “Setting up Networking” (page 9), conflict must also be avoided in the configuration of WWNs, if the WWNs are not private to the respective sites. The same technique using VCEM exclusion ranges is available for array WWN configuration.
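The conflict check described above amounts to confirming that the address ranges in use at the two sites do not overlap. The following Python sketch illustrates that comparison only; it is not part of Matrix recovery management or VCEM. It assumes each site's range is known as a (first, last) pair of colon- or hyphen-separated hexadecimal addresses, and the function names and sample values are hypothetical.

```python
def address_to_int(address: str) -> int:
    """Convert a colon- or hyphen-separated hexadecimal address to an integer.

    Works for 6-byte MAC addresses and 8-byte WWNs alike.
    """
    return int(address.replace(":", "").replace("-", ""), 16)


def ranges_overlap(range_a: tuple[str, str], range_b: tuple[str, str]) -> bool:
    """Return True if two inclusive (first, last) address ranges share any address."""
    a_first, a_last = sorted(address_to_int(a) for a in range_a)
    b_first, b_last = sorted(address_to_int(b) for b in range_b)
    return a_first <= b_last and b_first <= a_last


if __name__ == "__main__":
    # Illustrative values only; substitute the ranges configured at each site.
    local_site = ("00:21:5A:9B:00:00", "00:21:5A:9B:03:FF")
    remote_site = ("00:21:5A:9B:04:00", "00:21:5A:9B:07:FF")
    print("conflict" if ranges_overlap(local_site, remote_site) else "no conflict")
```

Because the helper only compares integers, the same check can be applied to WWN ranges when the WWNs are not private to the respective sites.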
For more information, see the following:
◦ HP P6000 Continuous Access Software documentation available at http://h20000.www2.hp.com. Click Manuals, then go to Storage→Storage Software→Storage Replication Software→HP P6000 Continuous Access Software.
◦ HP P6000 Command View Software documentation available at http://h20000.www2.hp.com. Click Manuals, then go to Storage→Storage Software→Storage Device Management Software→HP P6000 Command View Software.
Command Line Software. Both must be installed on the Central Management Server where Matrix recovery management is installed. For more information, see the following:
◦ HP 3PAR Cluster Extension Software documentation available at http://h20000.www2.hp.com. Click Manuals, then go to Storage→Storage Software→Storage Replication Software→HP Cluster Extension Software.
◦ HP 3PAR InForm Command Line Software documentation available at http://h20000.www2.hp.com.
Managing nonintegrated storage with Matrix recovery management If your DR Protected logical servers use a nonintegrated storage system that is supported by Matrix OE, and you want Matrix recovery management to automatically invoke storage failover for the nonintegrated storage using the Matrix recovery management Activate operation, you must: 1.
remote_storage_id= NOTE: Volumes in the Storage Replication Group (identified by srg_name) are replicated between the local storage system (identified by local_storage_id) and the remote storage system (identified by remote_storage_id). • For failoversrg.
/STORAGE and copy all three commands to the newly created directory: • /STORAGE/EMC/validatesms.cmd • /STORAGE/EMC/validatesrg.cmd • /STORAGE/EMC/failoversrg.
In the case of a cluster shared volume or shared cluster disks, the replicated disk on the Remote Site Hyper-V host must be configured with the same cluster resource name that is assigned to the Local Site disk it is replicated from. b. 4. • When creating recovery logical servers, you must specify the datastore name for the logical server. The datastore name selected must be the same as the datastore name for the Local Site logical server.
Figure 2 Matrix recovery management home screen Matrix recovery management user interface tabs • Home Information on the most recent Matrix recovery management Job is displayed at the top of the Matrix recovery management Home screen, including the Latest Job status:, the Job Id:, the Start Time: (if the job is in progress), or the End Time: (if the job has completed).
configuration process and to help ensure that the two sites have synchronized configurations, you can export the Matrix recovery management configuration information to a file at the Local Site, move that file to the Remote Site, and then import the Matrix recovery management configuration information at the Remote Site.
8. Deactivate the recovery logical servers and disable Maintenance mode at the Remote Site. For more information, see “Testing Recovery Groups” (page 26).
9. Fail back the Storage Replication Groups to the Local Site, then activate the Local Site logical servers. If there are VM hosted logical servers, use the VMware Virtual Center or the Microsoft Hyper-V Management Console to rescan and refresh virtual machine resources.
NOTE: To manage HP 3PAR remote copy, the encrypted password file for both the Local Site and Remote Site Inserv storage servers must be available on the CMS at each site, and the name of the password file must be the same on the CMS at each site. • Storage Replication Group information associated with activated Recovery Groups in the exportconfig file is imported if the importing site has no Storage Replication Group configuration.
NOTE: • Recovery Groups can be imported one at a time only. You must repeat the import...→Select import file... procedure for each Recovery Group that you import. • If a Recovery Group in the exportconfig file has the same name as a Recovery Group at the importing site, it is not imported.
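Before repeating the import procedure for each Recovery Group, it can help to work out which groups in the export file would be skipped because of a name collision. The Python sketch below only illustrates that rule. It assumes the Recovery Group names from the exportconfig file and from the importing site have already been collected into lists by some other means, that names are compared as exact strings, and the function name plan_import is hypothetical.

```python
def plan_import(exported_groups: list[str], existing_groups: list[str]) -> tuple[list[str], list[str]]:
    """Split exported Recovery Group names into importable and skipped lists.

    A group is skipped when a Recovery Group with the same name already
    exists at the importing site (exact string match is an assumption).
    """
    existing = set(existing_groups)
    to_import = [name for name in exported_groups if name not in existing]
    skipped = [name for name in exported_groups if name in existing]
    return to_import, skipped


if __name__ == "__main__":
    # Hypothetical group names for illustration.
    to_import, skipped = plan_import(["RG_Web", "RG_DB", "RG_Mail"], ["RG_DB"])
    for name in to_import:
        print(f"import separately: {name}")
    for name in skipped:
        print(f"not imported (name already in use): {name}")
```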
NOTE: • When you use Matrix recovery management to DR protect VMware ESX based IO services that have been deployed from an IO template using an ICVirt template or a VM template on the vCenter server at the Local Site, an ICVirt or VM template with the same name must be available at the Remote Site before you perform a Matrix recovery management import operation to import the site configuration.
dr.properties • On the Local Site and Remote Site CMS specify a name identifying the associated site where IO is running in the dr.properties file, for example: local.site = siteA at the Local Site, and local.site = siteB at the Remote Site. The name in the dr.properties file can be different than the site name configured in Matrix recovery management. • If the service owner domain name and username on the Local Site and Remote Site are not the same, set the owner.username..
Network configuration To allow the same IP addresses for primary and replica IO services, when the subnet is spanned between the Local and Remote Site: • The Local Site and the Remote Site must define the same static IP range and it must contain the IP range of the primary IO service. At the site with the replica IO service, the IO administrator must specify a list or a range of IPs (IP exclusion list) in the hpio.properties file to avoid IP address conflicts.
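The range requirement above can be checked mechanically before editing the hpio.properties file. The Python sketch below is an illustration under stated assumptions, not an HP tool: it assumes the static range is known as a (first, last) pair of IPv4 addresses, that the exclusion list at the replica site consists of the addresses used by the primary IO service, and the function names are hypothetical. It does not read or write hpio.properties, and the property name to use is not shown here.

```python
import ipaddress


def expand_range(first: str, last: str) -> list[ipaddress.IPv4Address]:
    """Return every address in the inclusive IPv4 range first..last."""
    start = int(ipaddress.IPv4Address(first))
    end = int(ipaddress.IPv4Address(last))
    if end < start:
        raise ValueError(f"range end {last} precedes range start {first}")
    return [ipaddress.IPv4Address(n) for n in range(start, end + 1)]


def replica_exclusions(static_range: tuple[str, str], primary_service_ips: list[str]) -> list[str]:
    """Verify the primary service addresses fall inside the static range.

    Returns the sorted addresses that the replica site would exclude
    (an assumption about what the exclusion list contains).
    """
    static = set(expand_range(*static_range))
    primary = {ipaddress.IPv4Address(ip) for ip in primary_service_ips}
    outside = sorted(primary - static)
    if outside:
        raise ValueError(
            "primary IO service addresses outside the static range: "
            + ", ".join(str(ip) for ip in outside)
        )
    return [str(ip) for ip in sorted(primary)]


if __name__ == "__main__":
    # Documentation-range addresses used purely as sample data.
    print(replica_exclusions(("192.0.2.10", "192.0.2.20"), ["192.0.2.12", "192.0.2.11"]))
```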
3 Testing and failover operations

This chapter describes Recovery Group testing, planned failovers, and unplanned failovers using the Matrix recovery management Activate and Deactivate operations.

Testing Recovery Groups

There are two ways to test Recovery Groups:
• Using Maintenance Mode to test individual Recovery Groups.
• Performing a planned failover to test all Recovery Groups. See “Planned failover” (page 27) for more information.
in the Matrix infrastructure service to activate the IO services in the Recovery group. When the test is complete, gracefully shut down the operating systems, then deactivate the logical servers or IO services. NOTE: If the Matrix recovery management configuration includes Hyper-V logical servers or IO services, bring the cluster disk resource used for logical server or IO service storage offline. 5. 6.
• Start Order
• Power-Up Delay
3. Select the check box on the left side of the banner of the Deactivate Recovery Groups at the Local Site window to select all of the Recovery Group Sets at the Local Site for deactivation.
4. Click Deactivate Recovery Groups to start the deactivation operation. A window appears asking you to confirm that you want to proceed. Click OK. You are directed to the Jobs tab, where you can monitor the progress of the deactivation Job.
loss). If the event is more severe, resulting in the permanent loss of the CMS or managed resources, reconstruction of the site may be necessary. At the site where the site-wide event occurred: 1. Ensure that the DR Protected logical servers are no longer running in order to prevent a split-brain situation. As long as the DR Protected logical servers have stopped running, Matrix recovery management will prevent them from automatically powering up when power is restored.
NOTE: • If the Matrix recovery management configuration has been changed since the failover occurred (for example, a new Recovery Group was created), the sites must be brought into sync by making appropriate configuration changes. The Matrix recovery management Site configuration export and import operations can be used for this purpose. • A successful Activate or Deactivate operation ensures that all of the Recovery Groups within a Recovery Group Set are in the same state (enabled or disabled).
4 Dynamic workload movement with CloudSystem Matrix This chapter explains how you can configure cross-technology logical servers that can be managed with Matrix recovery management. The HP Matrix Operating Environment facilitates the fluid movement of workloads between dissimilar servers within a site and across sites. Workloads can be moved between physical servers and virtual machines and between dissimilar physical servers.
Capabilities and limitations Using the tools and procedures described in this chapter you can: • Configure and manage a logical server that can perform physical to virtual cross-technology movements within the datacenter. • Configure and manage a DR Protected logical server that can be failed over across data centers in a cross-technology movement.
Figure 4 Same LUN number across physical and virtual targets • The target WWN values used to present the Logical Unit must be the same across virtual and physical targets. NOTE: The recovery logical server that provides DR protection at the Remote Site has its own set of target WWN/LUN values that differ from the target WWN/LUN values for the logical server at the Local Site.
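The requirement that the same target WWN and LUN values are used across physical and virtual targets within a site can be expressed as a simple set comparison. The Python sketch below is illustrative only. It assumes the (target WWN, LUN) pairs for each target have been gathered manually from the storage management tools, and the function name, target names, and sample values are hypothetical. Remote Site values are expected to differ and are not compared here.

```python
def inconsistent_targets(presentations: dict[str, set[tuple[str, int]]]) -> list[str]:
    """Return the targets whose (target WWN, LUN) pairs differ from the first target's.

    WWNs are compared case-insensitively. An empty result means every
    target sees the Logical Unit with the same target WWN and LUN values.
    """
    def normalize(pairs: set[tuple[str, int]]) -> set[tuple[str, int]]:
        return {(wwn.upper(), lun) for wwn, lun in pairs}

    names = list(presentations)
    if not names:
        return []
    reference = normalize(presentations[names[0]])
    return [name for name in names[1:] if normalize(presentations[name]) != reference]


if __name__ == "__main__":
    # Hypothetical targets within one site.
    site_presentations = {
        "vc_profile": {("50:00:1F:E1:50:0A:B4:08", 1)},
        "esx_host_1": {("50:00:1f:e1:50:0a:b4:08", 1)},
        "esx_host_2": {("50:00:1F:E1:50:0A:B4:08", 2)},
    }
    print(inconsistent_targets(site_presentations))
```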
Figure 5 ESX host network name
Figure 6 Virtual Connect Enterprise Manager network name
• When moving a logical server between physical and virtual servers within a site, the following server IDs are not preserved: ◦ Network MAC addresses ◦ Server/Initiator WWNs (On a virtual machine, the storage adapter is a virtual SCSI controller.
both types of servers. The recovery site can have a physical/virtual combination also, or have only virtual machine hosts. Supported platforms The procedures for enabling movement between physical and virtual servers described in this chapter apply to physical servers, hypervisors, and workload operating systems supported by Matrix recovery management. For more information, see the HP Insight Management 7.1 Support Matrix at http://www.hp.com/go/matrixoe/docs.
i. ii. Copy the executable cp011231.exe to the physical server where the image is currently running. Run cp011231.exe to install PINT and start the PINT service. For more information, see “Configuring and managing portable OS images” (page 38). 2. Create a portability group that includes all potential physical and VM host targets. This step sets up the portability group that defines the list of potential targets for the logical server.
NOTE: When the logical server is first moved to a virtual machine, you may want to add additional tools to the server, for example, VMware tools. In the HP Matrix Operating Environment, the VM configuration created does not include a virtual CD/DVD drive. You can use the VM management console to modify the VM configuration to include a virtual CD/DVD drive. 5. Configure inter-site movement between physical and virtual targets (disaster recovery use case).
NOTE: The Matrix recovery management Site configuration can be set up to preferentially activate the logical server on a physical server at one site and a VM host at the other site. For more information, see “Setting a failover target type preference” (page 46).
PISA is a simple command-line tool that accepts only a few command-line options. It needs to be executed only once after Windows has been installed on a physical server. The changes it makes are persistent and do not need to be repeated or reversed. However, repeatedly running PISA has no negative impact. PISA can also be used to disable the driver used by the virtual machine. The command-line interface for PISA is described below. The options are mutually exclusive.
The HP Matrix Operating Environment provides default portability groups depending on the resources found within your data center. The Default portability groups include: • ESX—All ESX Hypervisors • HYPERV—All Hyper-V Hypervisors • Each Virtual Connect Domain Group—Each VCDG has its own Default portability group. You can also create User Defined portability groups that extend the portability of a logical server to unlike technologies.
Figure 10 Selecting group members and targets Provide a name and optional description for the portability group. The name will be used for defining logical servers. The set of Group Types is selected automatically based on the targets inserted into the portability group. Valid combinations of targets include: • A single Virtual Connect Domain Group (VCDG) • A set of ESX Hypervisors • A set of Hyper-V Hypervisors • A set consisting of a single VCDG plus a set of ESX Hypervisors.
Figure 11 Selecting a portability group To view the portability group for any logical server, click the View movable logical server details icon in Matrix OE visualization as illustrated in Figure 12 (page 42). Figure 12 View movable logical server details icon The details for this logical server are displayed as illustrated in Figure 13 (page 42).
Logical servers can be made portable through techniques described in “Portability groups” (page 39). NOTE: You must determine whether the provisioned operating system within a logical server performs as desired on a variety of platforms. If a logical server has never been active on a platform type, the HP Matrix Operating Environment shows a warning for each target of that type in the Target Selection page during moves and activations. You must determine whether the target is valid.
When defining storage for a portable logical server, you must select SAN Storage Entry. For flexibility and movement between underlying technology types, storage must be presented to the WWNs tied to the Virtual Connect server profile, and storage must also be presented to any ESX VM hosts that are potential targets for the logical server.
Targets for a logical server are selected from that logical server's portability group. The portability group members are then further filtered based on resource availability, including CPU and memory resources as well as network and SAN connectivity. NOTE: Networks in Virtual Connect must be named identically to their corresponding networks (port groups) on ESX Hypervisors. Differences in names prevent the Unlike Move operation from identifying networks with similar connectivity.
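The filtering described above, including the effect of mismatched network names, can be modeled in a few lines. The Python sketch below is a simplified illustration and does not reproduce the actual selection logic in the HP Matrix Operating Environment. The Target and Requirements classes, their fields, and the sample names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Target:
    name: str
    free_cpu_cores: int
    free_memory_gb: int
    networks: frozenset[str]  # Virtual Connect network or ESX port group names
    san_connected: bool


@dataclass
class Requirements:
    cpu_cores: int
    memory_gb: int
    networks: frozenset[str]


def eligible_targets(portability_group: list[Target], needs: Requirements) -> list[Target]:
    """Keep only the portability group members that satisfy every requirement.

    Network names are compared as exact strings, so a port group named
    differently from its Virtual Connect counterpart never matches.
    """
    return [
        target
        for target in portability_group
        if target.free_cpu_cores >= needs.cpu_cores
        and target.free_memory_gb >= needs.memory_gb
        and needs.networks <= target.networks
        and target.san_connected
    ]


if __name__ == "__main__":
    needs = Requirements(cpu_cores=4, memory_gb=16, networks=frozenset({"Prod_Net"}))
    group = [
        Target("blade_bay_3", 8, 32, frozenset({"Prod_Net"}), True),
        Target("esx_host_1", 16, 64, frozenset({"prod_net"}), True),  # name differs in case
    ]
    print([target.name for target in eligible_targets(group, needs)])
```

In this model the ESX host is filtered out solely because its port group name does not match the Virtual Connect network name, which mirrors the naming requirement in the NOTE.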
Moving between blade types For logical servers with target attributes, the logical server management software can identify more possible targets when moving or activating a server. As with all cross-technology logical servers, you must ensure that the logical server can function appropriately on various platforms. If a particular target is proven to be unsuitable, it is easy to remove that type of target to more accurately describe the logical server's portability.
You must specify the target type preferred for all sites on the CMS at each site: • If you specify Virtual as the target type preferred for a site, all cross-technology logical servers whose Recovery Groups prefer that site are activated on VM hosts during an Activate operation at that site. A physical server is chosen only if no VM hosts are available.
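The preference rule for the Virtual target type can be summarized as "use a VM host if one is available, otherwise fall back to a physical server". The Python sketch below illustrates that rule only. The function name and the (name, type) tuple representation are hypothetical, and applying the mirror-image fallback when Physical is preferred is an assumption rather than documented behavior.

```python
from typing import Optional


def choose_target(available: list[tuple[str, str]], preferred_type: str) -> Optional[str]:
    """Pick a target name from (name, type) pairs, where type is 'Virtual' or 'Physical'.

    The first available target of the preferred type wins. If none exists,
    the first available target of any other type is used as the fallback.
    Returns None when no targets are available.
    """
    if preferred_type not in ("Virtual", "Physical"):
        raise ValueError(f"unknown target type: {preferred_type}")
    for name, target_type in available:
        if target_type == preferred_type:
            return name
    return available[0][0] if available else None


if __name__ == "__main__":
    # Hypothetical targets for illustration.
    targets = [("blade_bay_3", "Physical"), ("esx_host_1", "Virtual")]
    print(choose_target(targets, "Virtual"))
    print(choose_target([("blade_bay_3", "Physical")], "Virtual"))
```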
5 Issues, limitations, and suggested actions

This chapter lists issues and limitations for this release, categorized as follows:
Limitations: Limitations of the implemented functions and features of this release
Major issues: Issues that may significantly affect functionality and usability in this release
Minor issues: Issues that may be noticeable but do not have a significant impact on functionality or usability

Limitations
• Only IO services that include virtual servers and on-premise (not cloud) res
ESX configuration setting required for VMFS datastores of Matrix recovery management managed logical servers to be visible at Remote Site Under the following conditions, Matrix recovery management requires a specific ESX configuration setting to retain the signature of a VMFS datastore so it will be visible at the Remote Site: • You have asymmetric HP P6000 Continuous Access Software array models at the Local and Remote Site.
recommends as a best practice that you keep LUN numbers the same for corresponding disks across sites. Suggested actions Assess the impact of these discrepancies on any licensing arrangements in use for the operating system and applications running on DR Protected logical servers.
6 Troubleshooting This chapter provides troubleshooting information in the following categories: • “Configuration troubleshooting” (page 51) • “Configuration error messages” (page 53) • “Warning messages” (page 56) • “Matrix recovery management Job troubleshooting” (page 57) • “Failover error messages” (page 60) • “Matrix recovery management log files” (page 61) • “DR Protected IO services troubleshooting” (page 61) Configuration troubleshooting To troubleshoot Matrix recovery management confi
• Unable to add or edit HP P6000 Storage Replication Group
Possible causes include:
◦ Matrix recovery management is unable to obtain Storage Replication Group information from Command View servers to validate the Storage Replication Group information provided by the user.
• Unable to add or edit HP P9000 Storage Replication Group
Possible causes include:
◦ The Storage Replication Group is not configured to be managed by the RAID manager instances.
• No configuration operation can be run
Possible causes include:
◦ An Activate, Deactivate, or Import operation is in progress.
◦ Another configuration operation may be in progress.
• Unable to import Storage Management Servers as part of an import operation
Possible causes include:
◦ The Storage Management Server was not discovered in the HP Matrix Operating Environment user interface.
Error message: Cannot verify the host name specified.
Cause: The hostname specified for the CMS for either the Local Site or the Remote Site is not locatable in the DNS.
Action: Verify that a valid DNS entry with a fully qualified domain name exists for each CMS.

Error message: Cannot create/edit the site information.
Cause: The hostname specified for the CMS does not include a fully qualified domain name associated with the local CMS.
then go to Storage→Storage Software→Storage Device Management Software→HP P6000 Command View Software. 2. Confirm that the port number specified during Storage Management Server configuration in Matrix recovery management is the same as the WBEM port number configured on the HP P6000 Command View server (for example, 5989). For more information, see the “CIMOM” server configuration section in the HP P6000 Command View Software Installation Guide. 3.
Error message: Unable to run Matrix recovery management operations because Matrix recovery management Job is in progress or another Matrix recovery management configuration operation is in progress.
Cause: If an Activate or Deactivate operation is in progress, no configuration operation is allowed, because the Job is in progress. If a Matrix recovery management configuration operation is in progress, no other Matrix recovery management configuration operations are allowed.
Warning message: Warning: Matrix recovery management is quiesced. No new operations will be allowed.
Cause: Matrix recovery management has been quiesced. All configuration buttons (Create, Edit, Delete, etc…) are disabled.
Action: Wait for Matrix recovery management to be unquiesced.

Warning message: Warning: Unable to remove CLX credentials for (these server credentials may not exist in CLX).
an Activate Job. It has an Entity of type site and an Operation of type activate. You will also notice the Failed icon in the Status column indicating that Job 3288 has failed. Figure 21 Jobs screen For a failed Job, click the check box next to the Job Id to get detailed information about the associated Sub Jobs. A site Job contains a Sub Job for each Recovery Group. Similarly, each Recovery Group has Sub Jobs for its Storage Replication Group and logical server, respectively.
Figure 23 Restarting a failed job NOTE: Restarting the Job retries only Sub Jobs that previously failed; servers associated with completed Jobs or Sub Jobs are not impacted. IMPORTANT: If correcting the problem that caused the Job to fail included reconfiguration of logical servers, before you restart the Job, go to the Recovery Groups tab and delete the Recovery Groups that contain the reconfigured logical servers.
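The Job and Sub Job hierarchy, and the rule that a restart retries only the Sub Jobs that previously failed, can be pictured as a small tree walk. The Python sketch below is a conceptual model only and does not reflect the product's internal data structures. The Job class, its fields, the entity labels other than site, and all Job Ids other than 3288 are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: int
    entity: str      # for example "site"; the other labels used below are assumed
    operation: str   # for example "activate"
    failed: bool = False
    sub_jobs: list["Job"] = field(default_factory=list)


def sub_jobs_to_retry(job: Job) -> list[Job]:
    """Return the failed lowest-level Sub Jobs beneath a Job.

    Completed Sub Jobs are left out, mirroring the rule that a restart
    does not touch servers associated with completed Jobs or Sub Jobs.
    """
    if not job.sub_jobs:
        return [job] if job.failed else []
    retry: list[Job] = []
    for sub_job in job.sub_jobs:
        retry.extend(sub_jobs_to_retry(sub_job))
    return retry


if __name__ == "__main__":
    site_job = Job(3288, "site", "activate", failed=True, sub_jobs=[
        Job(3289, "recovery group", "activate", failed=True, sub_jobs=[
            Job(3290, "storage replication group", "activate"),
            Job(3291, "logical server", "activate", failed=True),
        ]),
        Job(3292, "recovery group", "activate", sub_jobs=[
            Job(3293, "storage replication group", "activate"),
            Job(3294, "logical server", "activate"),
        ]),
    ])
    print([job.job_id for job in sub_jobs_to_retry(site_job)])
```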
• Matrix recovery management job failed because of unlocatable logical server in Matrix OE logical server management.
Possible causes include:
◦ A logical server managed by Matrix recovery management was removed from Matrix OE logical server management before it was unmanaged in Matrix recovery management.
• Matrix recovery management job failed because an operation failed in Matrix OE logical server management for the logical server.
Matrix recovery management log files There are several log files available with detailed information that you can view to help identify the sources of Matrix recovery management failover or failback problems: • For errors that occur during the initial Matrix recovery management configuration steps, view the mxdomainmgr(0).log file located in the logs directory where HP Systems Insight Manager is installed on the system. • For errors that occur during a failover, check the lsdt.
DR Protected IO services configuration troubleshooting

In addition to the configuration issues addressed in this User Guide that are common to both logical servers and IO services, the following configuration issues apply to IO services only:
• Failed to get a list of IO services that can be included in a recovery group
Possible causes:
◦ Matrix infrastructure orchestration Windows service is not running.
◦ There are no IO services that are DR protection enabled.
IO services configuration error messages

Error message: Unable to get the IO service.
Cause: The IO service does not exist or Matrix recovery management failed to get the IO service information from the Matrix infrastructure.
Action: Check the Matrix recovery management and IO log files for more information on the failure. If the IO service does not exist in IO, it is possible that the IO service was removed. If the IO service exists, restart IO and retry the operation.
DR Protected IO services failover troubleshooting

In addition to the failover issues addressed in this User Guide that are common to both logical servers and IO services, the following failover issues apply to IO services only:
• Failed to activate IO service in a Recovery Group
Possible causes:
◦ Storage resources are not available.
◦ The IO service is in an invalid state for activation.
◦ The IO service does not exist.
◦ The Matrix infrastructure orchestration Windows service is not running.
7 Support and other resources Information to collect before contacting HP Be sure to have the following information available before you contact HP: • Software product name • Hardware product model number • Operating system type and version • Applicable error message • Third-party hardware or software • Technical support registration number (if applicable) How to contact HP Use the following methods to contact HP technical support: • See the Contact HP Worldwide website: http://www.hp.
Warranty information HP will replace defective delivery media for a period of 90 days from the date of purchase. This warranty applies to all HP Insight Management products. HP authorized resellers For the name of the nearest HP authorized reseller, see the following sources: • In the United States, see the HP U.S. service locator website: http://www.hp.com/service_locator • In other locations, see the Contact HP worldwide website: http://welcome.hp.com/country/us/en/wwcontact.
• HP Matrix Operating Environment 7.1 Recovery Management User Guide Provides information on Matrix recovery management installation, configuration, testing, and troubleshooting. Available at http://www.hp.com/go/matrixoe/docs. • Matrix recovery management white papers Matrix recovery management white papers are available at http://www.hp.com/go/matrixoe/docs.
Glossary bidirectional failover A Matrix recovery management feature that allows Recovery Group Sets to be activated or deactivated at either the Local Site or the Remote Site. At any point in time there can be activated and deactivated Recovery Group Sets at both sites. In the event of a disaster, or to accommodate site maintenance, all of the Recovery Group Sets in the Matrix recovery management configuration can be deactivated at one site, and activated at the other site.
rehearsal, the Recovery Group and its corresponding logical servers and IO services can be brought back under the control of Matrix recovery management. Matrix infrastructure orchestration services Matrix infrastructure orchestration services (IO services) quickly provision infrastructure to automatically activate physical and virtual servers, storage, and networking from pools of shared resources. More information on Matrix infrastructure orchestration is available at http:// www.hp.
Recovery Group Set A set of Recovery Groups that share the same Preferred and Secondary sites. Recovery Groups cannot be activated or deactivated individually. Instead, all Recovery Groups that share the same Preferred and Secondary site must be activated or deactivated as a set. Recovery Group Sets can be selected for activation or deactivation at the Local site. Recovery Group Start Order An optional number that specifies the order in which a Recovery Group is to be started during a site failover.
VM hosted logical server A logical server running on a virtual machine under the control of a hypervisor.