HP 3PAR Storage Replication Adapter 5.0 for VMware vCenter Site Recovery Manager Troubleshooting Guide

Abstract

This document provides troubleshooting and workflow information for the HP 3PAR Storage Replication Adapter for VMware vCenter Site Recovery Manager.
© Copyright 2011 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The only warranties for HP products and services are set forth in the express warranty statements accompanying such products and services. Nothing herein should be construed as constituting an additional warranty. HP shall not be liable for technical or editorial errors or omissions contained herein.
Contents

1 Introduction
    About the HP 3PAR Storage Replication Adapter
    System requirements
    Turning on logging
1 Introduction

About the HP 3PAR Storage Replication Adapter

The HP 3PAR VMware vCenter Storage Replication Adapter (SRA) interacts with array management systems to discover arrays and replicated LUNs. These interactions are provided by various scripts installed with VMware vCenter Site Recovery Manager (SRM). HP 3PAR SRA creates software modules with predefined interfaces, which provide SRM with replication information, and executes needed commands using XML.
2 Workflows and Corresponding Log Messages

HP 3PAR Storage Replication Adapter (SRA) supports the following 20 commands, which are discussed in the sections that follow:

• queryInfo
• queryStrings
• queryErrorDefinitions
• queryCapabilities
• queryConnectionParameters
• discoverArrays
• discoverDevices
• checkTestFailoverStart
• testFailoverStart
• testFailoverStop
• checkFailover
• prepareFailover
• failover
• prepareReverseReplication
• reverseReplication
• prepareRestoreRepl
Troubleshooting checklist

Table 1 queryInfo Troubleshooting Checklist

Problem Area: Check log file path.
Problem Description: Missing log file path information in the input XML.
Example of Key Log Message: [2010-05-26 14:29:32.966 'TPDSrm.arrayMgm.AddGlobalWarning' 3PAR_2014 warning (id=1052)] Warning. Unable to access the specified log folder.
Workflow chart
queryString

The queryString command returns translation of the strings for a given locale in XML format. The following information is requested by SRM:

• UTF-8 encoded string localized to the specified locale
• Translation of the string identified by the specified ID
• Command progress update

Troubleshooting checklist

Table 2 queryString Troubleshooting Checklist

Problem Area: Check log file path.
Problem Description: Missing log file path
Example of Key Log Message: [2010-05-26 14:29:32.
Workflow chart
queryErrorDefinitions

The queryErrorDefinitions command returns pre-defined array-specific error and warning descriptions in XML format. The following information is requested by SRM:

• Each error and warning definition should contain a code, description, and fix hint
• Command progress update

Troubleshooting checklist

Table 3 queryErrorDefinitions Troubleshooting Checklist

Problem Area: Check log file path.
Workflow chart
queryCapabilities

The queryCapabilities command returns supported versions of the array replication software, supported array models, and supported SRM commands.
Workflow chart
queryConnectionParameters

The queryConnectionParameters command returns a list of parameters required to establish connection with the array management system. The following information is requested by SRM:

• Group of parameters required to establish a connection to the HP 3PAR Storage System
• Localized string representing title, address information, username, password, etc.
Workflow chart
discoverArrays

The discoverArrays command returns the following information about the storage array configuration for replication:

• Unique identifier for the storage array
• User-friendly name of the storage array
• Array model
• Array vendor
• List of replication peers

Troubleshooting checklist

Table 6 discoverArrays Troubleshooting Checklist

Problem Area: Check log file path.
Problem Description: Missing log file path
Example of Key Log Message: [2010-05-26 14:29:32.
Workflow chart
discoverDevices

The discoverDevices command returns devices on the specified storage array configured for replication with the specified target array.
Troubleshooting checklist

Table 7 discoverDevices Troubleshooting Checklist

Problem Area: Check log file path.
Problem Description: Missing log file path information in the input XML.
Example of Key Log Message: [2010-05-26 14:29:32.966 'TPDSrm.arrayMgm.AddGlobalWarning' 3PAR_2014 warning (id=1052)] Warning. Unable to access the specified log folder.
Table 7 discoverDevices Troubleshooting Checklist (continued)

Problem Area: Get Virtual Volume information on the HP 3PAR Storage System.
Problem Description: Missing expected output content from storage.
Example of Key Log Message: [2010-05-27 10:58:47.031 'TPDSrm.discoverDevices.Run' 3PAR_1018 error (id=5432)] Error. Unable to get detail virtual volume information. Additional information: .
Troubleshooting Tips: This error indicates that the storage array returned empty output for the command issued.
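When scanning SRA logs for the key messages listed in these checklists, it can help to split each line into its fields. The following is a minimal Python sketch, not part of the SRA. The field layout (timestamp, quoted source, 3PAR code, severity, id, text) is inferred only from the sample messages in this guide, and the names LogEntry, parse_log_line, and id_value are hypothetical.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Layout inferred from the sample messages in this guide (an assumption):
# [<timestamp> '<source>' <code> <severity> (id=<number>)] <text>
LOG_LINE = re.compile(
    r"\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"'(?P<source>[^']+)' "
    r"(?P<code>3PAR_\d{4}) "
    r"(?P<severity>\w+) "
    r"\(id=(?P<id>\d+)\)\]\s*"
    r"(?P<text>.*)"
)


class LogEntry(NamedTuple):
    timestamp: datetime
    source: str    # for example TPDSrm.discoverDevices.Run
    code: str      # for example 3PAR_1018
    severity: str  # for example error or warning
    id_value: int  # the guide does not say what "id" means
    text: str


def parse_log_line(line: str) -> Optional[LogEntry]:
    """Return a LogEntry, or None if the line does not match the assumed layout."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    return LogEntry(
        timestamp=datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S.%f"),
        source=match["source"],
        code=match["code"],
        severity=match["severity"],
        id_value=int(match["id"]),
        text=match["text"],
    )


if __name__ == "__main__":
    sample = (
        "[2010-05-27 10:58:47.031 'TPDSrm.discoverDevices.Run' "
        "3PAR_1018 error (id=5432)] Error. Unable to get detail "
        "virtual volume information."
    )
    print(parse_log_line(sample))
```

Returning None for non-matching lines lets a caller skip unrelated log content without special handling.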
Workflow chart
checkTestFailoverStart

The checkTestFailoverStart command validates the environment and target devices before test failover. This command is expected to perform various configuration and runtime state checks on the specified target devices and identify problems that would prevent a successful test. SRM expects the HP 3PAR SRA to perform the following checks:

• Initial data synchronization has completed and there is a copy of the production data ready for test.
Table 8 checkTestFailoverStart Troubleshooting Checklist (continued)

Problem Area: Check Remote Copy group status.
Problem Description: Invalid Remote Copy group.
Example of Key Log Message: [2010-05-27 16:33:37.734 'TPDSrm.checkTestFailoverStart.Run' 3PAR_1022 error (id=1580)] Error. The specified group does not exist in the storage array.
Troubleshooting Tips: This error indicates that the requested group for checking is not available in the storage array.
Workflow chart
testFailoverStart

The testFailoverStart command creates writable temporary copies of the requested replication targets and presents these copies to the requested hosts. Because an HP 3PAR Remote Copy group is not mapped to a specific host at the array level, HP 3PAR SRA supports dynamic access restriction: SRM specifies the set of ESX hosts that require access to failed-over devices. Each host is described as an initiator with a WWN and type (Fibre Channel or iSCSI).
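As an illustration of the initiator description above, the following Python sketch models an initiator as a WWN plus a type and works out which requested initiators still need access. It is a sketch under stated assumptions, not the SRA's implementation: the names Initiator, InitiatorType, normalize_wwn, and initiators_to_expose are hypothetical, and the WWN normalization rule is an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class InitiatorType(Enum):
    FIBRE_CHANNEL = "FC"
    ISCSI = "iSCSI"


@dataclass(frozen=True)
class Initiator:
    wwn: str
    kind: InitiatorType


def normalize_wwn(wwn: str) -> str:
    """Strip ':' and '-' separators and upper-case, so equal WWNs compare equal."""
    return wwn.replace(":", "").replace("-", "").upper()


def initiators_to_expose(
    requested: Iterable[Initiator], exposed_wwns: Iterable[str]
) -> List[Initiator]:
    """Return the requested initiators whose WWN is not already exposed."""
    already = {normalize_wwn(w) for w in exposed_wwns}
    return [i for i in requested if normalize_wwn(i.wwn) not in already]


if __name__ == "__main__":
    requested = [
        Initiator("10:00:00:00:c9:12:34:56", InitiatorType.FIBRE_CHANNEL),
        Initiator("10000000C9ABCDEF", InitiatorType.FIBRE_CHANNEL),
    ]
    # Only the second initiator still needs access in this made-up example.
    print(initiators_to_expose(requested, ["10000000C9123456"]))
```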
Table 9 testFailoverStart Troubleshooting Checklist (continued)

Example of Key Log Message (continued): required license on the connected HP 3PAR Storage System.
Troubleshooting Tips (continued): to have both Remote Copy Software and Virtual Copy Software licenses.

Problem Area: Get Virtual Volume information on the HP 3PAR Storage System.
Problem Description: Missing expected output content from storage.
Example of Key Log Message: [2010-05-27 10:58:47.031 'TPDSrm.checkTestFailoverStart.Run' 3PAR_1018 error (id=5432)] Error.
Workflow chart
testFailoverStop

The testFailoverStop command deletes the temporary copies created by the testFailoverStart command. Once HP 3PAR SRA receives this command, it should remove all test snapshots pertaining to the specified group.
Table 10 testFailoverStop Troubleshooting Checklist (continued)

Problem Area: Get host exposure mapping list.
Problem Description: Missing expected output content from storage.
Example of Key Log Message: [2010-05-27 10:58:47.031 'TPDSrm.testFailoverStart.Run' 3PAR_1032 error (id=5432)] Error. Unable to get detail InServ host information. Additional information: {xxxx}
Troubleshooting Tips: This error indicates that the storage array returned empty output for the command issued.
Workflow chart
checkFailover

The checkFailover command validates the environment and target devices before failover. This command is expected to perform various configuration and runtime state checks on the specified target devices and identify problems that would prevent a successful failover.
Table 11 checkFailover Troubleshooting Checklist (continued)

Problem Area: Check Remote Copy group status.
Problem Description: Invalid Remote Copy group.
Example of Key Log Message: [2010-05-27 16:33:37.734 'TPDSrm.testFailoverStart.Run' 3PAR_1022 error (id=1580)] Error. The specified group does not exist in the storage array.
Troubleshooting Tips: This error indicates that the requested group for checking is not available in the storage array.
Workflow chart
prepareFailover

The prepareFailover command is issued at the protected site before failover to make the source devices read-only and to take a snapshot of these devices in anticipation of a failover.
Table 12 prepareFailover Troubleshooting Checklist (continued)

Troubleshooting Tips (continued): administrator and validate the finding.

Problem Area: Stop Remote Copy group.
Problem Description: Invalid volume state.
Example of Key Log Message: [2010-05-27 16:48:24.953 'TPDSrm.testFailoverStart.Run' 3PAR_1021 error (id=860)] Error. The secondary volumes are not in Synced state.
Troubleshooting Tips: This error indicates that the volumes participating in the Remote Copy groups are not in sync.
Workflow chart
failover

The failover command stops replication for the requested replication targets, makes writable devices from these targets, and presents these devices to requested hosts. SRM expects the HP 3PAR SRA to perform the following:

• The response from failover must contain the same set of standalone devices (not supported by HP 3PAR SRA) and consistency groups as specified in the request.
• Different devices within the same consistency group are allowed to request access to different sets of hosts.
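The requirement that the response contain the same set of consistency groups as the request amounts to a set comparison. The following Python sketch shows one way to check it. The function name compare_group_sets and the use of plain string identifiers are hypothetical, and the SRM request and response XML formats are not modeled here.

```python
from typing import Iterable, List, Tuple


def compare_group_sets(
    requested_ids: Iterable[str], returned_ids: Iterable[str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, unexpected) group identifiers.

    missing:    groups named in the request but absent from the response
    unexpected: groups present in the response but never requested
    Both lists are empty when the two sets match.
    """
    requested = set(requested_ids)
    returned = set(returned_ids)
    return sorted(requested - returned), sorted(returned - requested)


if __name__ == "__main__":
    # Made-up group names, for illustration only.
    missing, unexpected = compare_group_sets(["grpA", "grpB"], ["grpB", "grpC"])
    print("missing:", missing, "unexpected:", unexpected)
```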
Table 13 failover Troubleshooting Checklist (continued)

Problem Area: Check Remote Copy group status.
Problem Description: Invalid Remote Copy group.
Example of Key Log Message: [2010-05-27 16:33:37.734 'TPDSrm.testFailoverStart.Run' 3PAR_1022 error (id=1580)] Error. The specified group does not exist in the storage array.
Troubleshooting Tips: This error indicates that the requested group for checking is not available in the storage array.
Workflow chart
prepareReverseReplication

The prepareReverseReplication command initiates the replication in reverse. After this command, the TargetGroup becomes the ConsistencyGroup and the ConsistencyGroup becomes the TargetGroup. This command is executed at the replication source.
Table 14 prepareReverseReplication Troubleshooting Checklist (continued)

Troubleshooting Tips (continued): storage array for more information.

Problem Area: Get Virtual Volume information on the HP 3PAR Storage System.
Problem Description: Missing expected output content from storage.
Example of Key Log Message: [2010-05-27 10:58:47.031 'TPDSrm.checkTestFailoverStart.Run' 3PAR_1018 error (id=5432)] Error. Unable to get detail virtual volume information.
Workflow chart
reverseReplication

The reverseReplication command initiates the replication in reverse. After this command, the TargetGroup becomes the ConsistencyGroup and the ConsistencyGroup becomes the TargetGroup. This command is executed at the replication source.
Table 15 reverseReplication Troubleshooting Checklist (continued)

Problem Area: Get Virtual Volume information on the HP 3PAR Storage System.
Problem Description: Missing expected output content from storage.
Example of Key Log Message: [2010-05-27 10:58:47.031 'TPDSrm.checkTestFailoverStart.Run' 3PAR_1018 error (id=5432)] Error. Unable to get detail virtual volume information.
Workflow chart
prepareRestoreReplication

The prepareRestoreReplication command is issued at the recovery site before restoreReplication to prepare for the upcoming restore process. This is meant to be used for restoration after a disruptive failover test. SRM expects the HP 3PAR SRA to perform the following:

• The response from prepareRestoreReplication must contain the same set of standalone devices (not supported by HP 3PAR SRA) and consistency groups as specified in the request.
Table 16 prepareRestoreReplication Troubleshooting Checklist (continued)

Problem Area: Get Virtual Volume information on the HP 3PAR Storage System.
Problem Description: Missing expected output content from storage.
Example of Key Log Message: [2010-05-27 10:58:47.031 'TPDSrm.checkTestFailoverStart.Run' 3PAR_1018 error (id=5432)] Error. Unable to get detail virtual volume information.
Workflow chart
restoreReplication

The restoreReplication command is issued at the protected site after prepareRestoreReplication. It discards any changes made to failover devices and makes the protected site the primary site again. This is meant to be used for restoration after a disruptive failover test. Another use case arises when failover fails: users can use restore replication to ensure that the original protected site is writable for normal operation.
Table 17 restoreReplication Troubleshooting Checklist (continued)

Troubleshooting Tips (continued): storage array for more information.

Problem Area: Get Virtual Volume information on the HP 3PAR Storage System.
Problem Description: Missing expected output content from storage.
Example of Key Log Message: [2010-05-27 10:58:47.031 'TPDSrm.checkTestFailoverStart.Run' 3PAR_1018 error (id=5432)] Error. Unable to get detail virtual volume information.
Workflow chart
syncOnce

The syncOnce command initiates the immediate replication of the specified devices and consistency groups. syncOnce must return as soon as possible without waiting for replication to complete. After executing syncOnce, SRM executes periodic querySyncStatus commands to check replication progress. HP 3PAR SRA uses the push model in this case, where replication is initiated at the source. The replication operation status is returned to the caller via an XML file.
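The start-then-poll pattern described above can be sketched in Python as follows. This is an illustration only, not SRM or SRA code: wait_for_sync, start_sync, query_status, and the status string "complete" are all hypothetical, and the real commands exchange XML rather than Python values.

```python
import time
from typing import Callable


def wait_for_sync(
    start_sync: Callable[[], None],
    query_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    done_status: str = "complete",  # hypothetical status value
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Start replication, then poll until it reports done or the timeout passes."""
    # Like syncOnce, start_sync is expected to return promptly
    # without waiting for replication to finish.
    start_sync()
    elapsed = 0.0
    while elapsed <= timeout:
        # Stands in for one querySyncStatus call.
        if query_status() == done_status:
            return True
        sleep(poll_interval)
        elapsed += poll_interval
    return False


if __name__ == "__main__":
    statuses = iter(["in_progress", "in_progress", "complete"])
    finished = wait_for_sync(
        start_sync=lambda: None,
        query_status=lambda: next(statuses),
        poll_interval=0.01,
        timeout=1.0,
    )
    print("finished:", finished)
```

Passing sleep as a parameter keeps the loop testable without real delays.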
Table 18 syncOnce Troubleshooting Checklist (continued)

Problem Area: Get Virtual Volume information on the HP 3PAR Storage System.
Problem Description: Missing expected output content from storage.
Example of Key Log Message: [2010-05-27 10:58:47.031 'TPDSrm.checkTestFailoverStart.Run' 3PAR_1018 error (id=5432)] Error. Unable to get detail virtual volume information.
Workflow chart
querySyncStatus

The querySyncStatus command returns the status of replication sessions initiated with the syncOnce command. The synchronization status of the specified Remote Copy Group for the specified target server is returned to SRM.

Troubleshooting checklist

Table 19 querySyncStatus Troubleshooting Checklist

Problem Area: Check log file path.
Problem Description: Missing log file path information in the input XML.
Example of Key Log Message: [2010-05-26 14:29:32.966 'TPDSrm.
Table 19 querySyncStatus Troubleshooting Checklist (continued)

Example of Key Log Message (continued): volume information. Additional information:
Troubleshooting Tips (continued): by intermittent network interruption.

Problem Area: Check Remote Copy group status.
Problem Description: Invalid Remote Copy group.
Example of Key Log Message: [2010-05-27 16:33:37.734 'TPDSrm.testFailoverStart.Run' 3PAR_1022 error (id=1580)] Error. The specified group does not exist in the storage array.

Problem Area: Check Remote Copy link status.
Workflow chart
queryReplicationSettings

This command returns the replication settings, which need to be preserved after failover when replication is reversed or restored. HP 3PAR SRA saves this data, but currently does not make any reference to the data during reverseReplication or restoreReplication.

Troubleshooting checklist

Table 20 queryReplicationSettings Troubleshooting Checklist

Problem Area: Check log file path.
Problem Description: Missing log file path information in the input
Example of Key Log Message: [2010-05-26 14:29:32.966 'TPDSrm.
Table 20 queryReplicationSettings Troubleshooting Checklist (continued)

Example of Key Log Message (continued): volume information. Additional information:
Troubleshooting Tips (continued): be caused by intermittent network interruption.

Problem Area: Check Remote Copy group status.
Problem Description: Invalid Remote Copy group.
Example of Key Log Message: [2010-05-27 16:33:37.734 'TPDSrm.testFailoverStart.Run' 3PAR_1022 error (id=1580)] Error. The specified group
Workflow chart
3 Tear-Down Workflows

The following is a list of important tear-down modules that are used in different commands. Following this list are workflow charts for each module.

Module: CheckInFormVersion
Description: Checks and validates the connected storage array OS version.

Module: CheckInServConnectivity
Description: Connects to the storage array and gets array model information.

Module: CheckInServLicense
Description: Validates the caller's specified license name with the registered license feature in the storage array.

Module: GetTaskStatusByType
Description: Retrieves the task list by a task type specified by the caller.

Module: GetVvAssocVLunInfo
Description: Retrieves LUN exposure information for a specified virtual volume.

Module: GetVvInfoByVVID
Description: Retrieves the requested virtual volume information by searching through the cached VV array.

Module: PromoteDataToRecoveryPoint
Description: Promotes the virtual volume to the specific snapshot (recovery point).
CheckInFormVersion
CheckInServConnectivity
CheckInServLicense
CheckRCLinkStatus
CreateRCGroupSnapshotForBackup
CreateGroupSnapshotForTest
ExposeVvToSingleHost
FailoverRCGroup
FillTestSnapshotInfo
GetDetailRCInformation
GetInFormVersion
GetInServHostPortInformation
GetInServPortInformation
GetInServTime
GetInServVVInformation
GetNextAvailableLunNo
GetRCGroupLastSyncTime
GetRCGroupStatus
GetRCTargetSysInfo
GetRecoveryInformationById
GetRecoveryInformationbyVvName
GetTaskStatus
GetTaskStatusByType
GetVvAssocVLunInfo
GetVVInfoByVVID
PromoteDataToRecoveryPoint
PromoteSnapshotToParent
RecoverRemoteCopyGroup
RemoveGroupSnapshots
StartRemoteCopyGroup
StopRemoteCopyGroup
SyncRemoteCopy
UndoRemoteCopyGroupRole
UnExposeVVFromAllHost
UnExposeVVFromHost
WaitingForPromoteCompletion
4 Error Messages

Table 21 Message Table

Name: 3PAR_1001
Message Description: Error. Exception has occurred:
Fix Hint: Contact HP 3PAR support if the problem persists.

Name: 3PAR_1002
Message Description: Error. Failed to create process due to the following:
Fix Hint: If you have firewall or anti-virus software installed on your system, make sure it allows network access for the HP 3PAR Storage System.

Name: 3PAR_1003
Message Description: Error. Missing user name information.
Name: 3PAR_1006
Message Description: Error. The InServ Storage Server is not accessible.
Fix Hint:
• Make sure you entered the correct address for the HP 3PAR Storage System.
• Check if there is any problem with your network connection.

Name: 3PAR_1007
Message Description: Error. Unable to get system information. Additional information:
Fix Hint:
• If the error is caused by insufficient privilege to issue the command, contact your storage administrator for assistance.
Name: 3PAR_1014
Message Description: Error. Missing required license on the connected 3PAR Storage Server.
Fix Hint: If you are missing the required license to perform replication or snapshot operation, access your HP 3PAR Storage System and issue the showlicense command to get license information. If you do not see the required license, contact HP 3PAR support for assistance.

Name: 3PAR_1015
Message Description: Error.
Name: 3PAR_1023
Message Description: Error. Unable to show Remote Copy link information. Additional information:
Fix Hint:
• If the error is caused by insufficient privilege to issue the command, contact your storage administrator for assistance.
• If the error is related to network issues, check if you have a problem with your network connection.
Fix Hint (continued):
• If the error is related to network issues, check if you have a problem with your network connection.
• If you have firewall or anti-virus software installed on the system, make sure it allows network access to the HP 3PAR Storage System.

Name: 3PAR_1030
Message Description: Error. Unable to get date information from InServ.
Fix Hint (continued): on the system, make sure it allows network access to the HP 3PAR Storage System.

Name: 3PAR_1036
Message Description: Error. Unable to stop Remote Copy Group locally.
Fix Hint:
• If the error is caused by insufficient privilege to issue the command, contact your storage administrator for assistance.
• If the error is related to network issues, check if you have a problem with your network connection.
Name: 3PAR_1041
Message Description: Error. Failover task has failed. Additional information:
Fix Hint:
• If the error is caused by insufficient privilege to issue the command, contact your storage administrator for assistance.
• If the error is related to network issues, check if you have a problem with your network connection.
Name: 3PAR_1047
Message Description: Error. Unable to get virtual volume LUN information. Additional information:
Fix Hint:
• If the error is caused by insufficient privilege to issue the command, contact your storage administrator for assistance.
• If the error is related to network issues, check if you have a problem with your network connection.
Name: 3PAR_1054
Message Description: Error. Unable to create group snapshot. Additional Information:
Fix Hint:
• If the error is caused by insufficient privilege to issue the command, contact your storage administrator for assistance.
• If the error is related to network issues, check if you have a problem with your network connection.
Name: 3PAR_1061
Message Description: Error. Synchronization has failed. Additional information:
Fix Hint:
• If the error is caused by insufficient privilege to issue the command, contact your storage administrator for assistance.
• If the error is related to network issues, check if you have a problem with your network connection.
Name: 3PAR_1068
Message Description: Error. Remote Group is not a Primary Group in the 3PAR Storage Server.
Fix Hint:
• If the error is caused by insufficient privilege to issue the command, contact your storage administrator for assistance.
• If the error is related to network issues, check if you have a problem with your network connection.
Name: 3PAR_1081
Message Description: Error. Unable to get volume history information for device. Additional information:
Fix Hint:
• If the error is caused by insufficient privilege to issue the command, contact your storage administrator for assistance.
• If the error is related to network issues, check if you have a problem with your network connection.
Name: 3PAR_2002
Message Description: Warning. No associated VLUN information is found for LUN ID .
Fix Hint: This volume is currently not exposed to any host.

Name: 3PAR_2003
Message Description: Warning. Remote Copy Group might not be in sync.
Fix Hint: This results from the Remote Copy group not being in started status and some of its VV members not being in synced state.

Name: 3PAR_2004
Message Description: Warning. Unable to stop Remote Copy group <{GroupName}>. Retry after 30 seconds.
Name: 3PAR_2016
Message Description: Warning. 3PAR SRA only supports group replication.
Fix Hint: HP 3PAR Remote Copy Software only supports group replication.

Name: 3PAR_2017
Message Description: Warning. Group name has unsupported naming convention.
Fix Hint: SRA does not support group names that contain .r.

Name: 3PAR_2018
Message Description: Warning. Recovery point information is not available.
Fix Hint: Snapshot is not created according to the specification to support recovery point.

Name: 3PAR_2019
Message Description: Warning.
Name: 3PAR_3014
Message Description: Info. No associated VLUN information is found for LUN ID .
Fix Hint: None.

Name: 3PAR_3015
Message Description: Info. VV is already exposed to the specified host. Additional information:
Fix Hint: None.

Name: 3PAR_3016
Message Description: Info. Virtual volume is already exposed to the specified initiator group.
Fix Hint: None.

Name: 3PAR_3017
Message Description: Info. LUN ID will be used.
Fix Hint: None.

Name: 3PAR_3018
Message Description: Info. Snapshot LUN is already exposed.
Fix Hint: None.

Name: 3PAR_3019
Message Description: Info.
Name: 3PAR_3037
Message Description: Info. Mutex wait has failed. Additional information:
Fix Hint: None.

Name: 3PAR_3038
Message Description: Info.
Fix Hint: None.

Name: 3PAR_3039
Message Description: Info. Wait for all promote operations to complete.
Fix Hint: None.

Name: 3PAR_3040
Message Description: Info. Check if there is any failed promote operation.
Fix Hint: None.

Name: 3PAR_3041
Message Description: Info. Group promote has failed. Prepare to recover all devices back to its original content.
Fix Hint: None.

Name: 3PAR_3042
Message Description: Info. User updates the log file size to be:
Fix Hint: None.
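In the entries shown in this table, codes of the form 3PAR_1nnn are errors, 3PAR_2nnn are warnings, and 3PAR_3nnn are informational. The following Python sketch classifies a code on that basis. The mapping is inferred from the listed entries only and is an assumption, not a documented rule, and the function name severity_for is hypothetical.

```python
import re

# Mapping inferred from the entries listed in Table 21 (an assumption).
_SEVERITY_BY_LEADING_DIGIT = {"1": "Error", "2": "Warning", "3": "Info"}
_CODE = re.compile(r"3PAR_(\d)\d{3}")


def severity_for(code: str) -> str:
    """Return 'Error', 'Warning', 'Info', or 'Unknown' for a 3PAR_nnnn code."""
    match = _CODE.fullmatch(code)
    if match is None:
        raise ValueError(f"not a 3PAR message code: {code!r}")
    return _SEVERITY_BY_LEADING_DIGIT.get(match.group(1), "Unknown")


if __name__ == "__main__":
    for code in ("3PAR_1018", "3PAR_2014", "3PAR_3039"):
        print(code, severity_for(code))
```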