Technical white paper
HP Matrix 7.2 KVM Private Cloud Backup and Restore
Table of contents
Abstract
Matrix Operating Environment with Matrix KVM Private Cloud Overview
Matrix KVM Private Cloud Management Overview
How to use the sample restore script
Restore Script
Example Output
Troubleshooting Tips
Abstract The purpose of this white paper is to assist IT professionals in performing backup and recovery of HP Matrix KVM Private Cloud integrated with Insight Management software products. The HP Matrix KVM Private Cloud appliance is added to the Insight Management software to manage and provision Kernel-based Virtual Machines (KVM). Optionally, the HP Matrix KVM Private Cloud appliance can be run in a High Availability (HA) configuration.
deploy, easy to use solution focused on the provisioning, optimization and ongoing management of infrastructure services. It targets private, public and hybrid cloud solutions deployed by enterprise and service providers. Optionally, the Matrix KVM Private Cloud can be run in an HA configuration, in a two-node cluster, to make it highly available.
Figure 2: HA solution overview
Backup and restore strategy for Matrix OE with Matrix KVM Private Cloud Matrix OE 7.2 configuration can incorporate a variety of software components. HP Matrix KVM Admin Console can be used to manage KVM Private Cloud. Additionally, the Insight Control Server Provisioning deployment software can be used to help provision images for physical servers. If any of these components become corrupted or lost, it may be necessary to restore them from a previously backed up copy.
Triggering a backup of the appliance Backups are taken and downloaded from the appliance using REST APIs (Appendix E). A sample backup script is provided in Appendix F. It is recommended to save the following data before the initial backup and subsequently whenever the appliance specification (XML) is redefined or the hostname/IP Address of the appliance is changed. This information will be needed when performing a restore. 1. Save the current appliance specification XML.
• Before starting a restore, you may want to download the existing audit logs. Restore will replace the audit logs with those in the backup. • Before starting a restore, make sure you know the appliance user names and passwords in effect at the time of the backup. Restore resets the user names and passwords to the ones configured when the backup was taken. • If you are restoring to a different appliance from the one where the backup was taken, you must take extra precautions before starting the restore.
Manual Recovery Actions Inconsistencies can occur as a result of a restore if the state of the managed resources at the time of backup and restore are different. In cases where the backup does not contain enough data about the current managed resources, a manual recovery should be performed. Alerts and audit messages provided by the Matrix KVM Private Cloud can be used to identify the recovery action that should be performed (Refer to Appendix C).
If the instance is not deployed from an ami format then, qemu-img convert -f qcow2 -O qcow2 -s
Backup procedure for Images repository The Images repository can be backed up online and requires no services to be stopped. However, it is strongly recommended that you back up the Images repository when no upload or add image operations are in progress. This will avoid partial backups of images. Warning If Images repository backup is taken while an image is being added and a restore is done from that backup, that partial image can incorrectly have a status of Active.
Completing the recovery process Once the Matrix KVM Private Cloud appliance is re-started, the appliance will synchronize with the restored Images repository contents. Since the Matrix KVM Private Cloud appliance and the Images repository can be backed up and restored independently, there could be inconsistency between the metadata stored in the KVM Private Cloud database for the images stored in the Images repository.
Note The KVM Private Cloud data is backed up and restored using the backup/restore procedures covered in the ‘Matrix KVM Private Cloud backup and restore’ section above. The Images repository data is backed up and restored using the procedures described in the previous section. Scope The scope of this section is to only cover backup and restore procedures for the HA cluster configuration data.
Recovery steps 1. Unexport the Quorum disk, the Matrix KVM Private Cloud appliance physical disk and the Images repository disk from the failed host only. 2. Reinstall the KVM host OS using Red Hat Enterprise Linux 6.3, following the “RedHat Enterprise Linux 6 – Installation Guide”. Then install the add-ons shown in the table below.
Note Recover and restart iptables from a backup if the firewall is set up.
9. Restart the network with the new configuration using the following commands:
service network restart
service iptables restart
service multipathd restart
10. Check whether the disks (isc_quorum, isc_root, isc_glance) are visible on this host using the following command:
multipath -ll
11. Change the context of the VM configuration file using the following command:
chcon -v --type=bin_t /vm/config/.xml
12.
4. Update the multipath configuration in /etc/multipath.conf (update the WWID for the new disk). Edit the WWID that is highlighted in the sample configuration below, for the alias “isc_quorum” to match the newly exported Quorum disk’s configuration in your setup.
5. Stop the rgmanager and the cluster service on both nodes using the following commands:
service rgmanager stop
service cman stop
6. Restart the multipath service on both hosts.
service multipathd restart
7.
2. Recover the damaged storage. • Unexport the damaged disk to both cluster nodes. • Create a new external shared disk (200GB) for the KVM Private Cloud. • Export the disk to both cluster nodes (use the same LUN# used by the previous disk). • Update the multipath configuration in /etc/multipath.conf (update the WWID for the new disk). Edit the WWID as highlighted in the sample configuration below, for the alias “isc_root” to match the newly exported KVM Private Cloud disk’s configuration in your setup.
Matrix KVM Private Cloud Appliance Data restore The second kind of appliance disk failure occurs when data in the Matrix KVM Private Cloud appliance disk becomes corrupted. Restore procedures are involved only in recreating the Matrix KVM Private Cloud appliance. Use the existing Matrix KVM Private Cloud appliance logical volume to perform restore procedures for the Matrix KVM Private Cloud. All the HA components are intact and do not need to be updated. Recovery Steps 1.
• If the rgmanager is set to ON in chkconfig, set it to OFF before rebooting the system. This prevents the auto-starting of the rgmanager after a system reboot. • Reboot both nodes (due to the existing /dev/vg_osimage file that prevents the creation of a new vg_osimage). • Set rgmanager chkconfig ON/OFF to the previous state if it changed when the nodes came back up. 3. Recreate the Images repository volume group and logical volume. Use the correct /dev/mapper/ to create the volume group on one node.
Product to install: Basic Server
Additional add-ons to install | Packages to install | Notes
High Availability | All packages in the add-on | Required for cluster
Resilient Storage | The storage packages in the add-on | Required for HALVM
Virtualization | All packages in the add-on | Required for virtualization
Desktop | X Window System and Desktop | Provide Console GUI
Base System | Storage Availability Tools | Required for multipath
Note Depending on the way the OS is deployed, it may be necessary to change the
Ignore the “WARNING: About to destroy all data on /dev/mapper/…” message. 12. Verify that the Quorum disk has been created successfully. The following command should display the details of the qdisk created in the previous step: mkqdisk -L 13. Recreate the KVM Private Cloud volume group and logical volume on one node. To find the heartbeat host name for the vgchange command, check the “volume_list” attribute configuration in the /etc/lvm/lvm.conf file.
Validating the HA configuration If the cluster is restored after any of the above failure conditions, validate the restored HA configuration. The following steps outline the recommended validation. 1. Verify network bonding by observing the output of the following commands. (This is only needed if a node is restored.)
ifconfig
cat /proc/net/bonding/bond<#>
Check Appendix D for the expected output. 2.
10. From Matrix OE, issue a request to the KVM Private Cloud to provision a VM and validate that it is provisioned successfully.
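The bonding check in step 1 lends itself to a small script. The following is a minimal sketch and not part of the HP tooling: it assumes the standard Linux bonding driver status format, in which the bond and each slave interface report a line of the form "MII Status: up". The function name bond_all_up and the bond name in the usage comment are hypothetical.

```shell
#!/bin/sh
# bond_all_up (hypothetical helper): reads the contents of a bonding status
# file (for example /proc/net/bonding/bond0) on standard input.
# Exit status 0: at least one "MII Status:" line was seen and all report "up".
# Exit status 1: no status line was found, or at least one is not "up".
bond_all_up() {
    awk -F': *' '
        /^MII Status:/ { seen++; if ($2 != "up") down++ }
        END { if (seen > 0 && down == 0) exit 0; exit 1 }
    '
}

# Example use on a cluster node (the bond name is an assumption):
#   bond_all_up < /proc/net/bonding/bond0 && echo "bond0: all links up"
```

The function only reads standard input, so it can be run against a saved copy of the status file as well as on the live node. It does not replace comparing the output with Appendix D.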
Appendix A: KVM Private Cloud restore resynchronization actions This section contains supplemental information that explains state changes after an appliance restore. The tables below indicate alerts and state changes that occur after a restore for various instances and images.
Appendix B: Images repository restore resynchronization actions This section contains supplemental information for restoring the images after an Images repository restore. Since the Images repository and the KVM Private Cloud containing the images metadata can be independently restored from backups taken at different timestamps, it is possible that inconsistencies could exist between the state indicated by the metadata entry and the actual image stored in the Images repository.
Appendix C: Alerts and Audit messages Alerts and Notifications After a restore, the KVM Private Cloud provides various notifications presented as alerts in the Activity page of the KVM Admin Console. The following alerts and audit logs are provided based on error conditions and state changes mentioned in Appendix A and Appendix B.
Instance Alerts Orphaned Instance Alert Disconnected Instance Alert Audit logs An appliance restore will replace the audit logs with those in the backup. After a restore, new audit logs are generated for all notifications/alerts and whenever state changes happen on the resources during the recovery process. To download audit logs, navigate to the Settings page in the KVM Admin Console and click Download audit logs from the Actions list in the top right corner.
Inconsistent Image
2012-11-24 04:54:47,292,IscRecovery,,,,Administrator,,,localhost,SUCCESS,KILLED,INFO,test-inconsistent-image,Images files for the image test-inconsistent-image is not consistent,Images files for the image 3fbd57f3-a3d8-4aa7-9c6b-e70aa3706b1e is not consistent
Missing Image
2012-11-24 04:58:10,779,IscRecovery,,,,Administrator,,,localhost,,START,INFO,test-missing-image,Processing the missing image with id test-missing-image,Processing the missing image with id cf27761b-a674-465e-8da1-8d98e6c0
Audit logs – Instances Orphaned Instance 2012-10-07 16:37:33,591,IscRecovery,,,,Administrator,,,localhost,,,,,,Orphaned instance:ID:{'name': u'instance-0000000f', 'power_state': u'ACTIVE', 'vm_host': u'RHEL-HA-G8', 'vCPU': 1, 'desc': u'Instance instance-0000000f is not found in instance inventory. This instance can be deleted from the host RHEL-HA-G8', 'memory': 1048576L, 'privateIP': u'10.1.0.
Appendix D: HA Cluster details Cluster Validation The following section has the sample output for the various commands used in validating the HA cluster configuration. 1. Ifconfig output The network configuration should be similar to the following output, after a restore.
2. Clustat output 3.
Cluster Configuration The following section shows the sample output for the configuration files. 1. List of cluster configuration files If you have enabled the firewall, validate that the ports required for cluster communication are enabled. Please follow the RedHat cluster configuration manual on the list of ports to enable for cluster communications. A sample output displaying the list of ports enabled is provided below.
2. Sample /etc/cluster/cluster.conf file
Appendix E: Backup and Restore REST API Backup REST API Overview The backup REST API provides REST calls to request a backup, check the backup status, download the completed backup, and cancel a backup. These calls are summarized in the table below. The backup REST API calls require a session ID for authorization, which is obtained by issuing the REST request to log in to the appliance as a user with the "Infrastructure Administrator" role.
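As a rough illustration of the first call in this flow, the sketch below requests a backup with cURL from a shell. The resource path /backup/rest/resources/ and the auth header name are taken from the sample PowerShell script in Appendix F. The function name, the CURL override variable, and the host placeholder are assumptions made for illustration, and obtaining the session ID is not shown.

```shell
#!/bin/sh
# request_backup HOST SESSION_ID (hypothetical helper)
# HOST is the appliance address including the scheme (https://{ipaddress}).
# SESSION_ID is the session ID returned by the login request.
# The CURL variable can name a different client; it defaults to curl.
request_backup() {
    host=$1
    session_id=$2
    "${CURL:-curl}" --silent --show-error --request POST \
        --header "auth: ${session_id}" \
        --header "Accept: application/json" \
        --header "Content-Type: application/json" \
        "${host}/backup/rest/resources/"
}

# Example use (placeholder address, session ID held in a variable):
#   request_backup "https://{ipaddress}" "$session_id"
```

The response body is printed to standard output so that a caller can parse the task resource from it, as the sample script does before polling for completion.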
Restore REST API Overview The restore REST API provides REST calls to upload a backup to the appliance, start a restore, and check the restore status. These calls are summarized in the table below. The REST API calls to start a restore require a session ID for authorization. The session ID is obtained by issuing the REST request to log in to the appliance as a user with the "Infrastructure Administrator" role. The REST API calls to get restore status information do not require a session ID.
Appendix F: Sample Backup Script An example PowerShell script is provided for creating and downloading a backup. This script uses PowerShell version 3.0. It makes REST calls to create and download a backup. HP highly recommends installing cURL to improve performance. The sample script can be scheduled to run automatically on a regular basis. How to use the backup script You can copy and paste the sample script into a file on a Windows system that runs PowerShell version 3.0.
.INPUTS
None, this function does not take inputs.
.OUTPUTS
Returns an object that contains the login name, password, and host name to connect to.
.EXAMPLE
$variable = queryfor-credentials #runs function, saves json object to variable.
#>
if ($args[0] -eq $null)
{
    Write-Host "Enter Appliance name (https://ipaddress)"
    $appliance = Read-Host
    # Correct some common errors
    $appliance = $appliance.Trim().ToLower()
    if (!$appliance.StartsWith("https://"))
    {
        if ($appliance.
$exitquery = 0
}
} while ($exitquery -eq 0)
}
else
{
    Write-Host "improper filepath or no permission to write to given directory"
    Write-EventLog -EventId 100 -LogName Application -Source backup.ps1 -Message "Improper filepath, $storagepath " $_.Exception.
Attempts to send a web request to the appliance and obtain an authorized sessionID.
.PARAMETER username
The username to log into the remote appliance
.PARAMETER password
The correct password associated with username
.PARAMETER hostname
The appliance address to send the request to (in https://{ipaddress} format)
.INPUTS
None, does not accept piping
.OUTPUTS
Outputs the response body containing the needed session ID.
.
$bkupURI = "/backup/rest/resources/"
$fullBackupURI = $hostname + $bkupURI
# create a new webrequest and add the proper headers (new header, auth, is needed for authorization
# in all functions from this point on)
try
{
    $rawTaskResource = setup-request -Uri $fullBackupURI -method "POST" -accept "application/json" -contentType "application/json" -authValue $authValue
    if ($rawTaskResource -ne $null)
    {
        $rawTaskResource | convertFrom-Json
    }
}
catch [System.
$rawStatus = setup-request -Uri $fullStatusUri -method "GET" -accept "application/json" -authValue $authValue -isSilent 1
# converts the JSON response from the appliance into an object
$taskStatus = $rawStatus | convertFrom-Json
# checks the status of the task manager
$status = $taskstatus.taskState
}
catch
{
    $errorMessage = $error[0].Exception.
.PARAMETER hostname
The appliance to connect to (in https://{ipaddress} format)
.INPUTS
None, does not accept piping
.OUTPUTS
The backup resource object
.EXAMPLE
$backupResource = get-BackupResource $taskResource $sessionID $applianceName
#>
# appends URI (obtained from previous function) to IP address
$resourceUri = $hostname + $taskResource.
$downloadUri = $hostname + $backupResource.downloadUri
$fileDir = [environment]::GetFolderPath("Personal")
$filePath = $fileDir + "\" + $backupResource.id + ".
.OUTPUTS
The absolute path of the download file
.EXAMPLE
download-backup-without-curl $backupResource $sessionID https://11.111.11.111
#>
# appends URI (obtained from previous function) to IP address
$downloadURI = $hostname + $backupResource.downloadUri
$downloadTimeout = 43200000 # 12 hours
$bufferSize = 65536 # bytes
# creates a new webrequest with appropriate headers
[net.httpWebRequest]$downloadRequest = [net.webRequest]::create($downloadURI)
$downloadRequest.method = "GET"
$downloadRequest.
}
return
}
return $filePath
}
function setup-request ([string]$uri,[string]$method,[string]$accept,[string]$contentType = "",[string]$authValue = "",[object]$body = $null,[int]$isSilent=0)
{
    try
    {
        [net.httpWebRequest]$request = [net.webRequest]::create($uri)
        $request.method = $method
        $request.accept = $accept
        $request.Headers.Add("Accept-Language: en-US")
        if ($contentType -ne "")
        {
            $request.ContentType = $contentType
        }
        if ($authValue -ne "")
        {
            $request.Headers.Item("auth") = $authValue
        }
        $request.Headers.
Write-EventLog -EventId 100 -LogName Application -Source backup.
#loops to keep checking how far the backup has gone
$taskResource = waitFor-completion $taskManager $authValue.sessionID $hostname
if ($taskResource -eq $null)
{
    if ($global:interactiveMode -eq 1)
    {
        Write-Host "Could not fetch backup status"
    }
    Write-EventLog -EventId 100 -LogName Application -Source backup.ps1 -Message "Could not fetch backup status"
    return
}
#gets the backup resource
$backupResource = get-backupResource $taskResource $authValue.
Backup initiated.
Checking for backup completion, this may take a while.
Backup progress: [====================] 100 %
Obtained backup file URI, now downloading
Backup download complete!
Backup can be found at C:\Users\jerry\Documents
If you wish to automate this script in the future and re-use login settings currently entered, then provide the file path to the saved credentials file when running the script.
ie: C:\Users\jerry\backup.
progress. operation.
500 Internal Server Error | Various | An internal error occurred. | Download a support dump. Then reboot the appliance and retry the operation.
503 Server Not Available | Various | There is a network connectivity problem or the appliance software is down. | Try to connect to the appliance UI from the system issuing the REST request. Resolve network and/or appliance problems.
Appendix G: Sample Restore Script An example PowerShell script is provided for uploading and restoring a backup. This script uses PowerShell version 3.0. It makes REST calls to upload and restore a backup. HP highly recommends installing cURL to improve performance. How to use the sample restore script You can copy and paste the sample script into a file on a Windows system that runs PowerShell version 3.0. HP highly recommends that you install cURL to improve performance.
{
$continue = 1
}
} while ($continue -eq 0)
do
{
    Write-Host "Enter directory backup is located in (ie: C:\users\joe\)"
    $backupDirectory = Read-Host
    # Add trailing slash if needed
    if (!$backupDirectory.EndsWith("\"))
    {
        $backupDirectory = $backupDirectory + "\"
    }
    Write-Host "Enter name of backup (ie: appliance_vm1_backup_2012-07-07_555555.
.OUTPUTS Outputs the response body containing the needed session ID. .
Write-Host "Uploading backup file to appliance, this may take a few minutes..."
try
{
    $rawUploadResponse = invoke-expression $curlUploadCommand
    if ($rawUploadResponse -eq $null)
    {
        return
    }
    $uploadResponse = $rawUploadResponse | convertFrom-Json
    if ($uploadResponse.status -eq "SUCCEEDED")
    {
        Write-Host "Upload complete."
        return $uploadResponse
    }
    else
    {
        Write-Host $uploadResponse
        return
    }
}
catch [System.Management.Automation.
$uploadRequest.Headers.Add("auth", $authinfo)
$uploadRequest.Headers.Add("X-API-Version", 1)
$fs = New-Object IO.FileStream ($filepath,[System.IO.FileMode]::Open)
$rs = $uploadRequest.getRequestStream()
$rs.WriteTimeout = $uploadTimeout
$disposition = "Content-Disposition: form-data; name=""file""; filename=""encryptedBackup"""
$conType = "Content-Type: application/octet-stream"
[byte[]]$BoundaryBytes = [System.Text.Encoding]::UTF8.GetBytes("--" + $boundary + "`r`n");
$rs.
The authorized sessionID obtained from login.
.PARAMETER hostname
The appliance to connect to.
.PARAMETER uploadResponse
The response body from the upload request. Contains the backup URI needed for restore call.
.INPUTS
None, does not accept piping
.OUTPUTS
Outputs the response body from the POST restore call.
.EXAMPLE
$restoreResponse = start-restore $sessionID $hostname $uploadResponse
#>
# append the appropriate URI to the IP address of the Appliance
$backupUri = $uploadResponse.
$fullStatusUri = $hostname + $restoreResponse.uri
}
do
{
    try
    {
        # create a new webrequest and add the proper headers (new header, auth is needed for authorization
        $rawStatusResp = setup-request -uri $fullStatusUri -method "GET" -accept "application/json" -contentType "application/json" -authValue $authinfo
        $statusResponse = $rawStatusResp | convertFrom-Json
        $trimmedPercent = ($statusResponse.
}
return $idResponse.members[0].uri
}
function setup-request ([string]$uri,[string]$method,[string]$accept,[string]$contentType = "",[string]$authValue="0", [object]$body = $null)
{
<#
.DESCRIPTION
A function to handle the more generic web requests to avoid repeated code in every function.
.PARAMETER uri
The full address to send the request to (required)
.PARAMETER method
The type of request, namely POST and GET (required)
.PARAMETER accept
The type of response the request accepts (required)
.
$errorResponse = $error[0].Exception.InnerException.Response.getResponseStream()
$sr = New-Object IO.StreamReader ($errorResponse)
$rawErrorStream = $sr.readtoend()
$error[0].Exception.InnerException.Response.close()
$errorObject = $rawErrorStream | convertFrom-Json
if ($global:interactiveMode -eq 1)
{
    Write-Host $errorObject.message $errorObject.recommendedActions
}
else
{
    Write-EventLog -EventId 100 -LogName Application -Source backup.ps1 -Message $errorObject.message
}
}
catch [System.
$authinfo = login-appliance $loginVals.userName $loginvals.password $loginVals.hostname
if ($authinfo -eq $null)
{
    Write-Host "Error getting authorized session from appliance, closing program."
    return
}
$uploadResponse = uploadTo-appliance $loginVals.backupPath $authinfo.sessionID $loginVals.hostname $loginVals.backupFile
if ($uploadResponse -eq $null)
{
    Write-Host "Error attempting to upload, closing program."
    return
}
$restoreResponse = start-restore $authinfo.sessionID $loginVals.
Troubleshooting Tips
HTTP error | Response body error code | Description | Resolution
401 Unauthorized | AUTHORIZATION | An incorrect user name or password was specified. | Specify the correct user name and password.
 | RESOURCE_NOT_FOUND | The incorrect URI was specified. | Specify the correct URI. You may need to wait for the appliance software to start. You can find out the correct URI using this guide. It may help to issue the REST request to get the last backup resource.
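A script that automates the restore can surface these resolutions directly. The sketch below maps the two response body error codes listed above to their documented resolutions. The function name restore_error_hint is hypothetical, and extracting the error code from the response body is left out because the response field names are not described here.

```shell
#!/bin/sh
# restore_error_hint CODE (hypothetical helper): prints the documented
# resolution for a response body error code from the troubleshooting table.
# Returns 1 for codes that the table excerpt does not cover.
restore_error_hint() {
    case "$1" in
        AUTHORIZATION)
            echo "Specify the correct user name and password." ;;
        RESOURCE_NOT_FOUND)
            echo "Specify the correct URI. You may need to wait for the appliance software to start." ;;
        *)
            echo "No documented resolution for error code: $1" >&2
            return 1 ;;
    esac
}

# Example use:
#   restore_error_hint RESOURCE_NOT_FOUND
```

Unknown codes are reported on standard error with a non-zero exit status, so a calling script can fall back to showing the raw response.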
References
Backing up and restoring HP Insight Management 7.2 Central Management Server (Windows)
HP Matrix 7.2 KVM Private Cloud Getting Started Guide
HP Insight Control server provisioning Administrator Guide
RedHat Enterprise Linux 6 – Installation Guide https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/pdf/Installation_Guide/Red_Hat_Enterprise_Linux-6-Installation_Guide-en-US.