Dell EqualLogic Best Practices Series Microsoft SQL Server 2008 Backup and Restore using Dell EqualLogic and PowerVault DL2200 with CommVault Simpana A Dell Technical Whitepaper Storage Infrastructure and Solutions Engineering Dell Product Group March 2012
THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND. © 2012 Dell Inc. All rights reserved. Reproduction of this material in any manner whatsoever without the express written permission of Dell Inc. is strictly forbidden. For more information, contact Dell.
Table of Contents Executive Summary .................................................................................................................................................. 4 1 Introduction ....................................................................................................................................................... 5 1.1 Audience ...............................................................................................................................................
Acknowledgements This whitepaper was produced by the PG Storage Infrastructure and Solutions of Dell Inc.
Executive summary Finding backup windows of low activity to reduce the extra load on a production server during backup operations has been a concern for many organizations. One way of achieving this is by off-loading backup operations to a proxy server that can directly read source data through the SAN and write to backup target, a process termed as off-host backup.
1 Introduction The PowerVault DL2200, backup to disk appliance powered by CommVault is designed to address data management, protection, and recovery pain points by delivering a disk-based backup and archive solution with deduplication and built-in array snapshot support. The EqualLogic PS Series arrays are designed to provide scalable storage, allowing organizations to grow and match their retention requirements without performance degradation.
1.2 Terminology The following terms are used throughout this document. Group: A PS Series group consists of one or more PS Series arrays connected to an IP network. A group may contain up to 16 arrays. Pool: A container where each member (array) is assigned when added to a group and data volumes assigned to hosts span across a pool. A pool can have up to 8 members. CommServe™: The master server used in CommVault Simpana.
2 Test system configuration To conduct the system testing detailed in this paper, the SQL Server test system shown in Figure 1 and Figure 2 were created.
Figure 2 High Level view of Test System Components Key design details in the test system configuration include: • • Installed and configured SQL Server 2008 R2 Enterprise Edition in a Windows Server 2008 R2 ® virtual machine (SQL DB VM) hosted on the VMware vSphere ESXi4.1 on a Dell PowerEdge R710 server. The virtual machine that hosted the SQL Server was configured to use eight virtual CPUs and 16GB of reserved memory.
Figure 3 • • • Figure 4 BP1020 vSwitch0 Configuration Created a separate vSwitch (“vSwitch0”) for server side LAN network and a separate vSwitch (“vSwitch1”) for iSCSI SAN access. Created virtual network adapters (type VMXNET 3) within the VM and assigned them to the vSwitch1 (Refer to Figure 4) on the vSphere host. Used EQL MPIO DSM via Host Integration Tools (HIT) kit to setup multiple paths from the guest VM to the storage volumes. These paths are labeled as “Guest iSCSI Path” in Figure 2.
• • • • • • • • • BP1020 The two local disks installed in the R710 server were configured as a RAID 1. ESXi 4.1 was installed on these disks, and the guest virtual machine OS disk partitions were also hosted within the VMFS file system on these disks. A second VMware ESXi 4.1 server (INFRA) was used to host virtual machines for vCenter and Active Directory. A third VMware ESXi 4.
3 Test studies The following test scenarios study the performance impact on the production server when using CommVault Simpana’s off-host SnapProtect feature for database backup and restore. The impact of deduplication during the off-host SnapProtect backup was also analyzed. Figure 5 illustrates the data flow across the SAN during the off-host SnapProtect database backup operations.
3.1 SnapProtect Operations In SnapProtect, a backup job is scheduled using the CommCell console at the DL2200. When the backup job is started: First, CommVault’s SQL iData Agent in the client SQL Server host machine communicates with the SQL Server application to put it in a consistent mode, flushing logs and ensuring consistent copies of data are created using Windows Volume Shadow Copy Service (VSS).
3.2 Test #1: SQL Server performance impact studies using SnapProtect For this test, four ~150GB databases were created using Quest Benchmark factory. The goal of this test was to study the performance impact on the production SQL Server during off-host SnapBackup and also study the reduction in backup window when multiple databases were backed up at the same time. To achieve this, four CommVault backup subclients were created and each was assigned a different database to perform a SnapBackup.
3.2.2 Backup duration The backup duration depends on I/O throughput and decreases with increased throughput. So for this test, backup throughput was measured at the DL2200, by performing a full SnapProtect backup of one database by one client and linearly increasing the number of subclients to perform a SnapBackup of as many as four databases simultaneously. The snapshots for the databases were taken from the CommCell console and then an offline backup was run for the databases.
Figure 7 High-level view of Backup copy of multiple DBs Figure 8 shows the average and max write backup throughput and backup duration in each job with one to four database backups running simultaneously. As the number of database backup copies increased from one to four, the backup throughput increased due to multiple readers and writers involved during multiple database backups. This in turn reduced the backup duration for the multiple backup jobs.
The backup duration was less for the simultaneous backups when compared to backups that were taken sequentially (Refer to Figure 9).The sequential duration was plotted theoretically by multiplying the duration of single database backup with the number of databases backed up. Figure 9 3.2.3 Backup Duration Comparison CPU utilization at production SQL Server The SnapBackup occurs by mounting the snapshots on the DL2200 appliance, thus taking the backup process off the production host.
3.3 Test #2: SQL Server performance impact studies during full & Tlog SnapProtect backup This test simulated an environment where the typical production duration is an eight hour workday, full backups are taken daily, and transaction log backups are scheduled to run every 30 minutes during the eight hour work day.
Volume DB-BACKUP4 DB-BACKUP5 DB-BACKUP6 DB-BACKUP7 3.3.2 Size 1TB 1TB 1TB 1TB Purpose SQL Server DB Backup Files SQL Server DB Backup Files SQL Server DB Backup Files SQL Server DB Backup Files Full and Tlog backup Figure 12 compares the CPU utilization at the Production SQL Server during backup using CommVault’s SnapProtect feature and native SQL backup utility (SQL Server Management Studio here).
Figure 13 shows the CPU utilization during Tlog backups using CommVault and native SQL Server Management studio. Since CommVault SnapProtect is supported only on full, differential backups, and incremental backups for SQL iData Agent, the CPU impact on the production SQL Server during Tlog backups was similar to the CPU utilization seen using native SQL server tools. The Tlog backups work like regular backups through the LAN in SnapProtect and consume CPU resources at lower utilization levels.
The total time required to complete full database and transactional log backups was measured along with backup throughput during each operation. The baseline full backup in Figure 14 is the full backup taken when no user transactions were running on the production server. Figure 14 compares the increase in backup time for the backups taken during the user load vs. the backup taken with no user load.
Figure 15 Application response time comparison during full backup BP1020 EqualLogic & PowerVault: SQL Backup and Restore with CommVault Simpana 21
There was minimal impact on the disk read latencies at the source array (SQL Server production DB data) during off-host snap backup. The production server and the array were able to absorb this latency, resulting in minimal impact on the application response times as shown in Figure 14. Figure 16 Impact on disk read latencies during off-host full backup 3.3.
The type of restore i.e. either from snapcopy or diskcopy would depend on Recovery Point Objective and Recovery Time Objective defined for a specific business continuity strategy. Figure 17 Specifying copy precedence to restore from disk copy backup To restore a database to a point in time, the last full backup needs to be restored first followed by the transaction log backups until the required point in time. For this test, the full backup was restored initially, followed by 15 Tlog backup restores.
3.4 Test #3: SQL Server performance impact studies during deduplication 3.4.1 Deduplication terminology Deduplication focuses on data within a given object and provides an efficient method for storing data by identifying and eliminating duplicate items in backups. Block level deduplication is performed at the data block level by comparing blocks of data against each other. Deduplication uses a hashing algorithm to compare data.
3.4.2 Source side deduplication In this section, tests were performed for source side deduplication to study the impact on production SQL Server and overall dedupe and backup performance. For this test, • • • A storage policy with source deduplication was created. Client side compression and signature generation were enabled at the subclient in the CommCell console.
Figure 19 represents the high-level dataflow of source dedupe with production SQL server as proxy. Here the snapshots get mounted in the production SQL server, software compression and signature generation/hashing happen in the same server. The data is first split into 128KB blocks (this is configurable) and signatures are generated for these blocks. These initial signatures are stored in the dedupe datastore located at the DL2200.
PowerVault DL2200 as Proxy Server Figure 21 Data flow in source dedupe with the DL2200 as proxy Figure 21 shows the data flow in this scenario. The DL2200 server was specified as the proxy server for performing SnapProtect backup operations for this test. It was specified in the subclient SnapProtect operations tab under properties (Refer to Table 4). The snapshots get mounted at the DL2200 server, software compression and signature generation/hashing occur in the DL2200.
Since the compression and hashing happen at the DL2200, there was very little resource impact seen in the production server (Refer to Figure 22) and DL2200 server resources were used and high CPU utilization was observed at the DL2200 (Refer to Figure 22). Also the DL2200 has visibility to both source and target volumes, and the datacopy occurs off-host through the SAN. Therefore, the production server resources have minimal impact. Figure 22 Impact on CPU utilization at production SQL Server vs.
2 3 4 User Transactions on Database Dedupe#2- Full Backup 5 User Transactions on Database Dedupe#3- Full Backup 6 Dedupe#4- Full Backup Ran user transactions from QBMF on the database for 30 minutes to generate differential. Performed a full backup which has some differential compared to the full backup performed in step 1 Ran user transactions from QBMF on the database for 30 minutes to generate data change.
Figure 24 shows the capacity reduction after each deduplication. As expected, when there was no data change, the capacity savings percentage was 99%. Figure 24 Backup capacity size reduction 3.4.3 Deduplication block size studies on source side deduplication For the source side deduplication test, several full backups were run with 128KB and 256KB block sizes to see the performance impact on the production server and DL2200.
percentage was higher compared to the data set reduction percentage when using 256KB block size (Refer to Figure 25).The reduction percentage decreased when using larger block sizes because it makes the deduplication engine compare more data in a single chunk. However, the capacity savings were ~100% for the “Dedupe#4” (Full backup with no change in dataset from previous full backup) in both the block sizes. Hence, bigger block sizes means lesser deduplication, hence lesser capacity reduction.
3.4.4 Target deduplication Figure 27 Data flow in target dedupe A storage policy with target deduplication was created enabling compression and signature generation at the media agent in the DL2200. The difference between target dedupe and source dedupe with DL2200 as proxy is in the way they are created using storage policy and the way the agents perform dedupe and backup, however the data flow looks very similar in both the cases. Figure 27 shows the data flow in target deduplication .
Figure 28 Impact on CPU Utilization at production server vs. DL2200 Next, in the target deduplication impact studies, the steps specified in Table 8 were performed for source and target dedupes. The backup duration measured at the DL2200 includes the time it takes for snapshot creation, movement to proxy, complete indexing, and then copy to the disk target location. Table 8 Serial.
Figure 29 Backup duration comparisons The capacity reduction was almost the same in both test scenarios (Refer to 30). Figure 30 Backup data size reduction comparison 3.5 Analysis and conclusion for Dedupe studies SnapProtect backup greatly reduces the impact on the resources at the production SQL server during backup when compared to the backups performed using native SQL backup utility. Source and target Deduplication have their pros and cons. The usage depends on the goal for the backup operation.
In SnapProtect while using the source dedupe, the proxy feature can be used to move where the deduplication processing takes place. Refer to section 3.4.2 for the details. When a backup server such as the DL2200 is used as the proxy in source dedupe, the backup takes place off-host resulting in minimal impact at the production server.
4 Best practice recommendations 4.1 Network infrastructure The following are network infrastructure design best practices: • • • • • Design redundant SAN component architectures. This includes the NICs on the servers and switches for the storage network (including server blade chassis switches and external switches). Make sure that the server NIC ports and storage NIC ports are connected so that any single component failure in the SAN will not disable access to any storage array volumes.
Note: The EqualLogic Auto-Snapshot Manager/Microsoft Edition application in conjunction with SQL backup can help to improve RPO and RTO goals without any disruption to the database applications. ASM/ME helps to create transactionally consistent smart copies (snapshots, clones or replicas) of SQL Server® databases. An in-depth discussion on SQL Server data protection using EqualLogic smart copy snapshots can be found at http://en.community.dell.
scenarios. So it is recommended to design backup environments that ensure copies of data are regularly created and backed up to an external storage (Example: The DL2200 used in this paper) before the snapshot’s retention policy expires. • The jobs for the snapshot are pruned based on the retention policy of the snapshot copy. • The snapshots related to the pruned jobs are deleted from the array periodically. Snapshot Reserve Snapshot reserve space is needed to be allocated before creating snapshots.
Performance • • If the main goal of database backup is to reduce the impact on the production server and optimize application response times then SnapProtect would be an optimal choice. If in addition, backup capacity savings is another goal then SnapProtect with deduplication would be the proper choice.
Example 1 For the details on the projected average transaction time for an insert/query in the Deduplication database based on the size of the application data that is backed up, use the tool with the -simulateddb and -datasize options. COMMAND SIDB2 -simulateddb -in instance001 -p d:\dedup_store -datasize 500 SAMPLE OUTPUT The disk is capable of hosting a deduplication DB for: 0.500 TB of Application Data Size 0.100 TB of data on disk 146.
Path Used Read-Write type File Count Total Bytes Written Time Taken to Write(S) Throughput Write(GB/H) Total Bytes Read Time Taken to Read(S) Throughput Read(GB/H) : : : : : : : : : f:\ RANDOM 500 1048576000 7.113515 494.217709 1048576000 7.581667 463.700792 Ensure that the average read throughput of the disk is around 500 GB per hour, and the average write throughput of the disk is around 400GB per hour.
Appendix A Test system component details This section contains an overview of both the hardware and software configurations used throughout the testing described in this document. Table 11 Test Configuration Hardware Components Test Configuration – Hardware Components SQL Server® One (1) Dell PowerEdge R710 Server running VMware ESXi v4.1, hosting a (ESXi01) single SQL Server® Database virtual machine: BIOS Version: 3.0.0 2 x Quad Core Intel® Xeon® E5520 Processors 2.
1 x Dell EqualLogic PS6500E: 148x 1TB SATA drives as RAID 50, with two hot spare disks Dual quad-port 1GbE controllers running firmware version 5.1.1(H1) Test Configuration – Software Components Operating systems Host: VMware vSphere ESXi Server v4.1 Guest: Microsoft® Windows Server 2008 R2 Enterprise Edition (virtual machine): o MPIO enabled using EqualLogic DSM for Windows when using guest iSCSI initiator o EqualLogic Host Integration Toolkit(HIT) v3.5.
Additional resources Support.dell.com is focused on meeting your needs with proven services and support. DellTechCenter.com is an IT Community where you can connect with Dell Customers and Dell employees for the purpose of sharing knowledge, best practices, and information about Dell products and your installations. Referenced or recommended Dell publications: • PS Series Array Network Performance Guidelines http://www.equallogic.com/resourcecenter/assetview.
THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.