Dell EqualLogic Best Practices Series Best Practices for Enhancing Microsoft Exchange Server 2010 Data Protection and Availability using Dell EqualLogic Snapshots A Dell Technical Whitepaper This document has been archived and will no longer be maintained or updated. For more information go to the Storage Solutions Technical Documents page on Dell TechCenter or contact support.
THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND. © 2011 Dell Inc. All rights reserved. Reproduction of this material in any manner whatsoever without the express written permission of Dell Inc. is strictly forbidden. For more information, contact Dell.
Table of Contents
1 Introduction
1.1 Purpose and scope
1.2 Target audience
A.2 Network configuration
A.3 Host hypervisor and virtual machines configuration
A.3.1 ESXi Virtual Network Configuration
A.3.2 Virtual Machines Network configuration
Acknowledgements This whitepaper was produced by the PG Storage Infrastructure and Solutions team between January 2011 and April 2011 at the Dell Labs facility in Round Rock, Texas.
1 Introduction
IT professionals and businesses are constantly challenged by the exponential growth of the data they have to manage. While data management encompasses several areas of an organization, ensuring data protection and business continuity is one of its most critical aspects. For mission-critical applications, data loss and application unavailability are not tolerable beyond the limits defined by the business continuity strategy.
the SAN level. When integrated with backup applications, they open new perspectives for data protection strategies.
• Rapid creation of a recovery point: Creating a SAN snapshot takes very little time. This makes it possible to establish more frequent application recovery points, and therefore a more granular recovery point objective for the application data.
2 Exchange Server data protection options and challenges
The traditional elements that should be identified before defining a business continuity plan (BCP) for any service are:
• Recovery Point Objective (RPO), which defines the maximum acceptable amount of data loss, measured in time
• Recovery Time Objective (RTO), which defines the time within which the application service must be restored
• Features, constraints, and potential of the technology underlying the infrastructure
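To make the RPO definition concrete, the sketch below computes how much data (measured in time) would be lost if a failure occurred at a given moment and the most recent earlier recovery point were restored. It assumes a hypothetical schedule of one snapshot per hour across an 8-hour business day; the schedule, dates, and function names are illustrative assumptions, not ASM/ME settings.

```python
from datetime import datetime, timedelta


def hourly_recovery_points(start, count):
    """Hypothetical schedule: one recovery point per hour, starting at `start`."""
    return [start + timedelta(hours=h) for h in range(count)]


def data_loss(recovery_points, failure_time):
    """Time window of data lost when restoring the latest point taken at or before the failure."""
    earlier = [p for p in recovery_points if p <= failure_time]
    if not earlier:
        raise ValueError("no recovery point precedes the failure")
    return failure_time - max(earlier)


def meets_rpo(recovery_points, failure_time, rpo):
    """True if the data lost for this failure stays within the RPO."""
    return data_loss(recovery_points, failure_time) <= rpo


points = hourly_recovery_points(datetime(2011, 4, 1, 8, 0), 8)  # 08:00 ... 15:00
failure = datetime(2011, 4, 1, 10, 45)
print(data_loss(points, failure))                      # 0:45:00
print(meets_rpo(points, failure, timedelta(hours=1)))  # True
```

With a fixed interval, the worst case is a failure just before the next recovery point, so the worst-case data loss approaches the interval itself; shortening that interval is what fast snapshot creation makes practical.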
• Active Directory®, where user and service configurations are stored and where authentication and authorization occur
• Exchange Servers with roles other than Mailbox (Client Access, Hub Transport, Unified Messaging, and Edge Transport), which, at a high level, are in charge of accepting client connections, interconnecting with PBX systems, and routing messages within, into, and out of your organization
• Distributed email containers, external to the central messaging infrastructure, such as .
For additional information about Database Availability Groups (DAG), refer to the Microsoft documentation: Understanding Database Availability Groups, available at: http://technet.microsoft.com/en-us/library/dd979799.aspx
Once again, DISKSHADOW.EXE shows whether the required VSS writer is present (DISKSHADOW>list writers):
Microsoft Exchange 2010 supports multiple ways to restore its mailbox databases and data. Each method offers different advantages in terms of RPO and RTO.
In-place DB restore: Restores the database (*.edb file) to its original location. The database contains the data captured up to the last checkpoint.
3 Auto-Snapshot Manager and Exchange protection
Auto-Snapshot Manager/Microsoft Edition (ASM/ME) is a software component of the EqualLogic Host Integration Tools (HIT kit), a toolkit available free of charge to customers running Dell EqualLogic PS Series arrays. ASM/ME offers an intuitive interface, implemented as a snap-in (up to version 3.5.
Figure 2 Architectural diagram of ASM/ME and Volume Shadow Copy Service
The DISKSHADOW.EXE command shown below provides access to the VSS configuration and objects:
DISKSHADOW>list providers
When the EqualLogic VSS hardware provider is present in the system, the output includes an element similar to the one in the red box.
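When checking for the provider from a script rather than by eye, the captured `list providers` output can be parsed. The sketch below assumes that the output lists each provider as a `* ProviderID:` line followed by indented `Type:` and `Name:` fields, and that the EqualLogic provider's name contains the string "EqualLogic"; the sample text, the placeholder ID, and that naming are assumptions to verify against your own DISKSHADOW output.

```python
import re

PROVIDER_HEADER = re.compile(r"^\*\s*ProviderID:\s*(\S+)")


def parse_providers(text):
    """Group `list providers` output into one dict per provider (field name -> value)."""
    providers = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        header = PROVIDER_HEADER.match(line)
        if header:
            current = {"ProviderID": header.group(1)}
            providers.append(current)
        elif current is not None and raw[:1] in (" ", "\t") and ":" in line:
            # Indented "Field: value" lines belong to the most recent provider.
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return providers


def has_hardware_provider(text, vendor="EqualLogic"):
    """True if a hardware-type provider whose name mentions `vendor` is listed."""
    return any(
        vendor.lower() in p.get("Name", "").lower()
        and "VSS_PROV_HARDWARE" in p.get("Type", "")
        for p in parse_providers(text)
    )


# Illustrative output only; the second entry's ID and name are placeholders.
SAMPLE = """\
* ProviderID: {b5946137-7b9f-4925-af80-51abd60b20d5}
        Type: [1] VSS_PROV_SYSTEM
        Name: Microsoft Software Shadow Copy provider 1.0
        Version: 1.0.0.7
* ProviderID: {00000000-0000-0000-0000-000000000000}
        Type: [3] VSS_PROV_HARDWARE
        Name: EqualLogic VSS HW Provider
        Version: 1.0
"""

print(has_hardware_provider(SAMPLE))  # True
```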
Table 1 ASM/ME and Microsoft Exchange
Task | Description
Create Smart Copy | Creates an Exchange-aware SAN snapshot (or clone) of a mailbox database volume, associated with a backup document on the local host. The snapshot (or clone) is created as a 'copy' of the volume, so the transaction logs are not truncated by this process.
Collection | Represents a set of volumes grouped by an operational logic (consistent actions across multiple volumes).
4 Test topology and architecture
We simulated a real-world scenario to identify the benefits of using EqualLogic SAN snapshots to strengthen Exchange 2010 data protection.
Figure 3 Functional system design diagram
4.2 Physical system configuration
The physical components of the test infrastructure were laid out as shown in Figure 4. To provide more flexibility and datacenter density, we deployed this solution on Dell blade servers.
Figure 4 Physical system configuration diagram
Appendix A provides more details of the test configuration setup (software, storage, hypervisor, virtual machines, and network), as well as a detailed schematic diagram showing the various network connection paths across the blade chassis switch fabrics.
• Snapshot reserve configured for each volume with enough space to accommodate and retain the number of snapshots planned for the tests
Figure 5 shows the mailbox database layout implemented on the EqualLogic SAN.
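Sizing the snapshot reserve is essentially arithmetic on the expected change rate. The following sketch assumes, as a simplification, that each retained snapshot preserves roughly the amount of data overwritten during one snapshot interval, and adds a safety margin; the volume size, change rate, and margin are hypothetical inputs, not values measured in these tests.

```python
import math


def snapshot_reserve_percent(volume_gb, changed_gb_per_interval,
                             snapshots_retained, headroom=0.2):
    """Estimate the snapshot reserve, as a percentage of volume size, needed
    to retain `snapshots_retained` snapshots.

    Simplifying assumption: each snapshot holds about `changed_gb_per_interval`
    of preserved (overwritten) blocks; `headroom` adds a safety margin.
    """
    needed_gb = changed_gb_per_interval * snapshots_retained * (1 + headroom)
    return math.ceil(100 * needed_gb / volume_gb)


# Hypothetical example: 500 GB database volume, 10 GB of changed blocks per
# hourly snapshot, 8 snapshots retained for a business day, 20% margin.
print(snapshot_reserve_percent(500, 10, 8))  # 20
```

The change rate is the uncertain input here; an observed figure, such as the reserve actually consumed by a day of test snapshots, is preferable to an estimate.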
5 Test workload
The goal of the baseline workload was to simulate regular corporate messaging activity during peak hours, and then to apply different data protection and recovery techniques to measure the efficiency of each technique and its impact on local and network resources. For the Microsoft Exchange mailbox database and user profile, we applied the configuration reported in Table 2.
cached), POP, IMAP4, SMTP, ActiveSync, and Outlook Web App.
6 Testing the impact of Smart Copy snapshots
The goal of this set of tests was to establish the impact to expect from creating Smart Copy snapshots of mailbox database volumes under user load in the simulated test environment. First, we evaluated the resource utilization of a single Smart Copy snapshot taken under varying and increasing user workloads. We kept the simulation active before and after taking the snapshot to detect any behavioral changes in the key indicators.
Combining the results of these test series helped us evaluate the cost of creating Smart Copy snapshots over an entire business day (8 hours) and assess how that cost varied with the number of online users. We also verified whether, and how, a critical situation would affect these results.
Figure 7 Time taken for 1 to 8 Smart Copy snapshots of one volume under a load of 5000 concurrent users
Figure 8 Time taken for 1 to 8 Smart Copy snapshots of one volume under a load of 5000 concurrent users with one active host only
These results clearly show that creating a Smart Copy snapshot took a predictable and minimal amount of time, averaging less than 5 seconds.
Table 4 shows the Exchange KPIs that qualified the health of the mailbox database server under the load of 5000 concurrent users while creating and maintaining the eight cumulative Smart Copies. These values are the averages of all the samples collected during the entire simulation. We did not observe any spike on these counters, even at the start of the snapshot process; the KPIs remained consistently below the recommended thresholds.
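The qualification described above (averaging all samples and checking for spikes) can be expressed as a small helper. This is a sketch that assumes the counter samples have already been exported, for example from Performance Monitor logs, into lists of numbers; the counter name is used only as a label, and the threshold value shown is a placeholder to replace with the value from Microsoft's Exchange 2010 performance guidance.

```python
from statistics import mean


def kpi_summary(samples, thresholds):
    """Summarize counter samples against per-counter thresholds.

    samples:    counter name -> list of sampled values
    thresholds: counter name -> recommended maximum (same unit as samples)
    Returns counter -> (average, peak, average_below_threshold, peak_below_threshold).
    """
    summary = {}
    for counter, values in samples.items():
        avg, peak = mean(values), max(values)
        limit = thresholds[counter]
        summary[counter] = (avg, peak, avg < limit, peak < limit)
    return summary


# Hypothetical samples and threshold, for illustration only.
samples = {r"MSExchangeIS\RPC Averaged Latency": [3.0, 4.0, 5.0, 4.0]}
thresholds = {r"MSExchangeIS\RPC Averaged Latency": 10.0}
print(kpi_summary(samples, thresholds))
```

Reporting the peak alongside the average matters because an average can stay under a threshold while hiding a short spike at the moment a snapshot is taken.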
Figure 9 Disk I/O pattern during multiple Smart Copy snapshots on one volume
Note: The I/O operation (read or write) counter values that Performance Monitor reports for a process include all the operations the process executes against the entire disk subsystem, not only the volume under analysis. Nevertheless, the read spike is clearly due to the reported process (store.exe).
Remote host: the task runs as part of a schedule on a remote server that has all the ASM/ME and Exchange software components (management tools) installed.
In any of these cases, after the checksum verification runs, the backup document of the Smart Copy is updated accordingly.
verification would bring the CPU utilization of the system very close to the upper recommended threshold. The impact of ESEUTIL on the system can be artificially reduced by adding a 1-second delay after a specified number of I/Os, using the following Windows registry parameter (the value identifies the number of I/Os):
[HKEY_LOCAL_MACHINE\SOFTWARE\EqualLogic\ASM\Settings]
"EseutilThrottle"=dword:00001000
Note that in .reg file syntax dword values are hexadecimal: dword:00001000 corresponds to 4096 I/Os, while 1000 I/Os would be written as dword:000003e8.
Or, we advise and reinforce that, in the case where a time window to run this activity locally on th
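Because .reg DWORD values are hexadecimal, it is easy to set a different throttle than intended. The sketch below converts a decimal I/O count to the DWORD literal and estimates the delay the throttle adds, assuming (from the description above) that one 1-second pause occurs per configured number of I/Os; the 2,000,000-I/O workload is a hypothetical figure.

```python
def reg_dword(value):
    """Format an integer as a .reg file DWORD literal (always 8 hex digits)."""
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError("a DWORD must fit in 32 bits")
    return f"dword:{value:08x}"


def throttle_delay_seconds(total_ios, ios_per_pause, pause_seconds=1):
    """Extra wall-clock time if the checksum pauses once every `ios_per_pause` I/Os."""
    return (total_ios // ios_per_pause) * pause_seconds


print(reg_dword(1000))                          # dword:000003e8
print(int("00001000", 16))                      # 4096
print(throttle_delay_seconds(2_000_000, 4096))  # 488
```

A smaller I/O count means more frequent pauses: lower impact on the production host, but a longer verification window.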
7 Recovery scenarios
Having shown how EqualLogic Smart Copy snapshots can be taken in an Exchange environment, and what their efficiency and cost are in terms of local host resources, we moved on in the test plan to verify the recovery scenarios that benefit directly from these Smart Copy snapshots.
differential backup now translates into a transaction log backup, which copies only the logs generated since the last full backup and does not include a copy of the entire database file, even if that file was modified. At recovery time, this technology therefore requires a process named 'transaction log replay' to apply all the changes tracked in the transaction logs to a previously restored full backup of the database.
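The log replay step can be illustrated with a simplified model in which each transaction log file carries a sequential generation number. The sketch below is a model, not the ESE implementation: it selects the logs to replay after restoring a full backup, namely every generation newer than the backup, up to an optional target, stopping at the first gap.

```python
def logs_to_replay(backup_generation, available, target=None):
    """Log generations to replay on top of a restored full backup.

    backup_generation: last log generation already contained in the backup
    available:         log generation numbers present on disk
    target:            stop at this generation (default: newest available)
    Replay stops at the first missing generation, since the log sequence
    must be unbroken.
    """
    have = set(available)
    if not have:
        return []
    last = max(have) if target is None else target
    plan = []
    gen = backup_generation + 1
    while gen <= last and gen in have:
        plan.append(gen)
        gen += 1
    return plan


print(logs_to_replay(100, [99, 100, 101, 102, 103]))  # [101, 102, 103]
print(logs_to_replay(100, [101, 102, 104]))           # [101, 102]
```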
Figure 11 Smart Copy snapshots recovery point timeline
Smart Copy snapshots clearly simplify restore operations. In a database recovery scenario that requires restoring a full backup, an in-place restore from a snapshot is much more time-efficient, because no data has to be copied from the backup media.
Test details for in-place restore of one database:
• Details: Restore of one active database, where the second copy of the database has been suspended
• Key indicators: Time taken for restore
The operations that ASM/ME automates when performing a restore from a Smart Copy set are outlined in the following list:
Table 6 In-place full restore, time taken
Operation | Task | Duration
ASM automated restore | Snapshot restore | 25 sec
ASM automated restore | Log replay | 548 sec
ASM automated restore | Next steps | End-user access
ASM automated restore | Total (end-to-end restore) | 573 sec
Smart Copy manual restore | Snapshot mount | 25 sec
Smart Copy manual restore | Log replay | <5 sec/log
Smart Copy manual restore | Next steps | Selective log replay process
Smart Copy manual restore | Total (end-to-end restore) | Variable
The number of transactions executed in the Exchange memory cache but not yet transferred to the mailbox database file is known as the log checkpoint depth.
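The totals in Table 6 are sums of sequential steps, and the per-log figure for manual replay makes it possible to bound a manual restore once you know how many logs you intend to replay. The sketch below reproduces the automated total from the table values and bounds a manual restore for a hypothetical number of selected logs, assuming the <5 sec/log figure holds for each log.

```python
def end_to_end_seconds(steps):
    """Total restore time as the sum of its sequential step durations (seconds)."""
    return sum(steps.values())


def manual_restore_upper_bound(mount_seconds, logs_selected, max_seconds_per_log=5):
    """Upper bound for a manual restore: mount, then replay each selected log."""
    return mount_seconds + logs_selected * max_seconds_per_log


automated = {"snapshot restore": 25, "log replay": 548}  # values from Table 6
print(end_to_end_seconds(automated))       # 573
print(manual_restore_upper_bound(25, 12))  # 85 (12 logs is a hypothetical choice)
```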
The remaining data, generated between T8 and T9, should be protected by an integrated backup solution, as explained at the beginning of Section 7. Recovering that data would take the same amount of time with either approach (incremental backup or Smart Copy snapshot).
7.3 Testing brick-level recovery
In addition to in-place recovery of Exchange databases, EqualLogic Smart Copy snapshots can be mounted as recovery databases to perform granular (brick-level) recovery.
environment. The times presented in Table 7, however, show an example of how soon the administrator can start recovering mailbox items from the recovery database. For reference only, we show a PowerShell command that can be used to restore data from an Exchange recovery database (RDB). Here, 'RecoveryDBn' is the name of the RDB, and 'John Smith' and 'TempJSmith' are example source and target mailbox names (using a mix of the DisplayName and Alias attributes).
We configured a dedicated 1Gbps network (see Appendix A.3.2) between the two nodes of the DAG to handle the replication traffic required by regular and exceptional DAG seeding procedures. We first used this network to measure the load of the seeding and reseeding procedures on the DAG hosts.
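A dedicated 1 Gbps link puts a hard floor under seeding time. The sketch below computes that floor for a hypothetical database size, with an assumed efficiency factor standing in for protocol overhead and for the CPU and disk effects measured in this section; it ignores the transaction logs shipped after the copy.

```python
def seed_copy_seconds(db_size_gb, link_gbps=1.0, efficiency=1.0):
    """Estimated time to copy a database over the replication network.

    Uses decimal units (1 GB = 10**9 bytes, 1 Gbps = 10**9 bit/s).
    `efficiency` is the assumed fraction of line rate actually achieved.
    """
    bits = db_size_gb * 8 * 10**9
    return bits / (link_gbps * 10**9 * efficiency)


print(seed_copy_seconds(100))                  # 800.0 (theoretical line rate)
print(seed_copy_seconds(100, efficiency=0.5))  # 1600.0
```

Since the copy time grows linearly with database size, reseeding from a snapshot that is already local to the passive node avoids this network transfer entirely.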
Figure 12 DAG seeding process data flow with Exchange replication over the network
Figure 13 and Figure 14 illustrate the CPU impact reported on both the source and target servers during mailbox database seeding. They show that the msexchangerepl.exe process, which runs the Exchange Replication service, is responsible for the CPU utilization impact.
Figure 13 DAG seeding (1 database) over the replication network: source host CPU impact
Figure 14 DAG seeding (1 database) over the replication network: target host CPU impact
To further understand the load on local host resources, we analyzed disk access to both the source and target volumes, focusing on disk reads on the source and disk writes on the target. Figure 15 and Figure 16 show the disk access patterns of the two volumes used by the mailbox database copy process.
when the Replication service was executing the network transfer and also reading from the source volume.
Figure 15 DAG seeding (one database) over the replication network: source volume impact
On the target, we verified, as expected, that the Replication service executed the vast majority of disk writes, while the Information Store had a reduced impact because the target database remained dismounted until the end of the database copy process.
We simulated a scenario where a DAG environment was running and was already protected by the Smart Copy snapshots on both nodes. In this context we had the advantage of starting the recovery process from the pre-existing snapshots on the node that owned the mailbox database passive copy.
Figure 17 shows the data flow for a DAG reseeding process using a Smart Copy on the iSCSI SAN.
Figure 17 DAG seeding process data flow when reseeding from a preexisting Smart Copy snapshot
We compared the time taken to restore the data redundancy of one database across the two Exchange DAG nodes in case of loss of the secondary copy (corruption, deletion, etc.).
Table 8 compares the results from the two scenarios described above.
10. Wait while all the transaction logs (generated between the point in time when the Smart Copy was taken and the moment the copy was resumed) are transferred and replayed into the passive database.
Figure 18 shows the data flow for a DAG initial seeding process using the iSCSI SAN.
Table 9 shows a summary of the activity executed in each scenario.
When the Smart Copy snapshot was recovered or the cloned volume was moved to its final location, we resumed the Exchange Replication service activities with the command below:
[PS]>Resume-MailboxDatabaseCopy -Identity DBn\MBXn
Finally, to quickly monitor the status of the mailbox database copies and of the Content Index (not reported in the Exchange Management Console GUI), we used one of the commands below:
[PS]>Get-MailboxDatabaseCopyStatus -Identity DBn\MBXn
Or
[PS]>Get-MailboxDatabaseCopyStatus -Serve
8 Best practice recommendations
We recommend integrating Dell EqualLogic snapshots managed by Auto-Snapshot Manager as a complementary solution for Microsoft Exchange 2010 data protection. The benefits of this solution include reducing the over-provisioning of resources for backup activities on Microsoft Exchange production servers and shortening the RTO achievable with an infrastructure correctly integrated with a traditional backup system.
Remember that ASM/ME is integrated with the application layer in Microsoft Exchange. Do not use direct SAN snapshots (for example, by using the PS Series group manager) to protect Exchange mailbox databases, because they will not be application-consistent.
Protect the ASM/ME backup documents
The default location for the ASM/ME backup documents is on the local disk of the host where ASM/ME is installed. Protect these documents by using a network or shared folder that is regularly backed up.
VSS timeout
Under rare circumstances of extreme load, the VSS service can time out when a requester asks it to create a shadow copy. Follow Microsoft's resolution guidance to correct this problem and, if required, evaluate increasing the VSS timeout value in the registry.
General network best practices
• Use separate network infrastructures to isolate LAN traffic from SAN (iSCSI) traffic.
General Exchange best practices
• Use the Basic disk type for all EqualLogic volumes
• Use the GUID partition table (GPT) for Exchange volumes
• Use the default disk alignment provided by Windows Server 2008 or later
• Use NTFS with a 64KB allocation unit for Exchange database and log partitions
• Deploy physically separate disks for the guest Windows operating system and for Exchange data
• Isolating logs and databases is not required when deployed in a DAG
• Do not use circular logging when planning for granular recov
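The alignment recommendation can be checked with simple modular arithmetic: a partition is aligned when its starting offset is an exact multiple of the allocation unit (here, the 64KB NTFS unit recommended above). The sketch uses the 1 MB default starting offset applied by Windows Server 2008 and, for contrast, the legacy 63-sector offset used by older Windows versions; check the actual offset of your own partitions rather than relying on these defaults.

```python
def is_aligned(partition_offset_bytes, allocation_unit_bytes=64 * 1024):
    """True if the partition starts on a boundary of the allocation unit size."""
    return partition_offset_bytes % allocation_unit_bytes == 0


WINDOWS_2008_DEFAULT_OFFSET = 1024 * 1024  # 1 MB starting offset
LEGACY_OFFSET = 63 * 512                   # 63 sectors = 32,256 bytes

print(is_aligned(WINDOWS_2008_DEFAULT_OFFSET))  # True
print(is_aligned(LEGACY_OFFSET))                # False
```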
Appendix A Test configuration details
A.1 Hardware configuration
Table 10 lists the details of the hardware components required for the configuration setup:
Table 10 Configuration – Hardware components
Servers:
• Dell PowerEdge M1000e Blade enclosure
• Dell PowerEdge M710 Blade Server
  o 2x Quad Core Intel® Xeon® X5570 Processors, 2.
Table 11, Table 12, and Table 13 summarize the different networks required for the configuration setup and their usage:
Table 11 Configuration – Switch modules
Switch Module | Placement | Purpose
PowerConnect M6220 #1 | M1000e I/O Module A1 | Regular IP traffic
PowerConnect M6220 #2 | M1000e I/O Module A2 | Regular IP traffic
PowerConnect M6348 #1 | M1000e I/O Module B1 | iSCSI data storage traffic
PowerConnect M6348 #2 | M1000e I/O Module B2 | iSCSI data storage traffic
PowerConnect 6248 #1 | Rack | iSCSI data s
Figure 19 presents the diagram of the network connections between the 4 blade servers and the storage arrays:
Figure 19 Network connectivity diagram
A.3 Host hypervisor and virtual machines configuration
A virtual infrastructure built with VMware vSphere hosted all the components of both the messaging and test infrastructure. Some key elements of the virtual infrastructure configuration were:
• VMware 4.
Table 14 lists the relation between the hypervisor host and each of the virtual machines, with a brief summary of the virtual resources allocated for each virtual machine: Table 14 Configuration – Host and guest allocation Host VM Purpose vCPUs Memory Network Adapters DC1 Active Directory Domain Controller 4 8GB E1000 VCENTER VMware vCenter Server 2 4GB 2x E1000 MBX1 Exchange Server Mailbox Role DAG node 1 4 48GB HUBCAS1 Exchange Server Client Access and HUB Transport roles 1 4 8GB
Table 15 reports the relationship between virtual switches, network adapters and VLANs for each hypervisor host.
Figure 21 Configuration – vSwitch1
A.4 Software components
The setup of all the servers (physical and virtual) required the deployment of the following software components:
• Bare-metal hypervisor: VMware 4.
For additional information about Exchange Server 2010 requirements, prerequisites, and installation, refer to the Microsoft documentation:
• Exchange 2010 System Requirements, available at: http://technet.microsoft.com/en-us/library/aa996719.aspx
• Exchange 2010 Prerequisites, available at: http://technet.microsoft.com/en-us/library/bb691354.aspx
• Install Exchange Server 2010, available at: http://technet.microsoft.com/en-us/library/bb124778.
Related Publications
The following Dell publications are referenced in this document or are recommended sources for additional information.
• Dell EqualLogic Configuration Guide: http://www.delltechcenter.com/page/EqualLogic+Configuration+Guide
• Auto-Snapshot Manager for Microsoft User Guide: https://support.equallogic.com/support/download_file.aspx?id=1053
• Dell PowerEdge Blade Server and Enclosure Documentation: http://support.dell.com/support/edocs/systems/pem/en/index.