Dell Storage for HPC with Intel Enterprise Edition 2.3 for Lustre
A Dell Technical White Paper
Quy Ta
Dell HPC Engineering
October 2015 | Version 1.
THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND. © 2015 Dell Inc. All rights reserved. Reproduction of this material in any manner whatsoever without the express written permission of Dell Inc. is strictly forbidden. For more information, contact Dell.
Contents
Figures
Tables
1. Introduction
2. The Lustre File System

Figures
Figure 1: Lustre based storage solution components
Figure 2: Dell Storage for HPC with Intel EE for Lustre Components Overview
Figure 3: Lustre DNE option
Figure 4: Dell PowerEdge R630

Tables
Table 1: Test Client Cluster Details
Table 2: Dell Storage for HPC with Intel EE for Lustre Configuration
Table 3: Parameters used on MDtest
1. Introduction
In high performance computing (HPC), the efficient delivery of data to and from the compute nodes is critical and often complicated to execute. Researchers can generate and consume data in HPC systems at such speeds that the storage components become a major bottleneck. Managing and monitoring such complex storage systems adds to the burden on storage administrators and researchers.
The object storage subsystem consists of one or more Object Storage Targets (OSTs) and one or more Object Storage Servers (OSSs). The OSTs provide storage for file object data, while each OSS manages one or more OSTs. Typically, there are several active OSSs at any time. Lustre delivers increased throughput as the number of active OSSs (and associated OSTs) increases.
Metadata Target (MDT) – Stores the location of "stripes" of data, file names, time stamps, etc.
Management Target (MGT) – Stores management data such as configuration and registry.
Metadata Storage Server (MDS) – Manages the MDT, providing Lustre clients access to files.
Object Storage Target (OST) – Stores the data stripes or extents of the files on a file system.
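The "stripes" mentioned above come from Lustre's round-robin (RAID-0 style) file layout. A file is cut into chunks of stripe_size bytes, and consecutive chunks rotate across stripe_count OST objects. The sketch below maps a byte offset to its stripe index and to the offset inside that OST object. It assumes a plain layout with no composite or progressive features. The stripe index is a position within the file's layout, which Lustre then maps to a particular OST. The 1 MiB / 4-OST values are hypothetical and are not the settings used in this paper.

```python
def stripe_location(offset: int, stripe_size: int, stripe_count: int) -> tuple[int, int]:
    """Map a byte offset in a file to (stripe index, offset inside that OST object)
    for a plain round-robin, RAID-0 style layout (simplified sketch)."""
    if offset < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("offset must be >= 0; stripe_size and stripe_count must be > 0")
    chunk = offset // stripe_size        # which stripe-sized chunk of the file
    index = chunk % stripe_count         # chunks rotate across the layout's OST objects
    rotation = chunk // stripe_count     # full rotations completed before this chunk
    return index, rotation * stripe_size + offset % stripe_size


if __name__ == "__main__":
    MiB = 1 << 20
    # Hypothetical layout: 1 MiB stripes across 4 OSTs.
    for off in (0, 3 * MiB, 4 * MiB, 5 * MiB + 10):
        print(off, stripe_location(off, MiB, 4))
```

Because consecutive chunks land on different OSTs, a large sequential transfer is serviced by several OSSs in parallel. This is why adding OSSs and OSTs raises aggregate throughput.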
Figure 2: Dell Storage for HPC with Intel EE for Lustre Components Overview
A new feature in this release is the option to use the Lustre Distributed Namespace (DNE) feature, which scales metadata performance and capacity by distributing metadata across more than one MDT.
The Dell Storage for HPC with Intel EE for Lustre solution uses the Dell PowerEdge R630 server platform for the Management Server, the Object Storage Servers and the Metadata Servers. The solution supports Mellanox ConnectX-3 InfiniBand FDR (56 Gb/s) adapters, which take advantage of the PCIe 3.0 support in Dell's 13th generation servers.
Figure 5: Metadata Server Pair (MetaData Server #1 active, MetaData Server #2 passive; failover HA; 12Gb/s SAS links to a PowerVault MD3420)
With the Lustre DNE configuration option, shown in Figure 6, the two Dell PowerEdge R630 servers are configured as an active/active, highly available cluster. Each server is directly attached to two (2) Dell PowerVault MD3420 storage arrays.
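In Lustre's remote-directory form of DNE, an administrator places a new directory on a chosen MDT with `lfs mkdir -i <mdt_index> <path>`. The sketch below generates such commands so that top-level directories alternate across MDTs. The round-robin policy, the mount point /mnt/lustre and the directory names are hypothetical illustrations. They are not the placement used in this paper.

```python
import shlex


def remote_dir_commands(mount_point: str, names: list[str], mdt_count: int) -> list[str]:
    """Return `lfs mkdir -i <mdt_index> <path>` commands that spread new
    top-level directories round-robin across MDTs (a hypothetical policy)."""
    if mdt_count < 1:
        raise ValueError("mdt_count must be at least 1")
    base = mount_point.rstrip("/")
    commands = []
    for i, name in enumerate(names):
        mdt_index = i % mdt_count             # alternate MDT0, MDT1, ...
        path = shlex.quote(f"{base}/{name}")  # quote in case of unusual names
        commands.append(f"lfs mkdir -i {mdt_index} {path}")
    return commands


if __name__ == "__main__":
    # Hypothetical mount point and project directories, two MDTs.
    for cmd in remote_dir_commands("/mnt/lustre", ["projA", "projB", "projC"], 2):
        print(cmd)
```

Spreading busy directory trees across MDTs lets both metadata servers handle create, stat and remove traffic at the same time. Operations confined to a single directory still land on one MDT.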
3.3 Object Storage Servers
The Object Storage Servers, shown in Figure 7, are arranged in two-node high availability (HA) clusters providing active/active access to two Dell PowerVault MD3460 high-density storage arrays, each with an MD3060e expansion enclosure. Each PowerVault MD3460 array is fully populated with sixty 4TB 3.5” NL SAS drives.
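The drive counts can be checked with simple arithmetic. Two MD3460 arrays plus two MD3060e enclosures, each holding 60 drives of 4TB, give 240 drives and 960TB raw, which matches the configuration tested in Section 4. The sketch below also estimates RAID6 usable capacity. It assumes that every drive belongs to one of the 24 OSTs reported later in the paper, that all OSTs are equal in size, and that there are no hot spares. Under those assumptions each OST is an 8+2 RAID6 virtual disk. The exact layout shown in Figure 8 is not reproduced here.

```python
def capacity_tb(enclosures: int, drives_per_enclosure: int, tb_per_drive: int,
                osts: int, parity_per_ost: int = 2) -> dict:
    """Raw and RAID6 usable capacity in decimal TB, before file system overhead.
    Assumes equal-size OSTs and no hot spares (illustrative assumption)."""
    drives = enclosures * drives_per_enclosure
    if drives % osts:
        raise ValueError("sketch assumes every OST has the same number of drives")
    drives_per_ost = drives // osts
    data_drives = drives_per_ost - parity_per_ost  # RAID6 spends two drives' worth on parity
    return {
        "drives": drives,
        "raw_tb": drives * tb_per_drive,
        "drives_per_ost": drives_per_ost,
        "usable_tb": osts * data_drives * tb_per_drive,
    }


if __name__ == "__main__":
    # 2 x MD3460 + 2 x MD3060e, 60 x 4TB each; 24 OSTs.
    print(capacity_tb(enclosures=4, drives_per_enclosure=60, tb_per_drive=4, osts=24))
```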
Figure 8: RAID6 Layout on MD3460 or MD3060e arrays
3.4 Scalability
Configuring the Object Storage Servers as active/active clusters yields greater throughput and product reliability. This configuration provides high availability, which decreases maintenance requirements and consequently reduces potential downtime. The PowerEdge R630 server provides performance and density.
Figure 9: OSS Scalability
Scaling the Dell Storage for HPC with Intel EE for Lustre solution can be achieved by adding OSS pairs with their own storage backend, as demonstrated in Figure 9. This increases both the total network throughput and the storage capacity at once. This allows for an increase in the volume of storage available while maintaining a consistent maximum network throughput.
3.5 Networking
3.5.
(IML) GUI provides an option to configure either a single Lustre Network Identifier (NID) or multiple NIDs on MDS and OSS servers to participate in the Lustre Network. For instance, you could configure both your InfiniBand interface (e.g. ib0) and your 10GbE Ethernet interface (e.g. eth0) on your OSS servers to participate in the Lustre Network.
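IML sets this up through its GUI. For reference, when LNet is configured by hand, the same dual-network setup is expressed through the lnet kernel module's `networks` option, typically in a modprobe configuration file such as /etc/modprobe.d/lustre.conf. Each server then has one NID per network, of the form `<address>@<network>`. The sketch below builds both strings. The interface-to-network mapping and the IP addresses are hypothetical.

```python
def lnet_networks(mapping: dict[str, str]) -> str:
    """Join {lnet_network: interface} pairs into the lnet `networks` value,
    e.g. {"o2ib": "ib0", "tcp": "eth0"} -> "o2ib(ib0),tcp(eth0)"."""
    return ",".join(f"{net}({iface})" for net, iface in mapping.items())


def nid(address: str, net: str) -> str:
    """A Lustre NID has the form <address>@<network>."""
    return f"{address}@{net}"


if __name__ == "__main__":
    # Hypothetical OSS with one InfiniBand and one 10GbE interface.
    networks = lnet_networks({"o2ib": "ib0", "tcp": "eth0"})
    print(f'options lnet networks="{networks}"')
    print(nid("192.168.10.21", "o2ib"), nid("10.10.10.21", "tcp"))
```

Listing o2ib first makes InfiniBand the first network in the list. Clients on either fabric can then reach the OSS through the NID that matches their own network.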
4. Performance Evaluation and Analysis
The performance studies presented in this paper profile the capabilities of the Dell Storage for HPC with Intel EE for Lustre solution in its 240-drive configuration. The configuration has 240 4TB disk drives (960TB raw space). The goal is to quantify the capabilities of the solution, identify points of peak performance, and determine the most appropriate methods for scaling.
same file. The overhead encountered comes from threads dealing with Lustre’s file locking and serialized writes. See Appendix A for examples of the commands used to run these benchmarks. Each set of tests was executed on a range of clients to test the scalability of the solution. The number of simultaneous physical clients involved in each test varied from a single client to 64 clients.
Table 2: Dell Storage for HPC with Intel EE for Lustre Configuration

| Component | Details |
|---|---|
| Configuration Size | 960TB RAW |
| Lustre Server Version | 2.5.37.7 |
| Intel EE for Lustre Version | v2.3 |
| OSS Nodes | 2 x PowerEdge R630 Servers |
| OSS Memory | 256GB DDR4 2133MT/s |
| OSS Processors | 2 x Intel Xeon™ E5-2660V3 @ 2.60GHz 10 cores |
| OSS Server BIOS | 0.3.28 |
| OSS Storage Array | 2 x PowerVault MD3460, 2 x PowerVault MD3060e |
| Drives in OSS Storage Arrays | 240 3.5" 4 TB 7. |
As part of characterizing the performance of the solution, we explored the performance impact of using the most recent LSI SAS drivers (version P8) available at the time of testing, compared with the native SAS drivers on RHEL/CentOS 6.6. We also experimented with different driver and OS level tunings. In addition, we experimented with caching states on the storage arrays and noted the impact on overall performance of the solution.
level out to 256. This is partly because the number of OSTs in use grows as the number of threads increases (up to the 24 OSTs in our system). To maintain the higher throughput for an even greater number of files, increasing the number of OSTs is likely to help.
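The leveling-off can be illustrated with an idealized model. Assume each thread writes its own file, each file has a stripe count of 1, and new files are spread evenly across OSTs. The number of OSTs holding data then grows with the thread count until all 24 are busy, and stays flat after that. The stripe count and the even placement are simplifying assumptions. Lustre's real object allocator also weighs OST free space, so actual placement can differ.

```python
def osts_in_use(threads: int, total_osts: int = 24, stripe_count: int = 1) -> int:
    """Distinct OSTs holding data when each thread writes its own file and new
    files are spread evenly (round-robin) across OSTs. Idealized model."""
    return min(threads * stripe_count, total_osts)


if __name__ == "__main__":
    for t in (1, 12, 24, 48, 256):
        print(t, osts_in_use(t))
```

Past 24 threads, extra threads share OSTs that are already busy rather than adding new ones. This is consistent with the suggestion above that adding OSTs is the lever for sustaining throughput at higher file counts.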
each Virtual Disk (OST). Figure 13 below illustrates the significant advantage in sequential write performance when write caching was disabled.
Figure 14: N-to-N Random reads and writes IOPS (IOzone random I/O, Dell Storage for HPC with Intel EE for Lustre Solution; IOPS, 0 to 100,000, versus number of concurrent threads, 1 to 144; series: Read, Write)
Figure 15 illustrates the effect on random write performance when write caching is disabled, as covered in the earlier section.
4.3 Metadata Testing
Metadata testing measures the time to complete certain file or directory operations that return attributes. MDtest is an MPI-coordinated benchmark that performs Create, Stat, and Remove operations on files or directories. This study used MDtest version 1.9.3. The MPI stack used for this study was Intel MPI version 5.0 Update 1.
Figure 17 illustrates the directory metadata results using MDtest. From this graph, directory create is again the most expensive metadata operation in most cases, starting at over 2K OPS with 1 thread and increasing to almost 32K OPS as the thread count increases. Directory operations are also affected by the number of top-level directories used (-b), but to a lesser degree than file operations.
Figure 17: MDtest File Create Operations, Lustre DNE (Dell Storage for HPC with Intel EE Lustre Solution DNE; OPS, 0 to 60,000, versus threads, 1 to 1024; series: 1MDT, 2MDT)
5. Conclusions
There is a well-known requirement for scalable, high-performance clustered file system solutions.
The continued use of generally available, industry-standard benchmark tools like IOzone and MDtest provides an easy way to match current and expected growth with the performance outlined. The profiles reported by each of these tools provide sufficient information to align the configuration of the Dell Storage for HPC with Intel EE for Lustre Solution with the requirements of many applications or groups of applications.
Appendix A: Benchmark Command Reference
This section describes the commands used to benchmark the Dell Storage for HPC with Intel EE Lustre solution.
IOzone
IOzone Sequential Writes:
iozone -i 0 -c -e -w -r 1024K -I -s $Size -t $Thread -+n -+m /root/list.$Thread
IOzone Sequential Reads:
iozone -i 1 -c -e -w -r 1024K -I -s $Size -t $Thread -+n -+m /root/list.
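The `-+m /root/list.$Thread` argument points IOzone at a client list file. Each line holds three space-separated fields: the client name, the working directory on that client, and the path to the iozone executable on that client. The sketch below builds such a file for a given thread count and assigns threads round-robin to client hosts. The host names, the per-thread subdirectory layout under /mnt/lustre/iozone and the /usr/bin/iozone path are hypothetical. The directories must exist before the run.

```python
def iozone_client_list(clients: list[str], threads: int,
                       workdir: str, iozone_path: str) -> str:
    """Contents for an iozone -+m file: one 'client workdir iozone_path' line
    per thread, with threads assigned round-robin to client hosts."""
    if not clients or threads < 1:
        raise ValueError("need at least one client and one thread")
    lines = []
    for i in range(threads):
        host = clients[i % len(clients)]
        # Per-thread subdirectory (hypothetical layout); it must exist beforehand.
        lines.append(f"{host} {workdir.rstrip('/')}/t{i} {iozone_path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    threads = 3
    text = iozone_client_list(["node001", "node002"], threads,
                              "/mnt/lustre/iozone", "/usr/bin/iozone")
    print(text, end="")
    # Save as f"list.{threads}" to mirror the /root/list.$Thread naming.
```

Round-robin assignment keeps the per-client thread count balanced. This matters when the client count (1 to 64 in this study) is smaller than the thread count.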
MDtest - Metadata
Files Operations:
mpirun -np $Threads -rr --hostfile /share/mdt_clients/mdtlist.$Threads /share/mdtest/mdtest.intel -v -d /mnt/lustre/perf_test24-1M -i $Reps -b $Dirs -z 1 -L -I $Files -y -u -t -F
Directories Operations:
mpirun -np $Threads -rr --hostfile /share/mdt_clients/mdtlist.$Threads /share/mdtest/mdtest.
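To relate the file-operations command to an OPS figure, it helps to count how many items each iteration touches. Assume the usual mdtest semantics: -b is the branching factor, -z the tree depth, -I the items per directory, -L places items only in leaf directories, and -u gives each task its own tree. Under those assumptions, each task with -z 1 -L creates $Files items in each of $Dirs leaf directories. The rate is then items divided by elapsed seconds. The task, item, directory and timing values below are hypothetical.

```python
def mdtest_items(tasks: int, items_per_dir: int, branch: int, depth: int,
                 leaf_only: bool) -> int:
    """Items created per iteration, assuming each task builds its own tree (-u)
    with branching factor -b, depth -z and -I items per directory."""
    if leaf_only:                      # -L: items only in the leaf directories
        dirs = branch ** depth
    else:                              # items in every directory of the tree
        dirs = sum(branch ** level for level in range(depth + 1))
    return tasks * items_per_dir * dirs


def ops_per_second(items: int, seconds: float) -> float:
    """Metadata rate for one phase (e.g. create) that touched `items` entries."""
    return items / seconds


if __name__ == "__main__":
    # Hypothetical $Threads, $Files and $Dirs, matching the -z 1 -L options above.
    n = mdtest_items(tasks=64, items_per_dir=1000, branch=8, depth=1, leaf_only=True)
    print(n, ops_per_second(n, 16.0))
```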
Intel HPDD Wiki: https://wiki.hpdd.intel.com/display/PUB/HPDD+Wiki+Front+Page
Mellanox Technologies Home Page: http://www.mellanox.com
LSI 12Gb/s SAS HBA: http://www.lsi.com/downloads/Public/Host%20Bus%20Adapters/LSI_PB_SAS9300_HBA_Family.