Dell Storage for HPC with Intel Enterprise Edition for Lustre
A Dell Technical White Paper
Quy Ta
Dell HPC Engineering
November 2014 | Version 1.
THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.
© 2014 Dell Inc. All rights reserved. Reproduction of this material in any manner whatsoever without the express written permission of Dell Inc. is strictly forbidden. For more information, contact Dell.
1. Introduction
In high-performance computing (HPC), the efficient delivery of data to and from the compute nodes is critical and often complicated. Researchers can generate and consume data in HPC systems at such speed that the storage components become a bottleneck. Managing and monitoring such complex storage systems adds to the burden on storage administrators and researchers.
manages one or more OSTs. Typically, there are several active OSSs at any time. Lustre is able to deliver increased throughput by increasing the number of active OSSs (and associated OSTs). Each additional OSS increases the existing networking throughput, while each additional OST increases the storage capacity. Figure 1 shows the relationship of the MDS, MDT, MGS, OSS and OST components of a typical Lustre configuration.
Lustre Clients – Access the MDS to determine where files are located, then access the OSSs to read and write data.
Typically, Lustre deployments and configurations are considered very complex and time-consuming tasks. Lustre installation and administration are normally done via a command line interface (CLI), requiring extensive knowledge of the file system operation, along with auxiliary tools like LNet and the locking mechanisms.
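As an illustration of how a client's view of a file maps onto the MDT and OSTs, the standard lfs utility can be used to control and inspect file layouts. The commands below are a generic sketch; the mount point /mnt/lustre and the stripe values are example values, not settings prescribed by this solution:

# Stripe new files in this directory across 4 OSTs, 1MB per stripe (example values)
lfs setstripe -c 4 -S 1M /mnt/lustre/mydir
# Show which OSTs hold the objects that make up an existing file
lfs getstripe /mnt/lustre/mydir/datafile
# List the OSTs that make up the file system
lfs osts /mnt/lustre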
The Dell Storage for HPC with Intel EE for Lustre solution utilizes the Dell PowerEdge R630 server platform as the Management Server, Object Storage Servers and Metadata Servers in the configuration. The solution supports Mellanox ConnectX-3 InfiniBand FDR (56 Gb/s) adapters, which take advantage of the PCIe 3.0 support in Dell's 13th generation servers. Alternatively, there is also support for 10 Gb/s Ethernet to connect to clients.
Figure 4: Metadata Server Pair (active/passive metadata server pair attached via 12Gb/s SAS to a PowerVault MD3420)
3.3 Object Storage Servers
The Object Storage Servers, shown in Figure 5, are arranged in two-node high availability (HA) clusters providing active/active access to two Dell PowerVault MD3460 high-density storage arrays, each with MD3060e expansions.
Figure 5: Object Storage Server Pair (active/active OSS pair attached via 12Gb/s SAS to two PowerVault MD3460 arrays, each with a 6Gb/s SAS MD3060e expansion)
The Object Storage Servers are the building blocks of the solution.
Figure 6: RAID6 Layout on MD3460 or MD3060e arrays
3.4 Scalability
Providing the Object Storage Servers in active/active cluster configurations yields greater throughput and product reliability. This configuration provides high availability, decreasing maintenance requirements and consequently reducing potential downtime. The PowerEdge R630 server provides performance and density.
Figure 7: OSS Scalability
Scaling the Dell Storage for HPC with Intel EE for Lustre solution can be achieved by adding additional OSS pairs with their storage backend, as demonstrated in Figure 7. This increases both the total network throughput and the storage capacity at once, allowing for an increase in the volume of storage available while maintaining a consistent maximum network throughput.
3.5 Networking
(IML) GUI provides an option to configure multiple Lustre Network Identifiers (NIDs) on the MDS and OSS servers to participate in the Lustre Network. For instance, you could configure your InfiniBand interface (e.g., ib0) as well as your 10GbE Ethernet interface (e.g., eth0) on your OSS servers to both participate in the Lustre Network. On the InfiniBand network, fast transfer speeds with low latency can be achieved.
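Outside of IML, the same mapping is conventionally expressed through LNet's kernel module options and verified with lctl. The lines below are a generic sketch of that convention; the interface names, network labels and the /etc/modprobe.d/lustre.conf path are illustrative, not the method prescribed by this solution:

# /etc/modprobe.d/lustre.conf (example): expose both the IB and 10GbE interfaces to LNet
options lnet networks="o2ib0(ib0),tcp0(eth0)"
# Verify the NIDs advertised by this node
lctl list_nids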
Figure 8: Intel Manager for Lustre (IML) interface
4. Performance Evaluation and Analysis
The performance studies presented in this paper profile the capabilities of the Dell Storage for HPC with Intel EE for Lustre 240-drive configuration. The configuration has 240 x 4TB disk drives (960TB raw space). The goal is to quantify the capabilities of the solution, the points of peak performance and the most appropriate methods for scaling.
Lustre: Lustre 2.5.23 + Mellanox OFED client
OS: Red Hat Enterprise Linux 6.5 (2.6.32-431.el6.x86_64)
IB Software: Mellanox OFED 2.2-1
Performance analysis was focused on three key performance markers:
- Throughput, data sequentially transferred in GB/s
- I/O Operations per second (IOPS)
- Metadata Operations per second (OP/s)
Table 2: Dell Storage for HPC with Intel EE for Lustre Configuration
Configuration Size: 960TB RAW
Lustre Server Version: 2.5.23
Intel EE for Lustre Version: v2.1
OSS Nodes: 2 x PowerEdge R630 Servers
OSS Memory: 256GiB DDR4 2133MT/s
OSS Processors: 2 x Intel Xeon E5-2660 v3 @ 2.60GHz, 10 cores
OSS Server BIOS: 0.3.28
OSS Storage Array: 2 x PowerVault MD3460, 2 x PowerVault MD3060e
Drives in OSS Storage Arrays: 240 x 3.5" 4TB 7.
4.1 N-to-N Sequential Reads / Writes
The sequential testing was done with the IOzone testing tool version 3.429. The throughput results presented in Figure 9 are converted to MB/s. The file size selected for this testing was such that the aggregate sample size from all threads was consistently 2TB. That is, sequential reads and writes have an aggregate sample size of 2TB, divided equally among the number of threads within that test.
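As a concrete illustration of this sizing (the thread counts here are examples, not the full set tested): with 64 threads, each thread operates on a 32GB file (2TB / 64), while with 128 threads each thread operates on a 16GB file. The resulting per-thread value is what is passed to IOzone's -s parameter in the commands listed in Appendix A.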
request size is used because it aligns with Lustre's 4KB file system block size and is representative of small block accesses for a random workload. Performance is measured in I/O Operations per second (IOPS). Figure 10 shows that the random writes peak at a little over 10K IOPS with 240 threads, while random reads peak at 65K IOPS with 192 threads.
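The random workload results were gathered with IOzone's random read/write test mode. The invocation below is a sketch of such a run rather than the exact command used in this study (Appendix A lists only the sequential commands); $Size and $Thread stand for the per-thread file size and the thread count:

# Random read/write (-i 2), 4KB records, direct I/O, results reported in operations per second (-O)
iozone -i 2 -w -r 4k -I -O -s $Size -t $Thread -+n -+m /root/list.$Thread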
In order to reduce the cache effects from server and client memory, it was decided to use a file size that was twice the combined memory size of the OSSs and the clients, according to the following formula and rounding to whole values where necessary:
File Size = 2 x (2 OSSs x 256 GiB memory per OSS + Number of physical clients x 24 GiB memory per client)
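To make the formula concrete with a hypothetical client count (the actual count used in the study is given in Table 1): with 16 physical clients the target file size would be 2 x (2 x 256 GiB + 16 x 24 GiB) = 2 x (512 GiB + 384 GiB) = 1,792 GiB, which is then divided among the threads writing to the shared file.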
Figure 11: N-to-1 IOR Read / Write
4.4 Metadata Testing
Metadata testing measures the time to complete certain file or directory operations that return attributes. MDtest is an MPI-coordinated benchmark that performs Create, Stat, and Remove operations on files or directories. This study used MDtest version 1.9.3. The MPI stack used for this study was Intel MPI version 5.0 Update 1.
For example, when testing with 64 threads, creating 3125 files per directory in 5 directories per thread or creating 625 files per directory in 25 directories per thread both result in the creation of 1 million files, but the measured performance in IOPS is not the same. This is due to the overhead of seeks performed on the OSTs when changing directories.
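Both layouts produce the same total: 64 threads x 5 directories x 3,125 files = 1,000,000 and 64 threads x 25 directories x 625 files = 1,000,000. In the MDtest invocation shown in Appendix A, these values correspond to the -b (directories per thread) and -I (files per directory) parameters.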
Figure 12: File Metadata Operations
Figure 12 illustrates the file metadata results using MDtest. From this graph, file create metadata operations start with less than 504 OPS at 1 thread and scale to almost 15K OPS with 240 concurrent threads. This may be due to the Lustre locks needed on the MDT, but also those on the OSTs, since using a stripe count of 24 resulted in a significant decrease in performance.
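The stripe count for a test directory is controlled with the lfs utility. The commands below are a generic sketch; the paths are examples, and the document does not state the exact commands used in the study:

# Single-stripe files (one OST object per file) for a metadata test directory
lfs setstripe -c 1 /mnt/lustre/mdtest_dir
# Striping each file across 24 OSTs adds per-file object and lock overhead
lfs setstripe -c 24 /mnt/lustre/mdtest_dir_wide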
Figure 13: Directory Metadata Operations
5. Conclusions
There is a well-known requirement for scalable, high-performance clustered file system solutions. The Dell Storage for HPC with Intel EE for Lustre addresses this need with a well-designed solution that is easy to manage and fully supported.
Appendix A: Benchmark Command Reference
This section describes the commands used to benchmark the Dell Storage for HPC with Intel EE for Lustre solution.
IOzone
IOzone Sequential Writes:
iozone -i 0 -c -e -w -r 1024K -I -s $Size -t $Thread -+n -+m /root/list.$Thread
IOzone Sequential Reads:
iozone -i 1 -c -e -w -r 1024K -I -s $Size -t $Thread -+n -+m /root/list.$Thread
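In these commands, -+n disables retests and -+m points IOzone at a client list file for distributed (cluster) mode. As a sketch of that file's conventional format (the hostnames and paths below are placeholders, not the systems used in this study), each line names a client, its working directory on the Lustre mount, and the path to the iozone binary:

client01 /mnt/lustre/iozone_test /usr/sbin/iozone
client02 /mnt/lustre/iozone_test /usr/sbin/iozone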
MDtest - Metadata
Files Operations:
mpirun -np $Threads -rr --hostfile /share/mdt_clients/mdtlist.$Threads /share/mdtest/mdtest.intel -v -d /mnt/lustre/perf_test24-1M -i $Reps -b $Dirs -z 1 -L -I $Files -y -u -t -F
Directories Operations:
mpirun -np $Threads -rr --hostfile /share/mdt_clients/mdtlist.$Threads /share/mdtest/mdtest.
Intel HPDD Wiki: https://wiki.hpdd.intel.com/display/PUB/HPDD+Wiki+Front+Page
Mellanox Technologies Home Page: http://www.mellanox.com
LSI 12Gb/s SAS HBA: http://www.lsi.com/downloads/Public/Host%20Bus%20Adapters/LSI_PB_SAS9300_HBA_Family.