Dell HPC Lustre Storage A Dell Technical White Paper Quy Ta Dell HPC Engineering Innovations Lab September 2016 | Version 1.
Dell HPC Lustre Storage solution with Intel Omni-Path THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND. © 2016 Dell Inc. All rights reserved. Reproduction of this material in any manner whatsoever without the express written permission of Dell Inc. is strictly forbidden. For more information, contact Dell.
Contents
Figures
Tables
1. Introduction
2. The Lustre File System
Figures
Figure 1: Lustre based storage solution components
Figure 2: Dell HPC Lustre Storage Solution Components Overview
Figure 3: Dell PowerEdge R730
Figure 4: Metadata Server Pair
Tables
Table 1: Test Client Cluster Details
Table 2: Dell HPC Lustre Storage solution configuration
Table 3: Parameters used on MDtest
1. Introduction
In high performance computing (HPC), the efficient delivery of data to and from the compute nodes is critical and often complicated to execute. Researchers can generate and consume data in HPC systems at such speed that the storage components become a major bottleneck. Getting maximum performance for their applications requires a scalable storage solution.
pairs. Each additional OSS increases the existing networking throughput, while each additional OST increases the storage capacity. Figure 1 shows the relationship of the MDS, MDT, MGS, OSS and OST components of a typical Lustre configuration. Clients in the figure are the HPC cluster’s compute nodes.
Metadata Server (MDS) – Manages the MDT, providing Lustre clients access to files.
Object Storage Target (OST) – Stores the data stripes or extents of the files on a file system.
Object Storage Server (OSS) – Manages the OSTs, providing Lustre clients access to the data.
Figure 2: Dell HPC Lustre Storage Solution Components Overview
There are several architectural changes in this release compared to the previous release. The solution continues to use the Dell PowerEdge R630 server as the Intel Management Server, while the Object Storage Servers and Metadata Servers in the configuration are based on the Dell PowerEdge R730.
Figure 3: Dell PowerEdge R730
3.1 Management Server
The Intel Management Server is a single server connected to the Metadata Servers and Object Storage Servers via an internal 1GbE network. The management server is responsible for user interaction, systems health management, and basic monitoring, with the data collected and presented through an interactive web GUI console, the Intel Manager for Lustre.
Figure 4: Metadata Server Pair
SERVER      SAS PCI SLOT  SAS PORT  MD3420 ARRAY  MD3420 CONTROLLER  MD3420 CONTROLLER PORT
ieel3-mds1  Slot 1        Port 0    MD3420 #1     Controller 0       …
ieel3-mds1  Slot 5        Port 0    MD3420 #1     Controller 1       …
ieel3-mds2  Slot 1        Port 0    MD3420 #1     Controller 0       …
ieel3-mds2  Slot 5        Port 0    MD3420 #1     Controller 1       …
Figure 5: Metadata Server Pair with Lustre DNE option
[Diagram labels: ieel-mds1, ieel-mds2, MD3420 #2. Cabling table columns: SERVER, SAS PCI SLOT, SAS PORT, MD3420 ARRAY; rows for ieel3-mds1 and ieel3-mds2.]
Figure 6: Object Storage Server Pair
[Diagram labels: ieel-oss1, ieel-oss2, MD3460 #1, MD3460 #2, MD3460 #3, MD3460 #4. Cabling table columns: SERVER, SAS PCI SLOT, SAS PORT, MD3460 ARRAY; rows for ieel3-oss1 and ieel3-oss2.]
Targets per enclosure. By using RAID 6, the solution provides higher reliability at a marginal cost to write performance (due to the extra set of parity data required by each RAID 6 group). Each OST provides about 29TB of formatted object storage space when populated with 4TB HDDs. With the Dell HPC Lustre Storage solution, each MD3460 provides 6 OSTs. The OSTs are exposed to clients with LNet via Intel Omni-Path connections.
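As a rough check on these figures, the capacity arithmetic can be sketched in Python. The per-OST size, the drive size, and the number of OSTs per MD3460 come from the text. The count of four MD3460 enclosures is read off the labels in Figure 6, and the 8+2 RAID 6 layout is an assumption used only to show where a figure close to 29TB could come from; neither is stated in this passage.

```python
# Capacity sketch. Values marked "assumption" are not stated in the text.
OSTS_PER_MD3460 = 6          # from the text
FORMATTED_TB_PER_OST = 29    # "about 29TB", from the text
DRIVE_TB = 4                 # 4TB HDDs, from the text
MD3460_ENCLOSURES = 4        # assumption: Figure 6 labels MD3460 #1 to #4
RAID6_DATA_DRIVES = 8        # assumption: 8 data + 2 parity drives per OST


def total_osts(enclosures: int = MD3460_ENCLOSURES) -> int:
    """Number of OSTs provided by the given number of MD3460 enclosures."""
    return enclosures * OSTS_PER_MD3460


def formatted_capacity_tb(enclosures: int = MD3460_ENCLOSURES) -> int:
    """Approximate formatted object storage space across all OSTs, in TB."""
    return total_osts(enclosures) * FORMATTED_TB_PER_OST


def ost_data_tib(data_drives: int = RAID6_DATA_DRIVES,
                 drive_tb: int = DRIVE_TB) -> float:
    """Data capacity of one OST in binary terabytes (TiB).

    Drive vendors quote decimal terabytes (10**12 bytes), while file
    system tools usually report binary units (2**40 bytes).
    """
    return data_drives * drive_tb * 10**12 / 2**40


if __name__ == "__main__":
    print("OSTs:", total_osts())
    print("Formatted capacity (TB):", formatted_capacity_tb())
    print("Data capacity per OST (TiB): %.1f" % ost_data_tib())
```

Under these assumptions, four enclosures give 24 OSTs and roughly 696TB of formatted space, and eight 4TB data drives come to just over 29 binary terabytes per OST before file system overhead.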
Figure 8: OSS Scalability
[Diagram labels: Management Target (MGT), Management Network, Metadata Target (MDT), Metadata Servers, Object Storage Targets (OSTs), Object Storage Servers, Intel Manager for Lustre, High Performance Data Network (Intel Omni-Path, InfiniBand, 40 or 10GbE), Compute clients]
Scaling the Dell Storage for HPC with Intel EE for Lustre can be achieved by adding additional OSS pairs with storage bac
Path HFI interface using IPoIB (i.e. ifcfg-ib0) as well as your 10GbE Ethernet interface (i.e. eth0) on your OSS servers to both participate in the Lustre Network. The Intel Omni-Path network provides fast transfer speeds with low latency. LNet leverages the Performance Scaled Messaging (PSM) protocol for rapid data and metadata transfer between the clients and the MDTs and OSTs.
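One way to picture two interfaces participating in the Lustre Network is as a single `networks` option for the `lnet` kernel module. The Python sketch below only formats such an option line. The interface names `ib0` and `eth0` follow the examples in the text; the LNet network labels `o2ib0` and `tcp0` are assumptions chosen for illustration, and the solution's actual values should be taken from its configuration guide.

```python
def lnet_networks_option(nets):
    """Format an lnet module option line from (network, interface) pairs.

    Example pair: ("o2ib0", "ib0"). The labels are illustrative only.
    """
    if not nets:
        raise ValueError("at least one (network, interface) pair is required")
    spec = ",".join(f"{net}({iface})" for net, iface in nets)
    return f'options lnet networks="{spec}"'


if __name__ == "__main__":
    # Assumed labels: o2ib0 for the IPoIB interface, tcp0 for 10GbE Ethernet.
    print(lnet_networks_option([("o2ib0", "ib0"), ("tcp0", "eth0")]))
```

A line of this form would normally live in a modprobe configuration file on each server and client, so that both networks are brought up when the `lnet` module loads.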
4. Performance Evaluation and Analysis
The performance studies presented in this paper profile the capabilities of the Dell HPC Lustre Storage with Intel EE for Lustre software in a 240-drive configuration. The configuration has 240 4TB disk drives (960TB raw space). The goal is to quantify the capabilities of the solution, its points of peak performance, and the most appropriate methods for scaling.
number of threads above 64 were simulated by increasing the number of threads per client across all clients. For instance, for 128 threads, each of the 64 clients ran two threads. The test environment for the solution has a single MDS pair and a single OSS pair with a total of 960TB of raw disk space. The OSS pair contains two PowerEdge R730s, each with 256GB of memory, four 12Gbps SAS controllers and a single Intel Omni-Path HFI adapter.
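The way thread counts map onto the 64 clients can be sketched as follows. The text only gives the 128-thread case (two threads per client); spreading threads as evenly as possible when the count does not divide by 64 is an assumption of this sketch, not something the paper states.

```python
def threads_per_client(total_threads, clients=64):
    """Return the per-client thread counts for a test run.

    Threads are spread as evenly as possible; clients that would receive
    zero threads are left out of the result.
    """
    if total_threads < 1 or clients < 1:
        raise ValueError("total_threads and clients must be positive")
    base, extra = divmod(total_threads, clients)
    # The first `extra` clients each take one thread more than the rest.
    counts = [base + 1 if i < extra else base for i in range(clients)]
    return [c for c in counts if c > 0]


if __name__ == "__main__":
    for total in (24, 64, 128, 256):
        counts = threads_per_client(total)
        print(total, "threads:", len(counts), "clients, max",
              max(counts), "per client")
```

For 128 threads this reproduces the case described in the text, with each of the 64 clients running two threads.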
sync
echo 3 > /proc/sys/vm/drop_caches
In addition, to simulate a cold cache on the server, a “sync” was performed on all the active servers (OSS and MDS) before each test and the kernel was instructed to drop caches with the same commands used on the client. In measuring the performance of the Dell Storage for HPC with Intel EE for Lustre solution, all tests were performed with similar initial conditions.
We found that single-client performance was consistent at about 1GB/s for writes and 1.3GB/s for reads. Write and read performance rise sharply as we increase the number of process threads up to 24, then level out through 256 threads with occasional dips. This is partially a result of increasing the number of OSTs utilized as the number of threads increases (up to the 24 OSTs in our system).
Figure 12: N-to-N Random reads and writes
[Chart: Iozone Random - Dell HPC Lustre Storage with Intel Omni-Path. Y-axis: IOPS (0 to 50000). X-axis: Number of concurrent threads (4 to 256). Series: Read, Write.]
4.3 Metadata Testing
Metadata testing measures the time to complete certain file or directory operations that return attributes.
Also, during the preliminary metadata testing, we concluded that the number of files per directory significantly affects the results, even when the total number of files created is kept constant.
Figure 13: File Metadata Operations
[Chart: MDtest Files Metadata - Dell HPC Lustre Storage with Intel Omni-Path. Y-axis: OPS (0 to 600000). X-axis: Threads (1 to 240). Series: File Create, File Stat, File Remove.]
Figure 13 illustrates the file metadata results using MDtest.
The max_pages_per_rpc parameter is a tunable that sets the maximum number of pages that will undergo I/O in a single RPC to that OST.
[root@node057 ~]# lctl set_param osc.*.max_rpcs_in_flight=64
The max_rpcs_in_flight parameter is a tunable that sets the maximum number of concurrent RPCs in flight to the OST. In the majority of cases, this parameter will help with small file I/O patterns.
[root@node057 ~]# lctl set_param llite.*.
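To see how max_pages_per_rpc and max_rpcs_in_flight interact, the following Python sketch estimates the largest amount of data a client could have outstanding to a single OST. The value 64 for max_rpcs_in_flight comes from the command above. The 4096-byte page size and the value 256 for max_pages_per_rpc are assumptions for illustration only, since the setting used in this paper is not shown in this passage.

```python
PAGE_SIZE_BYTES = 4096  # assumption: 4 KiB memory pages on the client


def rpc_size_bytes(max_pages_per_rpc, page_size=PAGE_SIZE_BYTES):
    """Largest payload of a single bulk RPC, in bytes."""
    return max_pages_per_rpc * page_size


def max_in_flight_bytes(max_pages_per_rpc, max_rpcs_in_flight,
                        page_size=PAGE_SIZE_BYTES):
    """Upper bound on data outstanding from one client to one OST."""
    return rpc_size_bytes(max_pages_per_rpc, page_size) * max_rpcs_in_flight


if __name__ == "__main__":
    pages = 256   # hypothetical max_pages_per_rpc value
    rpcs = 64     # max_rpcs_in_flight, as set in the text
    mib = 1024 * 1024
    print("RPC size (MiB):", rpc_size_bytes(pages) / mib)
    print("In flight per OST (MiB):", max_in_flight_bytes(pages, rpcs) / mib)
```

With these example values, each RPC carries up to 1 MiB and a client can keep up to 64 MiB in flight per OST; this is an upper bound, not a measured figure.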
5.3 Misc. tunings
NOTE: Setup of the Intel Omni-Path HFI interconnect will utilize IPoIB and the ifcfg-ib0 configuration file. Consult the Dell HPC Lustre Solution Configuration Guide for details. Verify that the MTU setting in the IPoIB interface configuration file is set to 65520.
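The MTU check can be automated with a few lines of Python. This is a minimal sketch that parses the KEY=VALUE text of an interface configuration file; the sample file contents below are hypothetical and stand in for a real ifcfg-ib0, whose exact contents are covered in the configuration guide.

```python
EXPECTED_MTU = 65520  # value recommended in the text


def ifcfg_mtu(text):
    """Return the MTU set in ifcfg-style KEY=VALUE text, or None if absent."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "MTU":
            return int(value.strip().strip("\"'"))
    return None


def mtu_is_expected(text, expected=EXPECTED_MTU):
    """True when the configuration text sets MTU to the expected value."""
    return ifcfg_mtu(text) == expected


# Hypothetical file contents, for illustration only.
SAMPLE_IFCFG = """\
DEVICE=ib0
ONBOOT=yes
MTU=65520
"""

if __name__ == "__main__":
    print("MTU found:", ifcfg_mtu(SAMPLE_IFCFG))
    print("MTU as expected:", mtu_is_expected(SAMPLE_IFCFG))
```

On a real system the text would be read from the interface configuration file rather than from the sample string.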
IOzone Sequential Reads
iozone -i 1 -c -e -w -r 1024K -I -s $Size -t $Thread -+n -+m /root/list.$Thread
IOzone IOPS Random Reads / Writes
iozone -i 2 -w -c -O -I -r 4K -s $Size -t $Thread -+n -+m /root/list.
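The file passed to -+m lists the clients that take part in a run, one line per thread, each line giving the client name, the working directory on that client, and the path to the iozone executable. The Python sketch below generates such lines by assigning threads to clients in round-robin order. The host names, working directory, and executable path are hypothetical, and the round-robin assignment is an assumption rather than the method used in this paper.

```python
def client_list_lines(hosts, threads, workdir, iozone_path):
    """Build the lines of an IOzone -+m client file, one line per thread."""
    if not hosts:
        raise ValueError("at least one host is required")
    if threads < 1:
        raise ValueError("threads must be positive")
    lines = []
    for i in range(threads):
        host = hosts[i % len(hosts)]  # round-robin over the clients
        lines.append(f"{host} {workdir} {iozone_path}")
    return lines


if __name__ == "__main__":
    # Hypothetical names and paths, for illustration only.
    hosts = [f"node{n:03d}" for n in range(1, 5)]
    for line in client_list_lines(hosts, 8, "/mnt/lustre", "/usr/bin/iozone"):
        print(line)
```

Writing the returned lines to a file named after the thread count would match the list.$Thread naming used in the commands above.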
Dell PowerVault MD3460
http://www.dell.com/support/home/us/en/04/product-support/product/powervaultmd3460/research
Lustre Home Pages
http://www.intel.com/content/www/us/en/software/intel-solutions-for-lustresoftware.html?cid=sem43700011015176072&intel_term=intel+lustre&gclid=COOMvL3w384CFQgOaQodbqsCeQ&gclsrc=aw.ds
http://wiki.lustre.org/index.php/Main_Page
Dell HPC Solutions Home Page
http://www.dell.com/hpc
Dell HPC Wiki
http://www.HPCatDell.