Dell HPC NFS Storage Solution – High Availability (NSS7.0-HA)
Revisions

Date        Description
May 2016    Initial release

THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL ERRORS AND TECHNICAL INACCURACIES. THE CONTENT IS PROVIDED AS IS, WITHOUT EXPRESS OR IMPLIED WARRANTIES OF ANY KIND.

Copyright © 2016 Dell Inc. All rights reserved. Dell and the Dell logo are trademarks of Dell Inc. in the United States and/or other jurisdictions. All other marks and names mentioned herein may be trademarks of their respective companies.
Contents

Revisions
Executive summary
1 Introduction
2 Overview of NSS-HA solutions
3 Dell PowerVault MD3460 and MD3060e storage arrays
4 Evaluation
5 NSS7.0-HA I/O Performance
6 Summary
7 References
Appendix A: Benchmarks and test tools
Executive summary

This Dell technical white paper describes the Dell NFS Storage Solution – High Availability configuration (NSS7.0-HA). The paper compares the NSS-HA offerings released to date and provides performance results for a configuration with a storage system providing 480 TB of raw storage capacity.
1 Introduction

This white paper provides information on the latest Dell NFS Storage Solution – High Availability configuration built with Dell PowerEdge 13th generation servers. The solution uses Dell PowerEdge servers and PowerVault storage arrays along with the Red Hat High Availability software stack to provide an easy-to-manage, reliable, and cost-effective storage solution for HPC clusters. It leverages Dell PowerEdge 13th generation servers (R730s) and Red Hat Enterprise Linux 7.2.
2 Overview of NSS-HA solutions

Along with the current version, four versions of the NSS-HA solution have been released since 2011. This section provides a brief description of the NSS-HA solution and lists the currently available Dell NSS-HA offerings.

2.1 A brief introduction to NSS-HA solutions

The overall design of the NSS-HA solution is similar across versions.
Figure 1  The infrastructure of the NSS-HA solution

Note: The iDRAC8 Enterprise is not shown in the figure; it is installed on each NFS server in Dell NSS-HA solutions. The term Network Power Switches refers to the APC PDUs (Power Distribution Units) used in Dell NSS-HA solutions.
2.2 NSS-HA offerings from Dell

Table 1 lists the available Dell NSS-HA solutions with standard configurations.

Table 1  NSS-HA Solutions(1), (2), (3), (4), (5)

NSS6.0-HA (released November 2014): "PowerEdge 13th generation server based solution"
NSS7.0-HA (released May 2016): "PowerEdge 13th generation server based solution"
Storage capacity: 240 TB and 480 TB of raw storage capacity
Network connectivity: FDR InfiniBand or 10 GbE connectivity
3 Dell PowerVault MD3460 and MD3060e storage arrays

Both the PowerVault MD3460(7) and MD3060e(8) storage arrays are 4U, 60-drive dense enclosures. The major difference between them is that the MD3460 has dual active-active RAID controllers and is used as an RBOD (RAID bunch of disks), while the MD3060e is an EBOD (expansion bunch of disks), which is typically used to extend the capacity of the PowerVault MD3460. NSS7.0-HA shares the same storage configurations as NSS6.0-HA(5).
4 Evaluation

The architecture proposed in this technical white paper was evaluated in the Dell HPC lab. This section describes the test methodology and the test bed used for verification, and provides information about the functionality tests. Performance tests and results are described in Section 5 later in this white paper.

4.1 Method

The NFS Storage Solution described in this solution guide was tested for HA functionality and performance. A 480 TB NSS7.0-HA configuration was used for the testing.
Figure 2  NSS7.0-HA test bed, 480 TB configuration: two PowerEdge R730 NFS servers connected to the clients over the public network (Intel OPA), to each other over a private network, to switched PDUs for power, and to one MD3460 plus one MD3060e enclosure for storage.

NSS7.0-HA hardware configuration

Server configuration
  NFS server model: Two Dell PowerEdge R730s
  Processor: Dual Intel Xeon E5-2660 v4 @ 2.0 GHz, 14 cores per processor (the test bed used E5-2660 v3 @ 2.60 GHz)
  Memory: 8 × 16 GiB 2400 MT/s RDIMMs
  SAS HBAs (slot 1 and slot 6): Two 12 Gbps SAS HBAs
  Systems management: iDRAC8 Enterprise version
  Power supply: Dual Power Supply Units (PSUs)

Storage configuration
  Storage enclosures: One Dell PowerVault MD3460 array and one MD3060e array for the 480 TB solution
  RAID controllers: Duplex RAID controllers in the Dell MD3460
  Hard disk drives: 60 × 4 TB, 7.2K RPM NL SAS drives per enclosure
NSS7.0-HA client cluster configuration
  Clients: 32 PowerEdge R630 compute nodes
  Each compute node has:
    CPU: Dual Intel Xeon E5-2697 v4 @ 2.3 GHz, 18 cores per processor
    Memory: 16 × 8 GiB 2400 MT/s
    OS: Red Hat Enterprise Linux 7.1, kernel 3.10.0-229.el7.x86_64
    HCA card: Intel OPA HFI
  Switch: Intel Omni-Path Fabric Edge Switch

4.3 HA functionality

The HA functionality of the solution was tested by simulating several component failures.
Failure type: Power supply or power bus failure
Mechanism to handle failure: Dual PSUs in each server. Each PSU is connected to a separate power bus. The server continues functioning with a single PSU.

Failure type: Fence device failure
Mechanism to handle failure: iDRAC8 Enterprise used as the primary fence device. Switched PDUs used as secondary fence devices.

Failure type: SAS cable/port failure
Mechanism to handle failure: Two SAS cards in each NFS server. Each card has a SAS cable to each controller in the shared storage.
In each case, after the failed server is fenced, the passive server takes ownership of the cluster service. Clients cannot access the data until the failover process is completed.

Heartbeat link failure: Simulated by disconnecting the private network link on the active server. When the heartbeat link is removed from the active server, both servers detect the missing heartbeat and attempt to fence each other. The active server is unable to fence the passive server because the missing link prevents it from communicating over the private network.
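For reference, assuming the Red Hat High Availability stack on RHEL 7 (Pacemaker managed through the pcs command line, which is how RHEL 7 packages the HA add-on), the cluster and fencing state can be inspected as shown below. The node name used in the fence test is a hypothetical placeholder, not a name from the NSS7.0-HA configuration.

# Overall cluster state: which node is active and which resources are started
pcs status

# List the configured fence devices (for example, the iDRAC and PDU fence agents)
pcs stonith show

# Manually fence a node to verify the fencing path end to end
# ("nfs-server-2" is a placeholder node name)
pcs stonith fence nfs-server-2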
During the failover period, when the data share is temporarily unavailable, the client processes were in an uninterruptible sleep state. Depending on the characteristics of the client processes, the processes can be expected to either stop abruptly or sleep while the NFS share is temporarily unavailable during the failover process. Any data that has already been written to the file system will be available after the failover is completed.
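As an illustration of how this can be observed on a client, the command below lists processes that are in the uninterruptible sleep ("D") state, which is how blocked NFS I/O typically appears during the failover window; it is a generic diagnostic, not part of the documented test procedure.

# Show processes currently in uninterruptible sleep (state "D"), e.g. blocked on NFS I/O
ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /^D/'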
5 NSS7.0-HA I/O Performance

This section presents the results of the I/O performance tests for the current NSS-HA solution. All performance tests were conducted in a failure-free scenario to measure the performance of the solution. The tests focused on two types of I/O patterns: large sequential read and write operations, and small random read and write operations. The 480 TB configuration was benchmarked with Intel OPA cluster network connectivity. The 32-node compute cluster described in Section 4 was used to generate the I/O load for these tests.
Figure 3  OPA large sequential write and read performance (aggregate throughput in MB/s versus the number of concurrent clients, 1 to 32, for write and read)

5.2 OPA random write and read operations

Figure 4 shows the random write and read performance. The figure shows the aggregate I/O operations per second when a number of clients are simultaneously writing to or reading from the storage over the Intel OPA fabric.
Figure 4  OPA random write and read performance (aggregate IOPS versus the number of concurrent clients, 1 to 32, for write and read)

5.3 OPA metadata operations

Figure 5, Figure 6, and Figure 7 show the results of the file create, stat, and remove operations, respectively. Because the test bed has 32 compute nodes, in these graphs each client executed one thread for client counts up to 32.
Figure 6  OPA file stat performance (number of stat() operations per second versus the number of concurrent clients, 1 to 512)

Figure 7  OPA file remove performance (number of remove() operations per second versus the number of concurrent clients)
6 Summary

This Dell technical white paper provides information about the latest Dell HPC NSS-HA solution, including the solution configuration, the HA functionality evaluation, and the performance evaluation of the solution. With this version, the Dell NSS7.0-HA solution supports the Intel OPA network connection and delivers good sequential and random I/O performance. The Dell NSS-HA solution is available with deployment services and full hardware and software support from Dell.
7 References

1. Dell HPC NFS Storage Solution High Availability Configurations, Version 1.1.
   http://i.dell.com/sites/content/business/solutions/whitepapers/en/Documents/dell-hpc-nssha-sg.pdf
2. Dell HPC NFS Storage Solution – High availability with large capacities, Version 2.1.
   http://i.dell.com/sites/content/business/solutions/engineering-docs/en/Documents/hpc-nfs-storagesolution.pdf
3. Dell HPC NFS Storage Solution – High Availability Solution NSS5-HA configurations.
   http://en.community.dell.
Appendix A: Benchmarks and test tools

The IOzone benchmark tool was used to measure sequential read and write throughput (MBps) and random read and write I/O operations per second (IOPS). The checkstream utility was used to test for data correctness under failure and failover cases. The Linux dd utility was used for initial failover testing, to measure data throughput, and to measure the time to complete file copy operations.
A.1. IOzone

You can download IOzone from http://www.iozone.org/. Version 3.420 was used for these tests and was installed on both the NFS servers and all the compute nodes. The IOzone tests were run from 1 to 32 nodes in clustered mode. All tests were N-to-N; that is, N clients read or wrote N independent files. Between tests, the following procedure was followed to minimize cache effects (a shell sketch of this cycle follows the list):

1. Unmount the NFS share on the clients.
2. Stop the cluster service on the server. This unmounts the XFS file system on the server.
3. Start the cluster service on the server.
4. Mount the NFS share on the clients.
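As a rough illustration only, the cycle above might be scripted as follows, assuming a Pacemaker-managed cluster service (pcs) and passwordless ssh to the clients. The mount point, host names, and resource group name are placeholders, not the names used in the NSS7.0-HA deployment, and the clients are assumed to have the NFS share defined in /etc/fstab.

#!/bin/bash
# Hypothetical cache-flush cycle between IOzone runs (all names are placeholders).
CLIENTS="node001 node002 node003"         # client host names (placeholder list)
MOUNTPOINT=/mnt/xfs                       # NFS mount point on the clients
ACTIVE_SERVER=nfs-server-1                # NFS server currently owning the cluster service
RESOURCE_GROUP=nss-ha-service             # Pacemaker resource group (placeholder)

# 1. Unmount the NFS share on every client.
for c in $CLIENTS; do ssh "$c" "umount $MOUNTPOINT"; done

# 2. Stop the cluster service on the server; this unmounts the XFS file system.
ssh "$ACTIVE_SERVER" "pcs resource disable $RESOURCE_GROUP"

# 3. Start the cluster service again.
ssh "$ACTIVE_SERVER" "pcs resource enable $RESOURCE_GROUP"

# 4. Re-mount the NFS share on every client (assumes an /etc/fstab entry).
for c in $CLIENTS; do ssh "$c" "mount $MOUNTPOINT"; done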
IOzone Argument    Description
-I                 Use O_DIRECT, bypass the client cache
-O                 Give results in ops/sec

For the sequential tests, the file size was varied along with the number of clients such that the total amount of data written was 512 GiB (number of clients × file size per client = 512 GiB).

IOzone sequential writes:
# /usr/sbin/iozone -i 0 -c -e -w -r 1024k -s 16g -t 32 -+n -+m ./clientlist

IOzone sequential reads:
# /usr/sbin/iozone -i 1 -c -e -w -r 1024k -s 16g -t 32 -+n -+m ./clientlist
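For completeness, the -+m option points IOzone at a client list file; in IOzone's documented format each line names a client host, the working directory to test on that client, and the path to the IOzone executable on that client. The host names and paths below are illustrative only.

# ./clientlist — one line per client: hostname  test_directory  path_to_iozone
node001  /mnt/xfs  /usr/sbin/iozone
node002  /mnt/xfs  /usr/sbin/iozone
node003  /mnt/xfs  /usr/sbin/iozone

A random I/O run combines the -I and -O flags above with IOzone's random read/write test (-i 2), for example: iozone -i 2 -w -r 4k -I -O -t 32 -+n -+m ./clientlist, where the 4 KB request size is an assumption rather than a value taken from these tests.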
A.2. mdtest

You can download mdtest from http://sourceforge.net/projects/mdtest/. Version 1.9.3 was used in these tests. It was compiled and installed on an NFS share that was accessible by the compute nodes. mdtest is launched with mpirun; for these tests, OpenMPI version 1.10.0 was used. The following table describes the mdtest command-line arguments.
As with the IOzone random access patterns, the following procedure was followed to minimize cache effects during the metadata testing:

1. Unmount the NFS share on the clients.
2. Stop the cluster service on the server. This unmounts the XFS file system on the server.
3. Start the cluster service on the server.
4. Mount the NFS share on the clients.

Metadata file and directory creation test:
# mpirun -np 32 --nolocal --hostfile .
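As a rough sketch of how such metadata runs are typically launched with mdtest 1.9.3 under OpenMPI, the commands below show separate create, stat, and remove phases; the hostfile name, target directory, and per-task file count are placeholders rather than the parameters used in this study.

# -d: target directory on the NFS mount   -n: files per MPI task   -F: files only
# -C: create phase   -T: stat phase   -r: remove phase
mpirun -np 32 --nolocal --hostfile ./hosts ./mdtest -d /mnt/xfs/mdtest -n 1000 -F -C   # create
mpirun -np 32 --nolocal --hostfile ./hosts ./mdtest -d /mnt/xfs/mdtest -n 1000 -F -T   # stat
mpirun -np 32 --nolocal --hostfile ./hosts ./mdtest -d /mnt/xfs/mdtest -n 1000 -F -r   # remove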
A.3. Checkstream

The checkstream utility is available at http://sourceforge.net/projects/checkstream/. Version 1.0 was compiled and installed on the NFS servers and used for these tests. First, a large file was created using the genstream utility. This file was copied to and from the NFS share by each client by using dd, to simulate write and read operations. Failures were simulated during the file copy process, and the NFS service was failed over from one server to another.
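A rough sketch of that verification flow follows. The file paths are placeholders, and the exact genstream/checkstream argument forms are assumptions (both tools are assumed here to stream through stdout/stdin); consult each utility's help output for the precise syntax.

# 1. Create a known pattern file on the NFS share with genstream (I/O via stdout assumed).
genstream > /mnt/xfs/stream.in

# 2. On each client, copy the file from and back to the NFS share with dd,
#    while a server failure is triggered part-way through the copy.
dd if=/mnt/xfs/stream.in of=/tmp/stream.copy bs=1M
dd if=/tmp/stream.copy of=/mnt/xfs/stream.out bs=1M

# 3. After the failover completes, verify the copy still contains the expected pattern.
checkstream < /mnt/xfs/stream.out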
A.4. The dd Linux utility

dd is a Linux utility provided by the coreutils rpm distributed with RHEL 7.2. It was used to copy a file. The NFS file system was mounted at /mnt/xfs on the clients.

To write data to the storage, the following command line was used:
# dd if=/dev/zero of=/mnt/xfs/file bs=1M count=90000

To read data from the storage, the following command line was used:
# dd if=/mnt/xfs/file of=/dev/null bs=1M