Dell HPC NFS Storage Solution High Availability (NSS-HA) Configurations with Dell PowerEdge 12th Generation Servers Xin Chen, Garima Kochhar and Mario Gallegos Dell HPC Engineering July 2012| Version 1.
This document is for informational purposes only and may contain typographical errors and technical inaccuracies. The content is provided as is, without express or implied warranties of any kind. © 2012 Dell Inc. All rights reserved. Dell and its affiliates cannot be responsible for errors or omissions in typography or photography.
Contents
Executive summary
1. Introduction
2. NSS-HA solution review
2.1.
Table 5. Server components in NSS-HA
Table 6. NSS4-HA hardware configuration details
Table 7. NSS4-HA software configuration details
Table 8.
Executive summary
This solution guide describes the Dell NFS Storage Solution High Availability configuration (NSS-HA) with Dell PowerEdge 12th generation servers.
1. Introduction
This solution guide provides information on the latest Dell NFS Storage Solution High Availability configuration (NSS-HA) with Dell PowerEdge 12th generation servers. The solution uses the NFS file system along with the Red Hat Scalable File System (XFS) and Dell PowerVault storage to provide an easy-to-manage, reliable, and cost-effective storage solution for HPC clusters.
Fence devices – Fence devices are required for fencing (rebooting) the failed or misbehaving cluster node in the HA cluster. In the NSS-HA solution, two types of fence devices are configured: Switched Power Distribution Units (PDU) and the Dell server management controller, the iDRAC. Figure 1.
2) Fencing – In the HA cluster, once a node notices that the other node has failed, it fences (reboots) the failed node using a fence device. This ensures that only one server accesses the data at any time, which protects data integrity. In NSS-HA, a node can fence the other using the Dell iDRAC or an APC PDU.
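The fencing step can be illustrated with a minimal Python sketch. It is not the Red Hat cluster software's implementation: the `FenceDevice` class, the `fence_node` and `take_over_service` functions, and the order in which devices are tried are all assumptions made for illustration. The sketch shows only the principle that the surviving node takes over the service after a fence device has confirmed that the failed node was power-cycled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class FenceDevice:
    """Hypothetical stand-in for a fence device such as an iDRAC or a PDU."""
    name: str
    reboot: Callable[[str], bool]  # returns True if the node was power-cycled


def fence_node(failed_node: str, devices: List[FenceDevice]) -> Optional[str]:
    """Try each fence device in turn; return the name of the one that worked."""
    for device in devices:
        try:
            if device.reboot(failed_node):
                return device.name
        except OSError:
            # An unreachable device counts as a failed attempt; try the next one.
            continue
    return None


def take_over_service(failed_node: str, devices: List[FenceDevice]) -> bool:
    """Allow takeover only after the failed node is confirmed fenced."""
    # Without a successful fence, two servers could access the same storage.
    return fence_node(failed_node, devices) is not None


def unreachable(node: str) -> bool:
    """Simulate a fence device that cannot be contacted."""
    raise OSError("fence device unreachable")


# Simulated devices (assumed order): the iDRAC is unreachable, the PDU succeeds.
idrac = FenceDevice("idrac", unreachable)
pdu = FenceDevice("pdu", lambda node: True)
print(fence_node("server-a", [idrac, pdu]))
```

Configuring two independent fence devices means that a failure of one device does not prevent the surviving node from fencing its peer.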
56Gb fourteen data rate (FDR) as well as increased memory capacity and bandwidth. The storage subsystem hardware remains unchanged. Table 1 lists all the Dell NSS-HA solutions with standard configurations. In addition to the standard configurations in Table 1, a special NSS-HA configuration, the XL configuration, is also available.
2.3. General comparisons among NSS-HA solutions
Table 2 compares the three releases of NSS-HA solutions in four respects: storage capacity, performance, configuration, and HA functionality.
3. Dell PowerEdge 12th generation servers in NSS4-HA
Compared with the NSS2-HA solution releases, the biggest change in the NSS4-HA release is that the new Dell PowerEdge R620 12th generation server is deployed as the NFS server, whereas the previous releases used the Dell PowerEdge R710 11th generation server.
The Dell PowerEdge R620 can support an onboard 10 Gigabit Ethernet network daughter card for clusters that require 10GbE connectivity, which frees up a PCI-E slot in the NFS server. Table 3 gives a detailed comparison between the Dell PowerEdge R620 and the Dell PowerEdge R710 used in NSS-HA solutions. Table 3. Dell PowerEdge R620 vs.
Figure 3. PCI-E slot usage in the Dell PowerEdge R620: option 1 vs. option 2
between the current and previous NSS-HA releases to identify their major similarities and differences using the following two tables: Table 4 provides information about the storage subsystem. Table 5 lists the major similarities and differences in the NFS servers. Table 4.
Table 5.
4. Evaluation
The architecture proposed in this white paper was evaluated in the Dell HPC lab. This section describes the test methodology and the test bed used for verification. It also contains details on the functionality tests.
4.1. Method
The NFS Storage Solution described in this solution guide was tested for HA functionality and performance.
Figure 4. NSS4-HA test bed
[Diagram of the NSS4-HA 288TB configuration: clients, public network (IB or 10GbE), two R620 servers, private network, 1 MD3200 + 7 MD1200s, two PDUs. Legend: public network, private network, power, storage connections.]
Table 6. NSS4-HA hardware configuration details
Server configuration
NFS server model: Two Dell PowerEdge R620
Processor: Dual Intel Xeon E5-2680 @ 2.
Storage configuration
Storage enclosure: One Dell PowerVault MD3200 array. Seven Dell MD1200 expansion arrays for the 288TB solution.
Firmware and drivers
10 Gigabit Ethernet driver: ixgbe 3.6.7-NAPI
PERC H710P firmware: 21.0.2-0001
PERC H710P driver: megaraid_sas 00.00.05.34-rc1
6Gbps SAS firmware: 07.03.05.00
6Gbps SAS driver: mpt2sas 08.101.00.00
Table 9. NSS4-HA client configuration details
Client / HPC compute cluster
4.3.
Clients: 64 PowerEdge R410 compute nodes
Red Hat Enterprise Linux 6.
In the real world, there are many different types of failures and faults that can impact the functionality of NSS-HA. Table 10 lists the potential failures that are tolerated in NSS-HA solutions. Note: The analysis below assumes that the HA cluster service is running on the active server; the passive server is the other component of the cluster. Table 10.
Private switch failure
Fence device failure
One SAS link failure
Multiple SAS link failures
The NSS-HA behaviors are outlined below in response to these failures.
Server response to a failure
The server response to a failure event within the HA cluster was recorded. Time to recover from a failure was used as a performance metric.
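As an illustration of this metric, the following Python sketch computes time to recover as the interval between the moment a failure is injected and the moment the service is available again. The function name, the timestamp format, and the sample timestamps are hypothetical and are not taken from the tests.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed log timestamp format


def recovery_seconds(failure_injected: str, service_restored: str) -> float:
    """Return the elapsed seconds between failure injection and service recovery."""
    start = datetime.strptime(failure_injected, TIMESTAMP_FORMAT)
    end = datetime.strptime(service_restored, TIMESTAMP_FORMAT)
    if end < start:
        raise ValueError("service_restored is earlier than failure_injected")
    return (end - start).total_seconds()


# Hypothetical timestamps, for illustration only.
print(recovery_seconds("2012-07-01 10:00:00", "2012-07-01 10:01:30"))
```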
version (4). In a healthy cluster, any failure event should be noted by the Red Hat cluster management daemon and acted upon within minutes. Note that this is the failover time on the NFS servers; the impact to the clients could be longer. Multiple SAS link failures - simulated by disconnecting all SAS links between one Dell PowerEdge R620 server and the Dell PowerVault MD3200 storage.
mdtest benchmark and include file create, stat and remove operations. Refer to Appendix A for the complete command lines used in the tests.
5.1. IPoIB sequential writes and reads
This section compares the sequential write and read performance of the current NSS4-HA release, with a Dell PowerEdge R620 server, against the previous NSS3-HA release (3), with a Dell PowerEdge R710 server.
Figure 6. IPoIB large sequential read performance: NSS4-HA vs. NSS3-HA
[Chart: InfiniBand large sequential read performance. Throughput in MB/s against number of concurrent clients (1 to 64), for NSS4-HA and NSS3-HA.]
5.2.
Figure 7. IPoIB random write performance: NSS4-HA vs. NSS3-HA
[Chart: Random writes. IOPS against number of concurrent clients (1 to 64), for NSS4-HA and NSS3-HA.]
Figure 8. IPoIB random read performance: NSS4-HA vs. NSS3-HA
[Chart: Random reads. IOPS, for NSS4-HA and NSS3-HA.]
Dell PowerEdge R620 is still better than with the Dell PowerEdge R710. On average, the improvement is more than 20 percent. Figure 9, Figure 10, and Figure 11 show the results of file create, stat and remove operations, respectively. As the HPC compute cluster has 64 compute nodes, in the graphs below each client executed a maximum of one thread for client counts up to 64.
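One way to arrive at an average improvement figure of the kind quoted above is to compute the percentage gain at each client count and take the mean. The Python sketch below does this. The guide does not state how its average was calculated, so the method is an assumption, and the sample values are invented for illustration rather than measured results.

```python
from statistics import mean
from typing import Sequence


def average_improvement_percent(new: Sequence[float], old: Sequence[float]) -> float:
    """Mean of the per-point percentage gains of `new` over `old`."""
    if len(new) != len(old) or len(new) == 0:
        raise ValueError("both series must be non-empty and of equal length")
    gains = [(n - o) / o * 100.0 for n, o in zip(new, old)]
    return mean(gains)


# Invented sample values (operations per second), one per client count.
nss4_ha = [1200.0, 2500.0, 3000.0]
nss3_ha = [1000.0, 2000.0, 2500.0]
print(round(average_improvement_percent(nss4_ha, nss3_ha), 1))
```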
Figure 11. IPoIB file remove performance: NSS4-HA vs. NSS3-HA
[Chart: File remove. Number of remove() operations per second against number of concurrent clients (1 to 512), for NSS4-HA and NSS3-HA.]
6. Conclusion
This solution guide provides details on the latest NSS-HA Solution for HPC from Dell.
Appendix A: Benchmarks and test tools
The iozone benchmark was used to measure sequential read and write throughput (MB/sec) as well as random read and write I/O operations per second (IOPS). The mdtest benchmark was used to test metadata operation performance. The checkstream utility was used to test for data correctness under failure and failover cases.
IOzone argument: Description
-t: Number of threads
-+m: Location of clients to run IOzone on when in clustered mode
-w: Does not unlink (delete) temporary file
-I: Use O_DIRECT, bypass client cache
-O: Give results in ops/sec.
A.2. mdtest
mdtest can be downloaded from http://sourceforge.net/projects/mdtest/. Version 1.8.3 was used in these tests. It was compiled and installed on an NFS share that was accessible by compute nodes. mdtest is launched with mpirun. For these tests, MPICH2 version 1.3.2 was used. The following table describes the mdtest command-line arguments.
Start the cluster service on the server.
Mount NFS share on clients.
Metadata file and directory creation test:
# mpirun -np 32 --nolocal --hostfile ./hosts /nfs/share/mdtest -d /nfs/share/filedir -i 6 -b 320 -z 1 -L -I 3000 -y -u -t -C
Metadata file and directory stat test:
# mpirun -np 32 --nolocal --hostfile .
checkstream[compute-00-10]: [valid data] 1488 valid extents in 273.860652 seconds (5.43342 err/sec)
checkstream[compute-00-10]: [valid data] 93898678272/96636764160 bytes (87 GiB/90 GiB)
checkstream[compute-00-10]: [zero data] 1487 errors in 273. ... err/sec)
checkstream[compute-00-10]: ... seconds (344598 KiB/sec)