
Dell HPC Lustre Storage solution with Mellanox Infiniband EDR
4. Performance Evaluation and Analysis
The performance studies presented in this paper profile the capabilities of the Dell HPC Lustre Storage
with Mellanox Infiniband EDR in a 240-drive configuration. The configuration has 240 4TB disk drives
(960TB of raw space). The goal is to quantify the capabilities of the solution, identify the points of peak
performance, and determine the most appropriate methods for scaling. The client test bed used to provide
the I/O workload for testing the solution consists of up to 32 nodes based on the Dell PowerEdge C6320
server platform. The specifications and configuration of the client systems are described in Table 1.
A number of performance studies were executed, stressing the configuration with different types of
workloads to determine the limits of performance and the sustainability of that performance. Mellanox
Infiniband EDR was used for these studies because its high speed and low latency allow the Dell HPC
Lustre Storage solution to deliver its maximum performance without introducing network bottlenecks.
We generally try to maintain a “standard and consistent” testing environment and methodology. In some
areas, however, we purposely optimize server or storage configurations, and we may also take measures to
limit caching effects, in order to better illustrate the impact on performance. This paper details the
specifics of such configurations.
Table 1: Test Client Cluster Details

Component        Description
Compute Nodes:   Dell PowerEdge C6320, 32 nodes
Node BIOS:       1.1
Processors:      Two Intel Xeon™ E5-2660 v3 @ 2.6GHz
Memory:          128GB DDR4 2133MHz
Interconnect:    Infiniband EDR
Lustre:          Lustre 2.7.15.3
OS:              Red Hat Enterprise Linux 7.2 (3.10.0-327.el7.x86_64)
Performance analysis was focused on three key performance markers:
- Throughput: data sequentially transferred, in GB/s.
- I/O Operations per second (IOPS).
- Metadata Operations per second (OP/s).
The goal is a broad but accurate review of the capabilities of the Dell HPC Lustre Storage with Mellanox
Infiniband EDR. We selected two benchmarks to accomplish this: IOzone and MDtest.
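MDtest drives the metadata OP/s measurements. As a minimal sketch, an MPI-launched MDtest run of the
general kind used in such studies might look like the following; the mount point, host file, item count,
iteration count and MPI launcher shown here are illustrative assumptions, not the exact parameters from
Appendix A.

    mpirun -np 64 --hostfile ./hosts mdtest -d /mnt/lustre/mdtest -n 5000 -i 3 -F -u

Here -d points MDtest at a directory on the Lustre mount, -n sets the number of files each MPI task
creates, stats and removes, -i sets the number of iterations, -F restricts the test to files rather than
directories, and -u gives each task its own working directory.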
We used an N-to-N load for testing, where every thread of the benchmark (N clients) writes to a different
file (N files) on the storage system. IOzone can be configured to use this N-to-N file-access method, and
we used it for all N-to-N access workloads in this study. See Appendix A for examples of the commands used
to run these benchmarks; a representative IOzone invocation is also sketched below.
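As a minimal sketch, an IOzone throughput-mode (N-to-N) run could take the following general form,
assuming a client-list file named clientlist whose entries point at the Lustre mount; the record size,
file size and thread count are illustrative and not the exact values from Appendix A.

    iozone -i 0 -i 1 -c -e -w -r 1024k -s 8g -t 32 -+n -+m ./clientlist

In this form, -i 0 and -i 1 select the sequential write and read tests, -c and -e include close() and
flush times in the timing, -w keeps the test files between phases, -r and -s set the record size and
per-thread file size, -t sets the number of threads, -+n disables retests, and -+m supplies the list of
client nodes for IOzone's distributed mode.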
Each set of tests was executed on a range of clients to test the scalability of the solution. The number
of simultaneous physical clients involved in each test varied from a single client to 32 clients. For
thread counts up to 32, one thread was run per physical compute node. Thread counts above 32 were achieved
by increasing the number of threads per client evenly across all clients. For instance, for 128 threads,
each of the 32 clients ran four threads. One way of expressing this thread-to-client mapping to IOzone is
sketched below.
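As a sketch of that mapping, IOzone's -+m client list contains one line per thread, each naming a client
host, a working directory on the Lustre mount and the path to the iozone binary, so a 128-thread run
across 32 clients simply lists each client four times. The hostnames and paths below are hypothetical.

    client01 /mnt/lustre/iozone-work /usr/bin/iozone
    client01 /mnt/lustre/iozone-work /usr/bin/iozone
    client01 /mnt/lustre/iozone-work /usr/bin/iozone
    client01 /mnt/lustre/iozone-work /usr/bin/iozone
    client02 /mnt/lustre/iozone-work /usr/bin/iozone
    ...
    client32 /mnt/lustre/iozone-work /usr/bin/iozone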