Reference Guide

8 Dell EMC Ready Solutions for AI Deep Learning with NVIDIA | v1.0
Table 1: PowerEdge R740xd configurations
Component                Details
Server Model             PowerEdge R740xd
Processor                2 x Intel Xeon Gold 6148 CPU @ 2.40GHz
Memory                   24 x 16GB DDR4 2666MT/s DIMMs - 384GB
Disks                    12 x 12TB NL SAS RAID 50 (Recommended 10+ drives)
I/O & Ports              Network daughter card with 2 x 10GE + 2 x 1GE
Network Adapter          1 x InfiniBand EDR adapter
Out of Band Management   iDRAC9 Enterprise with Lifecycle Controller
Power Supplies           Titanium 1100W, Platinum
Storage Controllers      PowerEdge RAID Controller (PERC) H730p
2.1.1 Shared Storage via NFS over InfiniBand
The default shared storage system for the cluster is provided over NFS. It is built from 12 x 12TB NL SAS disks
that are local to the head node and configured as RAID 50 with the equivalent of two parity disks. This provides a
usable capacity of 120TB (109TiB). RAID 50 was chosen because it offers balanced performance and a shorter
rebuild time than RAID 6 or RAID 60 (each RAID 50 span maintains single parity, whereas RAID 6 and RAID 60
maintain dual parity). This 120TB volume is formatted as an XFS file system and exported to the compute nodes
via NFS over IPoIB.
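The capacity figures above can be checked with a short calculation. The sketch below is illustrative only: it assumes the 12 disks are split into two equal RAID 5 spans (inferred from the two parity disks mentioned above, not stated explicitly in this guide), and the function names are hypothetical.

```python
def raid50_usable_tb(num_disks, disk_tb, spans=2):
    """Usable capacity of a RAID 50 volume in decimal terabytes (TB).

    Assumption: the disks are divided evenly into `spans` RAID 5 groups,
    and each group gives up one disk's worth of capacity to parity.
    """
    if num_disks % spans != 0:
        raise ValueError("disks must divide evenly across spans")
    if num_disks // spans < 3:
        raise ValueError("each RAID 5 span needs at least 3 disks")
    data_disks = num_disks - spans  # one parity-equivalent disk per span
    return data_disks * disk_tb


def tb_to_tib(tb):
    """Convert decimal terabytes (10**12 bytes) to tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


usable_tb = raid50_usable_tb(num_disks=12, disk_tb=12)
print(f"{usable_tb} TB = {tb_to_tib(usable_tb):.1f} TiB")  # 120 TB = 109.1 TiB
```

The gap between 120TB and 109TiB comes only from the decimal versus binary unit definitions. File system overhead from XFS is not included in this estimate.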
In the default configuration, home directories and the shared application and library install locations are both
hosted on this NFS share. For solutions that require larger-capacity shared storage, the Isilon F800 is available
as an alternative option and is described in Section 2.5. Section 3.1.5 compares several storage subsystems,
including this NL SAS NFS configuration, the Isilon, and smaller test configurations using SSDs and NVMe
devices.
2.2 Compute Node Configuration
Deep Learning methods would not have succeeded without the computational power to drive the iterative
training process. A key component of Deep Learning solutions is therefore highly capable nodes that can
support compute-intensive workloads. State-of-the-art neural network models in Deep Learning have more than
100 layers, so the computation must scale across many compute nodes to deliver timely results. The Dell EMC
PowerEdge C4140, an accelerator-optimized, high-density 1U rack server, is used as the compute node in this
solution. The PowerEdge C4140 supports four NVIDIA Volta GPUs, in either the V100-SXM2 or the V100-PCIe
model. Figure 3 shows the CPU-GPU and GPU-GPU connection topology of a compute node.
The detailed configuration of each PowerEdge C4140 compute node is listed in Table 2.