
its workload is large enough to sustain long training runs, and it is a benchmark dataset used by many
deep learning researchers.
Testing Methodology
This blog quantifies the performance of deep learning frameworks on NVIDIA’s P100-PCIe GPU and
Dell’s PowerEdge C4130 server architecture. Figure 1 shows the testing cluster, which consists of one
head node, a Dell PowerEdge R630, and four compute nodes, each a Dell PowerEdge C4130.
All nodes are connected by an InfiniBand network and share disk storage through NFS. Each compute
node has 2 CPUs and 4 P100-PCIe GPUs, and all four compute nodes are configured identically.
Table 1 lists the hardware configuration and software used in every compute node.
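As a quick sanity check on the cluster size, the topology described above can be written down as a small data structure. This is only an illustrative sketch: the class and field names (`ComputeNode`, `Cluster`, `total_gpus`, and so on) are hypothetical and do not come from any tool used in the tests. The counts are the ones stated in the text.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ComputeNode:
    """One PowerEdge C4130 compute node, with the counts given in the text."""
    cpus: int = 2
    gpus: int = 4  # P100-PCIe


@dataclass(frozen=True)
class Cluster:
    """One head node plus identically configured compute nodes."""
    head_nodes: int = 1     # PowerEdge R630
    compute_nodes: int = 4  # PowerEdge C4130
    node: ComputeNode = field(default_factory=ComputeNode)

    @property
    def total_gpus(self) -> int:
        # Every compute node has the same number of GPUs.
        return self.compute_nodes * self.node.gpus

    @property
    def total_compute_cpus(self) -> int:
        # CPUs in the compute nodes only; the head node is not counted.
        return self.compute_nodes * self.node.cpus


cluster = Cluster()
print(cluster.total_gpus)          # 16
print(cluster.total_compute_cpus)  # 8
```

With four compute nodes of four GPUs each, the cluster exposes 16 P100-PCIe GPUs in total.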
Figure 1: Testing Cluster for Deep Learning
Table 1: Hardware Configuration and Software Details
| Component | Specification |
|---|---|
| Platform | PowerEdge C4130 (configuration G) |
| Processor | 2 x Intel Xeon CPU E5-2690 v4 @ 2.6 GHz (Broadwell) |
| Memory | 256 GB DDR4 @ 2400 MHz |
| Disk | 9 TB HDD |
| GPU | P100-PCIe with 16 GB GPU memory |
| Node interconnect | Mellanox ConnectX-4 VPI (EDR 100 Gb/s InfiniBand) |
| InfiniBand switch | Mellanox SB7890 |
| **Software and Firmware** | |
| Operating system | RHEL 7.2 x86_64 |
| Linux kernel version | 3.10.0-327.el7 |
| BIOS | Version 2.1.6 |
| CUDA version and driver | CUDA 8.0 (driver 361.77) |
| NCCL version | Version 1.2.3 |
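
Because every compute node is expected to carry the same software stack, it can help to compare what a node reports against the values in Table 1 before benchmarking. The sketch below is a hypothetical helper, not part of the original test setup: the dictionary keys, the function name `config_mismatches`, and the sample "reported" values are all assumptions made for illustration. How the reported values are collected from a node is left out.

```python
# Expected per-node software and firmware versions, transcribed from Table 1.
EXPECTED = {
    "os": "RHEL 7.2 x86_64",
    "kernel": "3.10.0-327.el7",
    "bios": "2.1.6",
    "cuda": "8.0",
    "gpu_driver": "361.77",
    "nccl": "1.2.3",
}


def config_mismatches(expected, reported):
    """Return {key: (expected, reported)} for each value that differs or is missing."""
    return {
        key: (want, reported.get(key))
        for key, want in expected.items()
        if reported.get(key) != want
    }


# Hypothetical report from one node, with a deliberately wrong NCCL version.
reported = dict(EXPECTED, nccl="0.0.0")
print(config_mismatches(EXPECTED, reported))  # {'nccl': ('1.2.3', '0.0.0')}
```

An empty result means the node matches Table 1. Any entry in the result identifies a component to correct before results from different nodes are compared.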