White Papers

Deep Learning Performance: Scale-up vs Scale-out
Architectures & Technologies Dell EMC | Infrastructure Solutions Group
5 PowerEdge Server Details
5.1 PowerEdge C4140
The Dell EMC PowerEdge C4140, an accelerator-optimized, high-density 1U rack server, is used
as the compute node unit in this solution. The PowerEdge C4140 can support four NVIDIA Volta
GPUs, in both the V100-SXM2 and the V100-PCIe models.
The Dell EMC PowerEdge C4140 with NVIDIA Volta SXM2 GPUs in topology 'M', which provides
high-bandwidth host-to-GPU communication, is one of the most advantageous configurations for
deep learning. Most competing systems that support 4-way, 8-way, or 16-way NVIDIA Volta SXM
GPUs use PCIe bridges, which limits the total available bandwidth between the CPU and the GPUs.
5.1.1 Why is C4140 Configuration-M better?
Configuration   Link interface between     Total bandwidth   Notes
                CPU and GPU complex
-------------   ------------------------   ---------------   ---------------------------------------
K               x16 Gen3                   32 GB/s           A PCIe switch sits between the host
                                                             and the GPU complex
G               x16 Gen3                   32 GB/s           A PCIe switch sits between the host
                                                             and the GPU complex
M               4 x16 Gen3                 128 GB/s          Each GPU has an individual x16 Gen3
                                                             link to the host CPU
Table 3: Host-GPU Complex PCIe Bandwidth comparison
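The totals in Table 3 follow from the number of independent x16 Gen3 links between the host and the GPU complex. The Python sketch below reproduces that arithmetic under two assumptions that the table does not state: the 32 GB/s figure is treated as the nominal bidirectional rate of one x16 Gen3 link, and the values are theoretical peaks before protocol overhead. The function and dictionary names are illustrative only.

```python
# Sketch: derive the Table 3 totals from per-link PCIe Gen3 figures.
# Assumption: the table's 32 GB/s is the rounded bidirectional peak of one x16 link.

GEN3_GT_PER_S = 8.0      # PCIe Gen3 raw signalling rate per lane (GT/s)
ENCODING = 128 / 130     # Gen3 uses 128b/130b line encoding

def link_bandwidth_gbs(lanes, bidirectional=True):
    """Theoretical bandwidth of one PCIe Gen3 link in GB/s."""
    per_direction = GEN3_GT_PER_S * ENCODING * lanes / 8  # bits to bytes
    return per_direction * (2 if bidirectional else 1)

# Independent x16 host links per configuration, as listed in Table 3.
HOST_LINKS = {"K": 1, "G": 1, "M": 4}

def total_host_gpu_bandwidth_gbs(config):
    """Aggregate host-to-GPU-complex bandwidth for a C4140 configuration."""
    return HOST_LINKS[config] * link_bandwidth_gbs(16)

for config in HOST_LINKS:
    print(f"Configuration {config}: {total_host_gpu_bandwidth_gbs(config):.1f} GB/s")
```

The computed values (about 31.5 GB/s per link) sit slightly below the rounded 32 GB/s and 128 GB/s shown in the table because of the encoding overhead; the 4:1 ratio between Configuration M and Configurations K and G is unchanged.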
As shown in Table 3, the total available bandwidth between the CPU and the GPU complex in
Configuration M is much higher than in the other configurations. This greatly benefits neural
network models that take advantage of the larger-capacity, though lower-bandwidth, host DDR
memory to speed up learning.
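To make the effect on host-memory traffic concrete, the following Python sketch estimates the ideal time to stage data from host DDR memory to all four GPUs at once. It is a back-of-the-envelope illustration, not a measurement: the 16 GB payload is hypothetical, the one-direction rate is assumed to be half of the nominal Table 3 total, and real transfers will be slower because of protocol and software overhead.

```python
# Illustrative estimate only: nominal Table 3 totals, halved to get one direction.
HOST_TO_GPU_GBS = {"K": 32 / 2, "G": 32 / 2, "M": 128 / 2}

def staging_time_s(config, total_gb):
    """Ideal time to copy total_gb from host memory to the GPU complex."""
    return total_gb / HOST_TO_GPU_GBS[config]

PAYLOAD_GB = 16  # hypothetical: 4 GB destined for each of the four GPUs

for config in ("K", "G", "M"):
    print(f"Configuration {config}: {staging_time_s(config, PAYLOAD_GB):.2f} s")
```

Under these assumptions the same payload takes a quarter of the time in Configuration M, because each GPU pulls its share over its own x16 link instead of sharing a single uplink behind a PCIe switch.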
Figure 8 shows the CPU-GPU and GPU-GPU connection topology for the C4140-K, Figure 9 shows
the topology for the C4140-M, and Figure 10 shows the topology for the C4140-B.