White Papers
Deep Learning Performance: Scale-up vs Scale-out
Architectures & Technologies Dell EMC | Infrastructure Solutions Group
5 PowerEdge Server Details
5.1 PowerEdge C4140
The Dell EMC PowerEdge C4140, an accelerator-optimized, high-density 1U rack server, is used
as the compute node in this solution. The PowerEdge C4140 supports four NVIDIA Volta GPUs,
in either the V100-SXM2 or the V100-PCIe model.
The Dell EMC PowerEdge C4140 supporting NVIDIA Volta SXM2 in topology 'M', with its
high-bandwidth host-to-GPU communication, is one of the most advantageous topologies for deep
learning. Most competing systems supporting 4-way, 8-way, or 16-way NVIDIA Volta SXM2 route
traffic through PCIe switches, which limits the total available bandwidth between the CPUs and
the GPUs.
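The bandwidth figures that follow come directly from the PCIe Gen3 link parameters. As a minimal sketch (assuming the standard Gen3 signaling rate of 8 GT/s per lane and 128b/130b encoding), the usable rate of one x16 Gen3 link can be derived as:

```python
# Sketch: theoretical bandwidth of one PCIe Gen3 x16 link.
# Assumptions: Gen3 signals at 8 GT/s per lane with 128b/130b encoding;
# the commonly quoted "32 GB/s" figure is the rounded bidirectional rate.
GT_PER_SEC = 8.0          # PCIe Gen3 raw signaling rate per lane (GT/s)
ENCODING = 128.0 / 130.0  # 128b/130b line-code efficiency
LANES = 16

per_lane_gbps = GT_PER_SEC * ENCODING      # usable Gb/s per lane, one direction
per_dir_gbs = per_lane_gbps * LANES / 8.0  # GB/s, one direction
bidir_gbs = 2 * per_dir_gbs                # GB/s, both directions combined

print(f"x16 Gen3: {per_dir_gbs:.2f} GB/s per direction, "
      f"{bidir_gbs:.2f} GB/s bidirectional")
```

This yields roughly 15.75 GB/s per direction, i.e. about 32 GB/s bidirectional per x16 link, which is the per-link figure used in the comparison below.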
5.1.1 Why is C4140 Configuration-M better?
| Configuration | Link interface between CPU and GPU complex | Total bandwidth | Notes |
|---|---|---|---|
| K | x16 Gen3 | 32 GB/s | A PCIe switch sits between the host and the GPU complex |
| G | x16 Gen3 | 32 GB/s | A PCIe switch sits between the host and the GPU complex |
| M | 4x x16 Gen3 | 128 GB/s | Each GPU has an individual x16 Gen3 link to the host CPU |

Table 3: Host-GPU Complex PCIe Bandwidth comparison
As shown in Table 3, the total available bandwidth between the CPU and the GPU complex in
Configuration M is much higher than in the other configurations. This greatly benefits neural
network models that exploit the larger-capacity (though lower-bandwidth) DDR system memory to
speed up training.
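The advantage in Table 3 is simply the number of independent x16 links multiplied by the per-link rate. A small illustrative sketch (link counts taken from the table; the 32 GB/s constant is the rounded bidirectional rate of one x16 Gen3 link):

```python
# Sketch: aggregate host <-> GPU-complex bandwidth per C4140 configuration,
# using the per-link and link-count figures from Table 3.
X16_GEN3_GBS = 32  # GB/s, bidirectional, per x16 Gen3 link (rounded)

configs = {
    "K": 1,  # all four GPUs sit behind one PCIe switch on a single x16 link
    "G": 1,  # likewise: the single x16 uplink bottlenecks the GPU complex
    "M": 4,  # each GPU has its own x16 Gen3 link to the host CPUs
}

for name, links in configs.items():
    total = links * X16_GEN3_GBS
    print(f"Config {name}: {links} x16 link(s) -> {total} GB/s host<->GPU")

ratio = configs["M"] / configs["K"]
print(f"Config M offers {ratio:.0f}x the host-GPU bandwidth of K or G")
```

The 4x factor applies only to host-to-GPU traffic; GPU-to-GPU transfers in the SXM2 configurations travel over NVLink and are unaffected by the PCIe layout.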
Figure 8 shows the CPU-GPU and GPU-GPU connection topology for the C4140-K, Figure 9 shows the
topology for the C4140-M, and Figure 10 shows the topology for the C4140-B.