White Papers

Deep Learning Performance: Scale-up vs Scale-out

Architectures & Technologies Dell EMC | Infrastructure Solutions Group

5 PowerEdge Server Details

5.1 PowerEdge C4140

The Dell EMC PowerEdge C4140, an accelerator-optimized, high density 1U rack server, is used

as the compute node unit in this solution. The PowerEdge C4140 can support four NVIDIA Volta

SMX2 GPUs, both the V100-SXM2 as well as the V100-PCIe models.

Dell EMC PowerEdge C4140 supporting NVIDIA Volta SXM2 in topology ‘M’ with a high bandwidth

host to GPU communication is one of the most advantageous topologies for deep learning. Most

of the competitive systems supporting either a 4-way or 8-way or 16-way NVIDIA Volta SXM use

PCIe bridges and this limits the total available bandwidth between CPU to GPU.

5.1.1 Why is C4140 Configuration-M better?

Configuration

Link Interface b/n

CPU- GPU complex

Total Bandwidth

Notes

X16 Gen3

32GB/s

Since there is a PCIe

switch between host

to GPU complex

X16 Gen3

32GB/s

Since there is a PCIe

switch between host

to GPU complex

4x16 Gen3

128GB/s

Each GPU has

individual x16 Gen3

to Host CPU

Table 3: Host-GPU Complex PCIe Bandwidth comparison

As shown in Table 3 the total available bandwidth between CPU – GPU complex is much higher

than other configurations. This greatly benefits neural models in taking advantage of larger

capacity although lower bandwidth DDR memory to speed up learning.

Figure 8 shows the CPU-GPU and GPU-GPU connection topology for C4140-K, Figure 9 shows

topology for C4140-M and Figure 10 shows topology for C4140-B.