White Papers

Deep Learning Performance: Scale-up vs Scale-out

Architectures & Technologies Dell EMC | Infrastructure Solutions Group

and Inception-v3 show performance within 8% of 8X SXM2. The only exception is AlexNet, which shows a considerably larger gap between 8X SXM2 and the PowerEdge C4140.

The good performance of the PowerEdge C4140 in multi-node mode, comparable to that of a single-node server with 8x V100-16GB, was reached only after the software stack was configured correctly with the distributed framework Horovod over IB/GPUDirect RDMA. Figure 32 below shows the scaling efficiency reached by the PowerEdge C4140:

Figure 32: Performance with distributed Horovod TensorFlow, connected by a Mellanox ConnectX-5 network adapter at 100 Gbit/s over IPoIB, with GPUDirect RDMA
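Scaling efficiency of the kind plotted in Figure 32 is commonly computed as the measured throughput on N GPUs divided by N times the single-GPU throughput. The sketch below illustrates that calculation under this assumed definition; the function name `scaling_efficiency` and all throughput values are hypothetical placeholders, not results from this paper.

```python
def scaling_efficiency(throughput_n, throughput_1, n_gpus):
    """Return measured throughput as a fraction of ideal linear scaling.

    throughput_n: measured throughput (e.g. images/sec) on n_gpus GPUs
    throughput_1: measured throughput on a single GPU
    n_gpus:       total number of GPUs across all nodes
    """
    if n_gpus < 1 or throughput_1 <= 0:
        raise ValueError("n_gpus must be >= 1 and throughput_1 must be > 0")
    return throughput_n / (n_gpus * throughput_1)


# Hypothetical throughputs in images/sec, for illustration only
# (not measurements from the PowerEdge C4140 tests).
single_gpu = 100.0
measured = {1: 100.0, 4: 380.0, 8: 720.0}

for n, throughput in measured.items():
    eff = scaling_efficiency(throughput, single_gpu, n)
    print(f"{n} GPUs: {eff:.1%} of linear scaling")
```

A value close to 1.0 means that adding nodes costs little in communication overhead, which is the behavior a well-configured Horovod setup over GPUDirect RDMA aims for.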