White Papers

Deep Learning Performance: Scale-up vs Scale-out
Architectures & Technologies Dell EMC | Infrastructure Solutions Group
and Inception-v3 show performance within 8% of 8X SXM2. The only exception is AlexNet, which
shows a noticeably larger gap between 8X SXM2 and PowerEdge C4140.
The strong performance of the PowerEdge C4140 in multi-node mode, comparable to a single-node
server with 8x V100-16GB, was reached only after configuring the software stack correctly, with
the distributed framework Horovod running over IB/GPUDirect RDMA. Figure 32 shows the scaling
efficiency reached by the PowerEdge C4140:
Figure 32: Performance with distributed Horovod TensorFlow, connected by a Mellanox ConnectX-5
network adapter at 100 Gbit/s over IPoIB, with GPUDirect RDMA
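
To make the two metrics used in this discussion concrete, the following is a minimal sketch of how scaling efficiency and the relative gap between two systems are commonly computed. The function names and all throughput values are hypothetical and chosen only for illustration; they are not measurements from this paper, and the paper does not state the exact formula it used.

```python
def scaling_efficiency(throughput_n, throughput_1, n_gpus):
    """Measured throughput on n_gpus as a fraction of ideal linear scaling.

    Assumes throughput is in images/sec and that ideal scaling is
    n_gpus times the single-GPU throughput.
    """
    if n_gpus <= 0 or throughput_1 <= 0:
        raise ValueError("n_gpus and throughput_1 must be positive")
    return throughput_n / (n_gpus * throughput_1)


def relative_gap(candidate, reference):
    """Fraction by which the candidate throughput falls short of the reference."""
    if reference <= 0:
        raise ValueError("reference must be positive")
    return (reference - candidate) / reference


# Hypothetical throughputs in images/sec (illustrative only).
single_gpu = 100.0          # one GPU
multi_node_8_gpus = 720.0   # 8 GPUs spread across several nodes
single_node_8_gpus = 780.0  # 8 GPUs in one server

efficiency = scaling_efficiency(multi_node_8_gpus, single_gpu, 8)
gap = relative_gap(multi_node_8_gpus, single_node_8_gpus)

print(f"scaling efficiency: {efficiency:.1%}")  # scaling efficiency: 90.0%
print(f"gap to single node: {gap:.1%}")         # gap to single node: 7.7%
```

With these made-up inputs, the multi-node configuration would sit within 8% of the single-node server, which is the kind of comparison the text above describes.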