White Papers
Deep Learning Performance: Scale-up vs Scale-out
Architectures & Technologies Dell EMC | Infrastructure Solutions Group
4.1.2 Long Test
The long tests were run to measure throughput and the training time needed for accuracy to
converge. Each training run used 90 epochs, and each test used the maximum number of GPUs
supported by the server under test.
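The long-test metric, elapsed training time until a given accuracy is first reached, can be sketched as follows. This is a minimal illustration and not the benchmark's own code: the `EpochRecord` type, the `time_to_accuracy` function, and the sample log values are hypothetical and exist only to show the calculation.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class EpochRecord:
    """One evaluation point in a long test (hypothetical log format)."""
    epoch: int
    elapsed_seconds: float  # wall-clock time since training started
    top1: float             # top-1 accuracy in [0, 1]
    top5: float             # top-5 accuracy in [0, 1]


def time_to_accuracy(log: Iterable[EpochRecord], target: float,
                     metric: str = "top1") -> Optional[float]:
    """Return elapsed seconds at the first epoch whose accuracy >= target.

    Returns None if the target is never reached within the logged epochs.
    """
    if metric not in ("top1", "top5"):
        raise ValueError("metric must be 'top1' or 'top5'")
    # Walk the log in epoch order so the earliest qualifying epoch wins.
    for record in sorted(log, key=lambda r: r.epoch):
        if getattr(record, metric) >= target:
            return record.elapsed_seconds
    return None


# Made-up numbers purely for illustration; these are not measured results.
sample_log = [
    EpochRecord(epoch=1, elapsed_seconds=600.0, top1=0.20, top5=0.45),
    EpochRecord(epoch=2, elapsed_seconds=1200.0, top1=0.35, top5=0.62),
    EpochRecord(epoch=3, elapsed_seconds=1800.0, top1=0.48, top5=0.74),
]
```

Calling `time_to_accuracy(sample_log, 0.35)` on this illustrative log returns the elapsed time recorded at epoch 2. A target that is never reached within the logged epochs returns `None`.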
The setup used is described below, and Table 1 summarizes the test configuration.
• Use Case – The benchmark tests target image classification with convolutional
neural network (CNN) models.
• Benchmark code – TensorFlow Benchmarks scripts
• Hardware Configuration – Each server is configured with the maximum number of GPUs
it supports.
• Servers – The servers tested are the PowerEdge R740, PowerEdge C4130, PowerEdge C4140,
and a non-Dell EMC 8x NVLink GPU server.
• Frameworks – TensorFlow for single-node training, and TensorFlow with the Horovod
library for distributed training.
• Performance – The performance metrics used for comparison across servers are
throughput (images per second) and the training time to reach top-5 accuracy and top-1
accuracy.
• Training tests – We conducted two types of tests. (1) Short tests: for each test, 10
warmup steps were run and the next 100 steps were averaged. (2) Long tests: run to
measure training accuracy convergence and elapsed training time.
• Dataset – ILSVRC2012
• Software stack configuration – The benchmarks were run in a Docker container
environment. See Table 1 for details.
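The two kinds of metric in the list above, short-test throughput and top-k accuracy, can be sketched in a few lines. This is a hedged illustration, not the TensorFlow Benchmarks implementation: the function names and input formats are hypothetical, `batch_size` is assumed to be the global batch size across all GPUs, and "averaged" is assumed to mean total images divided by total time over the 100 measured steps.

```python
from typing import Sequence


def short_test_throughput(step_seconds: Sequence[float], batch_size: int,
                          warmup_steps: int = 10,
                          measured_steps: int = 100) -> float:
    """Average images/second over the measured steps, ignoring warmup."""
    needed = warmup_steps + measured_steps
    if len(step_seconds) < needed:
        raise ValueError(f"need at least {needed} step timings")
    # Drop the warmup steps, then keep exactly the next `measured_steps`.
    measured = step_seconds[warmup_steps:needed]
    # Assumption: the average is total images over total elapsed time.
    return batch_size * measured_steps / sum(measured)


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equal length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)
```

With `k=1` the second function gives top-1 accuracy and with `k=5` it gives top-5 accuracy. Skipping the warmup steps keeps one-time startup costs, such as graph construction and memory allocation, out of the reported images per second.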
4.2 Throughput Testing
Workload application and model   Image classification with convolutional neural networks
                                 models (CNNs)

Benchmarks code                  TensorFlow Benchmarks scripts

                                 Server                            GPU
Servers – Single Node            ▪ PowerEdge R740                  ▪ P40
                                 ▪ PowerEdge C4140                 ▪ V100-16GB-SXM2
                                 ▪ PowerEdge C4140                 ▪ V100-32GB-SXM2
                                 ▪ Non Dell EMC 8x NVLink server   ▪ V100-16GB-SXM2

Servers – Multi Node             ▪ PowerEdge C4140-K               ▪ V100-16GB-SXM2
(2 nodes, 4 GPUs each)           ▪ PowerEdge C4140-K               ▪ V100-32GB-SXM2
                                 ▪ PowerEdge C4140-M               ▪ V100-16GB-SXM2

Frameworks                       ▪ TensorFlow for Single Mode
                                 ▪ TensorFlow with Horovod library for Distributed Mode

Performance Metrics              ▪ Throughput images/second