

Deep Learning Performance: Scale-up vs Scale-out

Architectures & Technologies Dell EMC | Infrastructure Solutions Group

4.1.2 Long Test

The long tests measured throughput and the training time needed to reach accuracy convergence. Each training run used 90 epochs, and each test used the maximum number of GPUs supported by the server under test.
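As a rough illustration of how throughput relates to the elapsed time of a 90-epoch run, the sketch below estimates training time from a sustained images-per-second figure. It assumes the standard ILSVRC2012 training set of 1,281,167 images. The function name `estimated_training_hours` and the sample throughput are hypothetical and not taken from the paper's results. The estimate ignores evaluation passes, checkpointing, and data-loading stalls, so it is best read as a lower bound.

```python
# Size of the ILSVRC2012 training set (images per epoch).
ILSVRC2012_TRAIN_IMAGES = 1_281_167


def estimated_training_hours(images_per_second, epochs=90,
                             images_per_epoch=ILSVRC2012_TRAIN_IMAGES):
    """Estimate wall-clock training hours from sustained throughput.

    Hypothetical helper: assumes throughput stays constant for the whole
    run and that training is the only work being done.
    """
    if images_per_second <= 0:
        raise ValueError("images_per_second must be positive")
    total_images = epochs * images_per_epoch
    return total_images / images_per_second / 3600.0


if __name__ == "__main__":
    # 1000 images/second is an illustrative value, not a measured result.
    print(f"{estimated_training_hours(1000.0):.1f} hours")
```

Under this model, doubling throughput halves the estimated training time, which is why throughput and time-to-accuracy are reported together.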

The list below describes the setup used, and Table 1 gives an overview of the test configuration.

• Use Case – The benchmark tests target image classification with convolutional neural network (CNN) models.

• Benchmark code – TensorFlow Benchmarks scripts

• Hardware Configuration – Each server is configured with the maximum number of GPUs it supports.

• Servers – The servers tested are the PowerEdge R740, PowerEdge C4130, PowerEdge C4140, and a non-Dell EMC 8x NVLink GPU server.

• Frameworks – TensorFlow for single-node training, and TensorFlow with the Horovod library for distributed training.

• Performance – The performance metrics used for comparison across servers are throughput (images per second) and training time to reach top-5 accuracy and top-1 accuracy.

• Training tests – We conducted two types of tests. (1) Short tests: for each test, 10 warmup steps were run and the next 100 steps were averaged. (2) Long tests: run to measure training accuracy convergence and elapsed training time.

• Dataset – ILSVRC2012

• Software stack configuration – The benchmarks were run in a Docker container environment. See Table 1 for details.
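The two kinds of metric named in the list above can be sketched in a few lines of plain Python. This is a minimal illustration, not the TensorFlow Benchmarks implementation. The function names are hypothetical. Computing the short-test average as total images divided by total time over the measured window is an assumption, since the text only says the steps "were averaged".

```python
def steady_state_throughput(step_times_s, batch_size,
                            warmup_steps=10, measured_steps=100):
    """Images/second over the steps that follow the warmup phase.

    step_times_s: per-step wall-clock durations in seconds, in run order.
    Assumption: the average is total images / total time in the window.
    """
    window = step_times_s[warmup_steps:warmup_steps + measured_steps]
    if len(window) < measured_steps:
        raise ValueError("not enough steps recorded after warmup")
    return batch_size * len(window) / sum(window)


def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores.

    scores: one list of per-class scores for each sample.
    labels: the true class index for each sample.
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equal length")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda i: row[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


if __name__ == "__main__":
    # Illustrative timings: slow warmup steps followed by steady-state steps.
    times = [1.0] * 10 + [0.5] * 100
    print(steady_state_throughput(times, batch_size=64))

    scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
    labels = [1, 1, 0]
    print(top_k_accuracy(scores, labels, k=1), top_k_accuracy(scores, labels, k=2))
```

Excluding the warmup steps keeps one-time costs such as graph construction and memory allocation out of the reported throughput. Top-1 accuracy is the `k=1` case, and top-5 accuracy is the `k=5` case.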

4.2 Throughput Testing

Workload application and model: Image classification with convolutional neural network (CNN) models

Benchmarks code: TensorFlow Benchmarks scripts

Servers – Single Node (Server: GPU)
▪ PowerEdge R740: P40
▪ PowerEdge C4140: V100-16GB-SXM2
▪ PowerEdge C4140: V100-32GB-SXM2
▪ Non-Dell EMC 8x NVLink server: V100-16GB-SXM2

Servers – Multi Node, 2 nodes with 4 GPUs each (Server: GPU)
▪ PowerEdge C4140-K: V100-16GB-SXM2
▪ PowerEdge C4140-K: V100-32GB-SXM2
▪ PowerEdge C4140-M: V100-16GB-SXM2

Frameworks
▪ TensorFlow for Single Mode
▪ TensorFlow with Horovod library for Distributed Mode

Performance Metrics
▪ Throughput images/second