
Internal Use - Confidential
Copyright © 2019 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries.
Recommendations
1. Training times and scaling behavior vary between domains and models
o Using superior accelerators is most advantageous for the domains that require the most training time
o To pick the appropriate server and number of GPUs, it is useful to understand the models and
domains being used.
2. Increasing the GPU count scales performance at a near-linear rate for the Image Recognition and
Object Detection domains
o Servers with higher GPU counts reduce training time nearly linearly for these domains. Scaling to 4
GPUs connected with NVLink appears to be the sweet spot from an efficiency standpoint.
3. Increasing the GPU count does not scale performance linearly for the Translation and Recommendation
domains
o Servers with higher GPU counts will not reduce training times linearly for these domains, due to data
set size or computation-to-communication ratios. However, larger GPU counts are still useful for
meeting time-to-solution targets, since training time is still reduced across these models.
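The near-linear versus sub-linear scaling described above can be quantified with two simple metrics: speedup (single-GPU time divided by N-GPU time) and parallel efficiency (speedup divided by N). The sketch below computes both; the timing figures are hypothetical placeholders for illustration, not measured results from this tech note.

```python
# Sketch: quantify multi-GPU scaling behavior from training times.
# All timing numbers below are hypothetical, not measured data.

def scaling_metrics(times_by_gpu_count):
    """Return speedup and parallel efficiency relative to the 1-GPU time.

    times_by_gpu_count maps GPU count -> training time (any consistent unit).
    """
    t1 = times_by_gpu_count[1]
    return {
        n: {"speedup": t1 / t, "efficiency": (t1 / t) / n}
        for n, t in sorted(times_by_gpu_count.items())
    }

# Hypothetical minutes-to-train for two domains:
image_recognition = {1: 400.0, 2: 205.0, 4: 105.0}  # near-linear scaling
translation       = {1: 400.0, 2: 260.0, 4: 190.0}  # sub-linear scaling

for name, times in [("image recognition", image_recognition),
                    ("translation", translation)]:
    for n, m in scaling_metrics(times).items():
        print(f"{name}: {n} GPU(s) -> speedup {m['speedup']:.2f}x, "
              f"efficiency {m['efficiency']:.0%}")
```

With numbers like these, the image-recognition case stays above 90% efficiency at 4 GPUs, while the translation case drops well below that, which is the practical difference between the two recommendation points above.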
Conclusion
Optimizing a platform for ML/DL workloads goes far beyond scaling the accelerators; every variable must be
considered, and there are a plethora of them. Fortunately, Dell EMC is committed to designing PowerEdge
servers with GPU counts that cater to specific ML/DL domains, reducing these variables for a smooth and
simple customer experience. This tech note provided insight into how accelerator model, accelerator count,
and domain type interact with specific PowerEdge server models, and, more importantly, how customers can
make the best decisions to run their required ML/DL workloads at full throttle.
PowerEdge DfD Repository - For more technical learning
Follow Us - For PowerEdge news
Contact Us - For feedback and requests