Ready Solutions Engineering Test Results
Copyright © 2017 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC, and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be
the property of their respective owners. Published in the USA. Dell EMC believes the information in this document is accurate as of its publication date. The information is
subject to change without notice.
HPC Applications Performance on V100
Authors: Frank Han, Rengan Xu, Nishanth Dandapanthula.
HPC Innovation Lab. August 2017
Overview
This is one of two articles in our Tesla V100 blog series. In this blog, we present initial benchmark results for NVIDIA® Tesla® Volta-based
V100™ GPUs on four HPC benchmarks, along with a comparative analysis against the previous-generation Tesla P100 GPUs.
The other blog in this series covers V100 and deep learning applications. If you haven't read it yet, we highly
recommend taking a look at it as well.
PowerEdge C4130 with V100 GPU support
The NVIDIA® Tesla® V100 accelerator is one of the most advanced accelerators currently on the market and was launched
within one year of the P100 release. Dell EMC is the first in the industry to integrate the Tesla V100 and bring it to market. As was the
case with the P100, the V100 comes in two form factors: V100-PCIe and the mezzanine version, V100-SXM2. The Dell EMC PowerEdge
C4130 server supports both form factors of the V100 and P100 GPU cards. Table 1 below lists the major enhancements in the V100 over the P100:
Table 1: The comparison between V100 and P100
                                              PCIe                         SXM2
Feature                                       P100    V100    Improvement  V100    Improvement
Architecture                                  Pascal  Volta   -            Volta   -
CUDA Cores                                    3584    5120    -            5120    -
GPU Max Clock rate (MHz)                      1329    1380    -            1530    -
Memory Clock rate (MHz)                       715     877     23%          877     23%
Tensor Cores                                  N/A     640     -            640     -
Tensor Cores/SM                               N/A     8       -            8       -
Memory Bandwidth (GB/s)                       732     900     23%          900     23%
Interconnect Bandwidth Bi-Directional (GB/s)  32      32      -            300     -
Deep Learning (TFlops)                        18.6    112     6x           125     6x
Single Precision (TFlops)                     9.3     14      1.5x         15.7    1.5x
Double Precision (TFlops)                     4.7     7       1.5x         7.8     1.5x
TDP (Watt)                                    250     250     -            300     -
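The PCIe "Improvement" figures in Table 1 can be rechecked from the raw P100 and V100 values. The short Python sketch below is only an illustration: the dictionary and function names are our own, and it assumes that "Improvement" means the V100 value relative to the P100 value in the same row. Only the PCIe columns are used, because they are the only ones for which Table 1 lists a P100 baseline.

```python
# Values copied from the PCIe columns of Table 1 as (P100, V100) pairs.
# Names below are illustrative and not part of any Dell EMC or NVIDIA tool.
PCIE_SPECS = {
    "Memory Clock rate (MHz)": (715, 877),
    "Memory Bandwidth (GB/s)": (732, 900),
    "Deep Learning (TFlops)": (18.6, 112),
    "Single Precision (TFlops)": (9.3, 14),
    "Double Precision (TFlops)": (4.7, 7),
}


def speedup(p100, v100):
    """Return the V100 value as a multiple of the P100 value."""
    return v100 / p100


def percent_gain(p100, v100):
    """Return the relative increase from P100 to V100, in percent."""
    return (v100 - p100) / p100 * 100.0


def improvement_table(specs):
    """Map each feature name to a (speedup, percent gain) pair."""
    return {
        name: (speedup(p100, v100), percent_gain(p100, v100))
        for name, (p100, v100) in specs.items()
    }


if __name__ == "__main__":
    for name, (ratio, pct) in improvement_table(PCIE_SPECS).items():
        print(f"{name:28s} {ratio:5.2f}x  (+{pct:.0f}%)")
```

Under that assumption, the memory clock and memory bandwidth rows both round to a 23% gain, and the single and double precision rows both round to 1.5x, which matches the PCIe column of the table.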
The V100 not only significantly improves performance and scalability, as will be shown below, but also comes with new features. The
features most relevant to HPC applications are highlighted below:
Second-Generation NVIDIA NVLink™
