Application Performance on P100-PCIe GPUs
Authors: Rengan Xu, Frank Han and Nishanth Dandapanthu. Dell EMC HPC Innovation Lab. Feb 2017
Introduction to P100-PCIe GPU
This blog presents a performance analysis of NVIDIA® Tesla® P100 GPUs on a cluster of Dell PowerEdge C4130 servers. The P100 comes in two variants: PCIe-based and SXM2-based. In PCIe-based servers, the GPUs are connected over PCIe buses, and one P100 delivers around 4.7 TeraFLOPS of double-precision and 9.3 TeraFLOPS of single-precision performance. In SXM2-based servers, the GPUs are connected by NVLink, and one P100 delivers around 5.3 and 10.6 TeraFLOPS of double- and single-precision performance, respectively. This blog focuses on the P100 for PCIe-based servers, i.e. P100-PCIe.
We have already analyzed P100 performance for several deep learning frameworks in a previous blog. The objective of this blog is to compare the performance of HPL, LAMMPS, NAMD, GROMACS, HOOMD-blue, Amber, ANSYS Mechanical and RELION on these GPUs. The hardware configuration of the cluster is the same as in the deep learning blog: four C4130 nodes, each with dual Intel Xeon E5-2690 v4 CPUs and four NVIDIA P100-PCIe GPUs, with all nodes connected by EDR InfiniBand. Table 1 shows the details of the hardware and software used in every compute node.
Table 1: Experiment Platform and Software Details

Platform: PowerEdge C4130 (configuration G)
Processor: 2 x Intel Xeon E5-2690 v4 @ 2.6 GHz (Broadwell)
Memory: 256 GB DDR4 @ 2400 MHz
Disk: 9 TB HDD
GPU: P100-PCIe with 16 GB GPU memory
Node Interconnect: Mellanox ConnectX-4 VPI (EDR 100 Gb/s InfiniBand)
InfiniBand Switch: Mellanox SB7890

Software and Firmware
Operating System: RHEL 7.2 x86_64
Linux Kernel Version: 3.10.0-327.el7
BIOS: Version 2.3.3
CUDA Version and Driver: CUDA 8.0.44 (driver 375.20)
MPI: Open MPI 2.0.1
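As a quick sanity check on the platform in Table 1, a short device-query program can confirm what the CUDA runtime sees on each node. The sketch below is illustrative, not part of the original study; the hard-coded 64 FP32 / 32 FP64 cores-per-SM values are specific to compute capability 6.0 (Pascal) parts such as the P100. It enumerates the GPUs and estimates their theoretical peaks from SM count and boost clock:

#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        // Pascal (sm_60, P100): 64 FP32 cores and 32 FP64 cores per SM.
        const int fp32PerSM = 64, fp64PerSM = 32;
        const double ghz = prop.clockRate / 1.0e6;  // clockRate is reported in kHz
        // Peak = cores x clock x 2 FLOP per fused multiply-add.
        const double fp32T = prop.multiProcessorCount * fp32PerSM * ghz * 2.0 / 1e3;
        const double fp64T = prop.multiProcessorCount * fp64PerSM * ghz * 2.0 / 1e3;
        printf("GPU %d: %s, %d SMs @ %.0f MHz, %.1f GB\n", d, prop.name,
               prop.multiProcessorCount, ghz * 1000.0,
               prop.totalGlobalMem / 1.0e9);
        printf("  theoretical peak: %.1f TFLOPS FP64, %.1f TFLOPS FP32\n",
               fp64T, fp32T);
    }
    return 0;
}

Compiled with nvcc (e.g., nvcc gpu_peek.cu -o gpu_peek, a hypothetical file name) and run on one compute node, this should report four P100-PCIe devices with 16 GB each, at roughly the double- and single-precision peaks quoted above.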
