Dell PowerEdge C4130 Performance with
K80 GPUs - NAMD
Authors: Saeed Iqbal and Mayura Deshmukh
The field of Molecular Dynamics (MD) has seen a tremendous boost in simulation capacity since the introduction
of General Purpose GPUs. This trend is sustained by freely available feature-rich GPU-enabled molecular
dynamics simulators like NAMD. For more information on NAMD, please visit
http://www.ks.uiuc.edu/Research/namd/ and for more information on GPUs visit http://www.nvidia.com/tesla.
Things only get better for NAMD with the release of the new Tesla K80 GPUs from NVIDIA. The K80 offers significant
improvements over the previous model, the K40. From the HPC perspective, the most important improvement is
the 1.87 TFLOPS (double precision) compute capacity, which is about 30% more than the K40. The auto-boost feature
in the K80 automatically provides additional performance if additional power headroom is available. The internal
GPUs are based on the GK210 architecture and have a total of 4,992 cores, which represents a 73% increase
over the K40. The K80 has a total memory of 24 GB, divided equally between the two internal GPUs; this is
100% more memory capacity compared to the K40. The memory bandwidth of the K80 is improved to 480 GB/s. The
rated power consumption of a single K80 is a maximum of 300 watts.
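
As a quick illustration (not part of the original study), the minimal CUDA sketch below enumerates the CUDA devices on a node and prints their memory and multiprocessor counts. On a system with K80s, each board is expected to appear as two GK210 devices, each with roughly half of the 24 GB total. Only the standard CUDA runtime API is used.

// Sketch: list CUDA devices and basic properties (illustrative only).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    printf("Found %d CUDA devices\n", deviceCount);

    for (int d = 0; d < deviceCount; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        // Each K80 board should report two devices; totalGlobalMem is per device.
        printf("Device %d: %s, %zu MB global memory, %d multiprocessors\n",
               d, prop.name, prop.totalGlobalMem / (1024 * 1024),
               prop.multiProcessorCount);
    }
    return 0;
}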
Combining K80s with the latest high GPU density design from Dell, the PowerEdge C4130, results in an
extraordinarily powerful compute node. The C4130 can be configured with up to four K40 or K80 GPUs in a 1U form
factor. The PowerEdge C4130 is also unique in offering several workload-specific configurations,
potentially making it a better fit for MD codes in general, and for NAMD specifically.
The PowerEdge C4130 offers five configurations, noted here as “A” through “E”. Part of the goal of this blog is to
find out which configuration is best suited for NAMD. The three quad-GPU configurations “A”, “B” and “C” are
compared. The two dual-GPU configurations “D” and “E” are also compared, for users interested in the lower GPU
density of 2 GPUs per 1 rack unit. The first two quad-GPU configurations (“A” and “B”) have an internal PCIe switch
module which allows seamless peer-to-peer GPU communication. We also want to understand the impact of the
switch module on NAMD. Figure 1 below shows the block diagrams for configurations A to E.
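
To make the role of the switch module concrete, the following minimal CUDA sketch (an illustration under our assumptions, not part of the original study) checks which GPU pairs on a node can communicate peer to peer and enables direct access where it is supported. On configurations whose GPUs sit behind a common PCIe switch, one would expect peer access to be reported between all of them.

// Sketch: probe and enable peer-to-peer access between GPU pairs (illustrative only).
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);

    // Check every ordered pair of GPUs for peer-to-peer capability.
    for (int src = 0; src < deviceCount; ++src) {
        for (int dst = 0; dst < deviceCount; ++dst) {
            if (src == dst) continue;
            int canAccess = 0;
            cudaDeviceCanAccessPeer(&canAccess, src, dst);
            printf("GPU %d -> GPU %d : peer access %s\n",
                   src, dst, canAccess ? "supported" : "not supported");
            if (canAccess) {
                // Enable direct access so copies between the two GPUs
                // can bypass host memory.
                cudaSetDevice(src);
                cudaDeviceEnablePeerAccess(dst, 0);
            }
        }
    }
    return 0;
}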