White Papers

Extreme GPU Computing
5. Performance improvement compared to the previous generation of PowerEdge C410X solutions
The compute power of GPU solutions has increased many times over in recent years. The latest
GPU-based PowerEdge C4130 solution offers a substantial performance improvement over the
previous PowerEdge C410X solution. In this section, we compare the performance of the C4130
solution relative to the C410X-based solution. This data should prove useful for users considering
a switch from the previous external GPU-based C410X solution to the current internal GPU-based
C4130 offering. The number of GPUs is held constant in both cases. Table 1 below shows the
configurations of the two systems.
Table 1. Configurations of current and previous GPU solutions
Server                      PowerEdge C4130                       PowerEdge C410X/C6100
--------------------------  ------------------------------------  ------------------------------
Processor                   1 or 2 x Intel Xeon CPU E5-2690 v3    2 x Intel X5650
                            @ 2.6 GHz (12 core)                   @ 2.67 GHz (6 core)
Memory                      64GB or 128GB @ 2133MHz               48GB @ 1333MHz
GPU board                   2 or 4 x NVIDIA Tesla K80             2 or 4 x NVIDIA Fermi M2070
Number of internal GPUs     K80 has two internal GPUs             M2070 has one internal GPU
per GPU board
GPU connection to host      Internal                              External (via HIC)
GPU memory                  24 GB                                 6 GB
GPU power                   300W                                  225W
Power supply                2 x 1600W                             2 x 1600W
Operating system            RHEL 6.5 (2.6.32-431.el6.x86_64)      RHEL 5.5 (2.6.18-194.el5)
BIOS options                System profile max performance        System profile max performance
                            Logical processor - disabled          Logical processor - disabled
CUDA version and driver     CUDA 6.5 (340.46)                     CUDA 4.0
BIOS firmware               1.1.0                                 1.54.92
HPL                         NVIDIA pre-compiled HPL 2.1           NVIDIA pre-compiled HPL 1.1
NAMD                        Version 2.9                           Version 2.8b1
As shown in the table above, both the hardware and the software components of the solution have
advanced. The core count per processor has doubled from six to twelve. The system memory is now
128GB for two CPUs (64GB for a single CPU), up from 48GB. The bulk of the improvement is in the raw
compute power of the GPUs. The M2070 is rated at 515 GFLOPS (double precision), and the K80 is rated
at 1.87 to 2.91 TFLOPS (double precision), a 3.6X to 5.6X improvement over the M2070. GPU memory has
increased fourfold, from 6GB to 24GB per GPU board. The system architecture also plays a role: the
previous solution used external GPUs connected via HIC, whereas the C4130 houses its GPUs
internally. There have also been numerous improvements in the application code, with both HPL and
NAMD going through major revisions. Given these differences, the comparison in this section reflects
the combined effect of all hardware and software advances rather than isolating any single
component.
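
The rated-peak ratios quoted above can be reproduced with simple arithmetic. The short Python
sketch below is illustrative only: the constant and function names are our own and not part of any
Dell or NVIDIA tool, and the figures are the rated double-precision peaks and memory sizes stated in
this section, not measured results.

```python
# Rated double-precision peaks quoted in this section, in GFLOPS.
# All names here are illustrative; they do not come from any vendor tool.
M2070_GFLOPS = 515.0
K80_GFLOPS_LOW = 1870.0    # 1.87 TFLOPS
K80_GFLOPS_HIGH = 2910.0   # 2.91 TFLOPS

# GPU memory values from Table 1, in GB.
M2070_MEMORY_GB = 6
K80_MEMORY_GB = 24


def improvement(new_value, old_value):
    """Return how many times larger new_value is than old_value."""
    return new_value / old_value


low = improvement(K80_GFLOPS_LOW, M2070_GFLOPS)
high = improvement(K80_GFLOPS_HIGH, M2070_GFLOPS)
memory_ratio = improvement(K80_MEMORY_GB, M2070_MEMORY_GB)

print(f"Rated DP peak improvement: {low:.2f}x to {high:.2f}x")
print(f"GPU memory improvement: {memory_ratio:.0f}x")
```

The computed ratios come out at roughly 3.63 and 5.65, consistent with the 3.6X to 5.6X range cited
above. These are ratios of rated peaks only; measured application speedups also depend on the CPU,
memory, interconnect, and software changes listed in Table 1.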