White Papers

Extreme GPU Computing
6. Conclusion
The C4130 meets the current challenges of a high-density, accelerator-enabled compute
node. Targeted specifically at the HPC market, it offers world-class performance and unique
configurability options to fit extreme HPC requirements. The main results are as follows:
• The various configurations of the C4130 offer a range of CPU-to-GPU ratios.
• The C4130 offers both balanced and switched connections between CPUs and GPUs.
• The unique placement of the GPUs in the C4130 ensures optimal GPU performance.
• The C4130 supports a wide range of GPU boards and coprocessors.
• On the K80s, HPL performance is about 9X that of the CPUs, resulting in a power
  efficiency of 4.2 GFLOPS per watt.
• Using the K80s can substantially accelerate industry-standard molecular dynamics codes.
  Observed acceleration ranges from 2X to 16X, depending on the code and benchmark used. This
  improvement is achieved in a power-efficient manner, consuming only 2X to 4X more power.
• Compared to the previous-generation C410X-based GPU solution using M2070s, the C4130
  with K80s is vastly improved. Comparing four GPU boards, the main enhancements are:
  o HPL performance is 5X to 8X better with reduced power consumption,
    resulting in a 9X to 10X performance-per-watt improvement.
  o NAMD performance is 3X to 4X better with reduced power consumption,
    resulting in a 6X to 7X performance-per-watt improvement.
  o On a “GPU per U” basis, compute density is improved 2.5X to 3.5X. Previously,
    the C410X-based solution required at least 5U to 7U for 16 GPUs (sixteen M2070s); with
    the latest C4130, users can access 16 GPUs (eight K80s) in only 2U.
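
The density and performance-per-watt figures above can be cross-checked with simple arithmetic. The following Python sketch is illustrative only: the function names are hypothetical, and the power-ratio bounds assume that performance per watt is simply performance divided by power, which this paper does not state explicitly. It reproduces the 2.5X to 3.5X density range and shows that the quoted HPL and NAMD ranges are consistent with "reduced power consumption" (an implied power ratio below 1).

```python
from fractions import Fraction


def gpus_per_u(gpus: int, rack_units: int) -> Fraction:
    """GPU density expressed as GPUs per rack unit (U)."""
    return Fraction(gpus, rack_units)


def implied_power_ratio_bounds(perf_gain, ppw_gain):
    """Bounds on (new power / old power) implied by two quoted ranges.

    Assumption (not from the paper): perf-per-watt gain = perf gain / power ratio,
    so power ratio = perf gain / perf-per-watt gain. Each argument is a
    (low, high) tuple; the bounds pair opposite extremes of the two ranges.
    """
    lowest = perf_gain[0] / ppw_gain[1]
    highest = perf_gain[1] / ppw_gain[0]
    return lowest, highest


# Figures quoted in the conclusion: 16 GPUs in 2U (C4130) vs. 5U to 7U (C410X).
c4130_density = gpus_per_u(16, 2)
c410x_density_5u = gpus_per_u(16, 5)
c410x_density_7u = gpus_per_u(16, 7)

density_gain_low = float(c4130_density / c410x_density_5u)   # 2.5
density_gain_high = float(c4130_density / c410x_density_7u)  # 3.5

# Quoted ranges: HPL 5X-8X faster, 9X-10X perf/watt; NAMD 3X-4X, 6X-7X.
hpl_power_bounds = implied_power_ratio_bounds((5, 8), (9, 10))
namd_power_bounds = implied_power_ratio_bounds((3, 4), (6, 7))

print(f"Density gain: {density_gain_low}X to {density_gain_high}X")
print(f"HPL implied power ratio:  {hpl_power_bounds[0]:.2f} to {hpl_power_bounds[1]:.2f}")
print(f"NAMD implied power ratio: {namd_power_bounds[0]:.2f} to {namd_power_bounds[1]:.2f}")
```

Under that assumption, every combination of the quoted ranges yields a power ratio below 1, which agrees with the reduced power consumption reported for both benchmarks.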