White Papers

Extreme GPU Computing
Figure 16: Comparison of HPL performance between C410X and C4130 with four GPU boards
Figure 16 compares HPL results for the two systems with four GPU boards. The total peak performance
goes from 2.2 TFLOPS to 8.4 TFLOPS, an improvement of 3.8X. The achieved, or sustained, performance is
more complex. First, the HPL efficiency has gone from 39.9% to 87.8%, mainly due to code enhancements
and to having internal GPUs. Because of the higher efficiency, the effective increase in sustained
performance is about 8.4X. On the power side, even with a higher rated power of 300W, the actual
power consumption is lower. This is due to two main factors: the improved system architecture with
internal GPUs reduces the power required by an external chassis, and various architectural
improvements in the GPUs improve performance per watt. The net gain in performance per watt is about 10X.
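The four-GPU figures above can be cross-checked with a few lines of arithmetic. The Python sketch below assumes that sustained performance is simply peak performance multiplied by HPL efficiency. The variable and function names are illustrative only and are not part of any benchmark tooling. All inputs are the rounded values quoted in the text, so the results are approximate.

```python
# Rounded four-GPU figures quoted in the text.
peak_old_tflops = 2.2   # C410X-based solution, peak
peak_new_tflops = 8.4   # C4130, peak
eff_old = 0.399         # HPL efficiency, C410X-based solution
eff_new = 0.878         # HPL efficiency, C4130


def sustained_tflops(peak_tflops, hpl_efficiency):
    """Assumed model: sustained = peak * HPL efficiency."""
    return peak_tflops * hpl_efficiency


sustained_old = sustained_tflops(peak_old_tflops, eff_old)
sustained_new = sustained_tflops(peak_new_tflops, eff_new)

peak_gain = peak_new_tflops / peak_old_tflops
sustained_gain = sustained_new / sustained_old

print(f"peak gain:      {peak_gain:.1f}X")
print(f"sustained gain: {sustained_gain:.1f}X")
```

Under this assumption the sustained gain is the peak gain scaled by the ratio of the two efficiencies, which is why it is more than twice the 3.8X improvement in peak performance.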
Similarly, Figure 17 shows the performance improvement with two GPU boards. The peak performance
goes from 1.1 TFLOPS to 4.6 TFLOPS, an increase of about 4X. The sustained performance improves by
5.8X. This is lower than in the four-GPU case because the HPL efficiency of the C6100+C410X solution
is higher with two GPUs (56.3%). The difference in power consumption is larger than in the four-board
case: with the two-board configuration, the C4130 uses about 62% of the power of the previous
C410X-based solution. Finally, the total gain in performance per watt is 9.3X for the two-GPU case.
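The two-GPU performance-per-watt figure can be reproduced in the same way. The Python sketch below assumes that the gain in performance per watt is the sustained-performance gain divided by the power ratio. It also derives the C4130 HPL efficiency implied by the quoted numbers; that value is not stated in the text and is only an estimate from rounded inputs. All names are illustrative.

```python
# Rounded two-GPU figures quoted in the text.
peak_old_tflops = 1.1   # C6100+C410X, peak
peak_new_tflops = 4.6   # C4130, peak
eff_old = 0.563         # HPL efficiency, C6100+C410X
sustained_gain = 5.8    # sustained performance, C4130 vs. C410X-based
power_ratio = 0.62      # C4130 power as a fraction of the C410X-based solution

# Assumed model: perf/watt gain = sustained gain / power ratio.
perf_per_watt_gain = sustained_gain / power_ratio

# Implied (not quoted) C4130 efficiency, from sustained = peak * efficiency.
sustained_old = peak_old_tflops * eff_old
sustained_new = sustained_old * sustained_gain
implied_eff_new = sustained_new / peak_new_tflops

print(f"perf/watt gain:            {perf_per_watt_gain:.2f}X")
print(f"implied C4130 efficiency:  {implied_eff_new:.1%}")
```

With these rounded inputs the performance-per-watt gain comes out at about 9.35X, in line with the 9.3X quoted. The implied C4130 efficiency of roughly 78% is much closer to the 56.3% of the older solution than in the four-GPU comparison, which is consistent with the smaller sustained gain.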
Figure 17: Comparison of HPL performance between C410X and C4130 with two GPU boards