
Extreme GPU Computing
4.2 Accelerating High-Performance Linpack (HPL)
Figure 8: HPL performance, efficiency and acceleration compared to the CPU-only configuration
In this section, we evaluate HPL performance on the C4130 with up to four K80 GPU boards. Because
HPL is a standard benchmark for comparing HPC systems, this section presents key performance
characterization data for the C4130. We measure achieved performance, HPL acceleration, HPL
efficiency, power consumption and performance per watt across various system configurations.
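The metrics listed above are simple ratios. As a minimal sketch, assuming the usual definitions (acceleration as the accelerated HPL result divided by the CPU-only result, Rpeak as the sum of the theoretical double-precision peaks of all CPUs and GPUs, efficiency as measured Rmax over Rpeak, and performance per watt as Rmax over average system power), they can be computed as follows. The function names and all sample inputs are hypothetical and are not measured C4130 values.

```python
from typing import Iterable


def hpl_acceleration(rmax_accel_tflops: float, rmax_cpu_only_tflops: float) -> float:
    """Speedup of an accelerated HPL run over the CPU-only run."""
    return rmax_accel_tflops / rmax_cpu_only_tflops


def rpeak_tflops(device_peaks_tflops: Iterable[float]) -> float:
    """Theoretical peak: sum of the FP64 peaks of every CPU and GPU used."""
    return sum(device_peaks_tflops)


def hpl_efficiency(rmax_tflops: float, rpeak: float) -> float:
    """Measured HPL result (Rmax) as a fraction of theoretical peak (Rpeak)."""
    return rmax_tflops / rpeak


def gflops_per_watt(rmax_tflops: float, avg_power_watts: float) -> float:
    """Energy efficiency: GFLOPS delivered per watt of average system power."""
    return rmax_tflops * 1000.0 / avg_power_watts


if __name__ == "__main__":
    # Illustrative inputs only; these are NOT measured C4130 values.
    peak = rpeak_tflops([1.0, 4.5, 4.5])  # hypothetical CPU + two accelerators
    print(f"acceleration: {hpl_acceleration(8.0, 1.0):.1f}X")          # 8.0X
    print(f"efficiency: {hpl_efficiency(8.0, peak):.0%}")              # 80%
    print(f"perf/watt: {gflops_per_watt(8.0, 2000.0):.2f} GFLOPS/W")   # 4.00
```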
Figure 8 shows the HPL performance characterization. Configurations A, B and C each use four K80
boards and deliver between 6.5 and 7.3 TFLOPS. B outperforms A because of the extra CPU in
configuration B. B and C have the same compute resources, so the difference between them comes from
the GPU-to-CPU ratio: configuration C is balanced, with two K80 boards per CPU, while B has all four
K80 boards attached to a single CPU. Overall, configuration C achieves the highest performance, 7.3
TFLOPS. Of the two-K80 configurations, D is higher at 3.8 TFLOPS versus 3.6 TFLOPS for E; E has one
fewer CPU, which explains the difference.
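The configuration comparison can be captured in a small data structure. This is a sketch under stated assumptions: the class and field names are hypothetical; A's single CPU and its 6.5 TFLOPS are inferred from the text (A has one CPU fewer than B and is the lowest of the 6.5 to 7.3 TFLOPS range); B's exact result is not given; and the per-CPU placement of the boards in D and E is assumed, with only E's one-fewer-CPU relationship taken from the text.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class HplConfig:
    name: str
    boards_per_cpu: Tuple[int, ...]  # K80 boards attached to each populated CPU socket
    tflops: Optional[float]          # None where the text gives no exact figure

    @property
    def cpus(self) -> int:
        return len(self.boards_per_cpu)

    @property
    def max_boards_on_one_cpu(self) -> int:
        # How unbalanced the attachment is: 4 for B, 2 for the balanced C.
        return max(self.boards_per_cpu)


def gain(faster: HplConfig, slower: HplConfig) -> float:
    """Relative HPL improvement of one configuration over another."""
    if faster.tflops is None or slower.tflops is None:
        raise ValueError("both configurations need a reported TFLOPS value")
    return faster.tflops / slower.tflops - 1.0


# Four-K80 configurations; A's CPU count and TFLOPS are inferred (assumption).
A = HplConfig("A", (4,), 6.5)
B = HplConfig("B", (4, 0), None)
C = HplConfig("C", (2, 2), 7.3)
# Two-K80 configurations; board placement is assumed (hypothetical).
D = HplConfig("D", (1, 1), 3.8)
E = HplConfig("E", (2,), 3.6)

if __name__ == "__main__":
    for cfg in (A, B, C, D, E):
        result = "n/a" if cfg.tflops is None else f"{cfg.tflops:.1f} TFLOPS"
        print(f"{cfg.name}: {cfg.cpus} CPU(s), up to "
              f"{cfg.max_boards_on_one_cpu} board(s) per CPU, {result}")
    print(f"C over A: {gain(C, A):.1%}")  # 12.3%
    print(f"D over E: {gain(D, E):.1%}")  # 5.6%
```

With these inputs, C is about 12.3% faster than A, and D about 5.6% faster than E.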
Compared with CPU-only performance, four K80 boards deliver a 9X acceleration and two K80 boards a
4.7X acceleration. HPL efficiency on the K80, ranging from the low to the upper 80s (percent), is
significantly higher than on the previous generation of GPUs.
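A quick arithmetic check ties the acceleration figures back to the TFLOPS results. Assuming (the text does not say so explicitly) that 9X refers to configuration C and 4.7X to configuration D, both ratios imply a CPU-only baseline of roughly 0.81 TFLOPS, and going from two to four K80 boards raises the speedup by about 1.91X, roughly 96% of ideal linear scaling. The helper names are hypothetical.

```python
def implied_cpu_only_tflops(accel_tflops: float, speedup: float) -> float:
    """CPU-only HPL result implied by an accelerated result and its speedup."""
    return accel_tflops / speedup


def scaling_efficiency(speedup_large: float, speedup_small: float,
                       board_ratio: float) -> float:
    """Observed speedup gain divided by the ideal gain from adding boards."""
    return (speedup_large / speedup_small) / board_ratio


if __name__ == "__main__":
    # Assumption: 9X pairs with configuration C, 4.7X with configuration D.
    four_k80 = implied_cpu_only_tflops(7.3, 9.0)
    two_k80 = implied_cpu_only_tflops(3.8, 4.7)
    print(f"implied CPU-only (four K80 data): {four_k80:.2f} TFLOPS")  # 0.81
    print(f"implied CPU-only (two K80 data): {two_k80:.2f} TFLOPS")    # 0.81
    print(f"2 -> 4 board scaling: {scaling_efficiency(9.0, 4.7, 2.0):.1%}")  # 95.7%
```

That both estimates land near the same baseline is consistent with the two acceleration figures sharing one CPU-only reference run.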
[Figure 8 groups: Four K80 Boards; Two K80 Boards]