White Papers

27 Dell HPC System for Manufacturing—System Architecture and Application Performance
4.4.2 Implicit GPGPU Building Block
The Implicit GPGPU building block includes an NVIDIA Tesla K80, which contains two GPUs. Both GPUs
were used for the ANSYS Mechanical benchmarks. GPU acceleration is available with both the DMP and
SMP solvers, so results are reported for both. The performance results for the two solvers on an
Implicit GPGPU building block server are shown in Figure 17 and Figure 18. Each data point on the graphs
records the performance of a specific benchmark data set run on two GPUs plus the number of
processor cores marked on the horizontal axis. In general, the DMP solver offers equivalent or better
performance than the SMP solver. For a few of the benchmarks, the DMP solver continues to scale up to
16 processor cores or more, while the SMP solver does not scale as well beyond eight cores. For
most of the benchmark cases, GPU acceleration provides a significant performance advantage over the
Implicit BB results.
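The scaling comparison above can be made concrete with a small calculation. The following Python sketch is illustrative only: it assumes the rating is a higher-is-better metric, so the ratio of two ratings can be read as a speedup, and it uses made-up placeholder ratings rather than values read from Figure 17 or Figure 18. The helper names `scaling_table` and `last_scaling_point`, and the 5% improvement threshold, are hypothetical choices for this example.

```python
def scaling_table(ratings):
    """Return (cores, speedup, efficiency) rows relative to the smallest core count."""
    cores = sorted(ratings)
    base = cores[0]
    base_rating = ratings[base]
    rows = []
    for n in cores:
        speedup = ratings[n] / base_rating   # higher rating means faster solve
        efficiency = speedup / (n / base)    # 1.0 would be ideal linear scaling
        rows.append((n, speedup, efficiency))
    return rows


def last_scaling_point(ratings, min_gain=1.05):
    """Largest core count reached before a step improves the rating by less than min_gain."""
    cores = sorted(ratings)
    best = cores[0]
    for prev, cur in zip(cores, cores[1:]):
        if ratings[cur] >= min_gain * ratings[prev]:
            best = cur
        else:
            break
    return best


# Hypothetical placeholder ratings keyed by processor core count.
# These are NOT measurements from the white paper.
example = {1: 100.0, 2: 180.0, 4: 300.0, 8: 420.0, 16: 480.0, 28: 485.0}

for n, speedup, efficiency in scaling_table(example):
    print(f"{n:>2} cores: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
print("rating stops improving after", last_scaling_point(example), "cores")
```

Because every data point in the figures already includes both GPUs, a table like this isolates only the contribution of additional processor cores. It does not measure the benefit of the GPUs themselves, which would require comparing against the Implicit BB ratings at the same core count.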
Note that there is a known issue with benchmark case V17sp-3 when using GPU acceleration with the
DMP solver. Because of this, the results for the DMP solver for this case are not reported.
Figure 17 ANSYS Mechanical DMP Performance—Implicit GPGPU BB
[Figure 17 plot: Core Solving Rating (higher is better), axis range 0 to 1,000, versus Number of Processor Cores (1, 2, 4, 8, 16, 28). Benchmarks plotted: V17cg1, V17cg2, V17cg3, V17ln1, V17ln2, V17sp1, V17sp2, V17sp4, V17sp5.]