White Papers

Extreme GPU Computing
4.4 Accelerating Molecular Dynamics with LAMMPS
Figure 12: LAMMPS performance and acceleration compared to CPU-only
In this section, we evaluate the performance of a second common molecular dynamics code, LAMMPS
(“Large-scale Atomic/Molecular Massively Parallel Simulator”), which is used to model solid-state
materials and soft matter. LAMMPS performance is measured in jobs per day (“Jobs/day”); a higher score
is better. The benchmark is the LAMMPS LJ (Lennard-Jones liquid) benchmark with 8,388,608 atoms, run
for 1,000 steps.
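The paper does not state how Jobs/day is derived. A minimal sketch, assuming Jobs/day is 86,400 seconds divided by the wall-clock time of one complete 1,000-step run, is shown below; `jobs_per_day`, `atom_steps_per_second`, and `fcc_atom_count` are hypothetical helpers, not part of LAMMPS. The stock LAMMPS LJ benchmark input places atoms on an fcc lattice (four atoms per unit cell), and 8,388,608 = 4 x 128^3, so a 128 x 128 x 128 cell box is one lattice consistent with the stated size; the actual box dimensions used here are not given.

```python
# Sketch: relating wall-clock time to the Jobs/day metric.
# ASSUMPTION: Jobs/day = 86,400 s / wall-clock seconds of one complete benchmark run.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Benchmark size stated in the text: LJ liquid, 8,388,608 atoms, 1,000 steps.
LJ_ATOMS = 8_388_608
LJ_STEPS = 1_000


def jobs_per_day(wall_seconds: float) -> float:
    """Back-to-back benchmark runs that would fit in 24 hours (higher is better)."""
    if wall_seconds <= 0:
        raise ValueError("wall-clock time must be positive")
    return SECONDS_PER_DAY / wall_seconds


def atom_steps_per_second(wall_seconds: float,
                          atoms: int = LJ_ATOMS,
                          steps: int = LJ_STEPS) -> float:
    """Alternative throughput view: atom-timesteps advanced per second of wall time."""
    if wall_seconds <= 0:
        raise ValueError("wall-clock time must be positive")
    return atoms * steps / wall_seconds


def fcc_atom_count(nx: int, ny: int, nz: int) -> int:
    """Atoms in an nx x ny x nz block of fcc unit cells (4 atoms per cell)."""
    return 4 * nx * ny * nz


if __name__ == "__main__":
    # Hypothetical 600 s run, for illustration only (not a measured result).
    print(jobs_per_day(600.0))                        # 144.0
    print(fcc_atom_count(128, 128, 128) == LJ_ATOMS)  # True
```

Because Jobs/day is inversely proportional to runtime under this assumption, halving the runtime of a configuration doubles its score.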
Figure 12 compares the performance of LAMMPS on the five C4130 configurations described in previous
sections. As a reference, we also include CPU-only runs to quantify the acceleration that each
configuration gains from the K80 GPU boards. Configurations A and B are the two switched configurations
with four K80s; the only difference between them is that B has an extra CPU. Because LAMMPS currently
runs its compute-intensive calculations only on the GPUs, the extra CPU does not substantially increase
performance. Configuration C, the balanced four-K80 configuration without a PCIe switch, performs
better than both A and B. This is partly because the PCIe switch in A and B adds one extra hop to
communications, which increases latency relative to C.
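Figure 12 reports both Jobs/day and acceleration over the CPU-only runs, but the text does not define acceleration. The sketch below assumes it is a configuration's Jobs/day divided by the CPU-only Jobs/day, which is the same as the CPU-only runtime divided by the configuration's runtime. The function names are hypothetical, and no measured values are included.

```python
# Sketch: acceleration of a GPU configuration relative to the CPU-only baseline.
# ASSUMPTION: acceleration = configuration Jobs/day / CPU-only Jobs/day.


def acceleration(config_jobs_per_day: float, cpu_only_jobs_per_day: float) -> float:
    """Speedup of a GPU configuration over the CPU-only run (1.0 means no gain)."""
    if cpu_only_jobs_per_day <= 0:
        raise ValueError("CPU-only Jobs/day must be positive")
    return config_jobs_per_day / cpu_only_jobs_per_day


def acceleration_from_runtimes(cpu_only_seconds: float, config_seconds: float) -> float:
    """Same ratio from wall-clock times, since Jobs/day is inversely proportional to runtime."""
    if config_seconds <= 0:
        raise ValueError("configuration runtime must be positive")
    return cpu_only_seconds / config_seconds


def rank_configurations(jobs_per_day_by_config: dict[str, float]) -> list[str]:
    """Configuration labels (e.g. "A".."E") ordered from fastest to slowest by Jobs/day."""
    return sorted(jobs_per_day_by_config, key=jobs_per_day_by_config.get, reverse=True)
```

Passing the measured Jobs/day values for configurations A to E into `rank_configurations` would reproduce the ordering discussed in the text, for example C ahead of A and B.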
Configurations D and E both have two K80s. Configuration D performs slightly better than E because of
its balanced configuration. As mentioned previously, LAMMPS cannot use the extra CPU in D, so the gain
does not come from that CPU.
An interesting observation is that moving from two K80s to four K80s (i.e., comparing configurations D
and C) almost quadruples performance. Since each K80 board carries two GPUs, doubling the board count
(from four GPUs to eight) yields roughly a 4x gain, which is superlinear scaling; equivalently, the gain
averages about 2x for each added K80. This can be partially attributed to the size of the dataset used.
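The arithmetic behind the scaling claim can be made explicit. Going from two to four K80 boards doubles the GPU count, so linear scaling would predict a 2x gain; a roughly 4x gain is twice that. The "about 2x per added K80" reading corresponds to a constant per-board factor of 4^(1/2) = 2, which is an interpolation, since no three-board configuration was measured. The sketch uses an idealized speedup of 4.0 in place of "almost quadruples"; the helper names are hypothetical.

```python
# Sketch: quantifying the D -> C scaling described in the text.
# Inputs are speedup ratios, not measured Jobs/day; 4.0 idealizes "almost quadruples".

GPUS_PER_K80 = 2  # each K80 board carries two GPUs


def scaling_efficiency(speedup: float, boards_before: int, boards_after: int) -> float:
    """Observed speedup divided by ideal linear speedup; > 1.0 means superlinear."""
    gpus_before = boards_before * GPUS_PER_K80
    gpus_after = boards_after * GPUS_PER_K80
    ideal = gpus_after / gpus_before  # equals the board ratio, since GPUs per board is fixed
    return speedup / ideal


def per_board_factor(speedup: float, boards_before: int, boards_after: int) -> float:
    """Constant multiplicative gain per added board that reproduces the overall speedup
    (a geometric mean); only the endpoints were measured, so this is an interpolation."""
    added = boards_after - boards_before
    if added <= 0:
        raise ValueError("boards_after must exceed boards_before")
    return speedup ** (1.0 / added)


if __name__ == "__main__":
    print(scaling_efficiency(4.0, 2, 4))           # 2.0: twice the linear expectation
    print(round(per_board_factor(4.0, 2, 4), 6))   # 2.0: about 2x per added K80
```

A scaling efficiency of exactly 1.0 would mean the extra boards helped in proportion to their number; values above 1.0, as here, are what the text describes as more than proportional gains.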
[Figure 12 panels: Four K80 Boards (configurations A, B, C) and Two K80 Boards (configurations D, E)]