Extreme GPU Computing

4.4 Accelerating Molecular Dynamics with LAMMPS

Figure 12: LAMMPS performance and acceleration compared to CPU-only

In this section, we evaluate the performance of a second common molecular dynamics code, LAMMPS.
LAMMPS stands for “Large-scale Atomic/Molecular Massively Parallel Simulator” and is used to
model solid-state materials and soft matter. Performance is measured in “Jobs/day,” and a higher
score is better. The benchmark is the LAMMPS LJ (Lennard-Jones liquid) case, run with 8,388,608
atoms for 1,000 steps.
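
The metric and problem size described above can be sketched in a few lines of Python. This is a minimal illustration, not the harness used for Figure 12: it assumes that “Jobs/day” means the number of back-to-back benchmark runs that fit in 24 hours, and that acceleration is the ratio of a GPU configuration's Jobs/day to the CPU-only Jobs/day. The function names are hypothetical. The atom count check relies on the fact that an fcc lattice has 4 atoms per unit cell; that the benchmark box was 128 x 128 x 128 cells is an inference from the atom count, not something the text states.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def jobs_per_day(wall_seconds: float) -> float:
    """Back-to-back benchmark runs that fit in 24 hours (assumed definition)."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return SECONDS_PER_DAY / wall_seconds


def acceleration(gpu_wall_seconds: float, cpu_wall_seconds: float) -> float:
    """Jobs/day of a GPU configuration divided by Jobs/day of the CPU-only run."""
    return jobs_per_day(gpu_wall_seconds) / jobs_per_day(cpu_wall_seconds)


def fcc_atom_count(cells_per_side: int) -> int:
    """Atoms in a cubic fcc lattice with 4 atoms per unit cell."""
    return 4 * cells_per_side ** 3


if __name__ == "__main__":
    # 4 * 128**3 matches the atom count quoted for the LJ benchmark.
    print(fcc_atom_count(128))  # 8388608
```

Because Jobs/day is inversely proportional to runtime, the acceleration computed this way equals the CPU-only runtime divided by the GPU runtime.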

Figure 12 compares the performance of LAMMPS on the five C4130 configurations described in previous
sections. As a reference, we also compare against the CPU-only runtimes to quantify the acceleration
that each configuration of K80 GPU boards provides. Configurations A and B are the two switched
configurations with four K80s; the only difference is that B has an extra CPU. Because LAMMPS
currently uses only the GPU cores for the compute-intensive calculations, the extra CPU does not
increase performance substantially. Configuration C is the balanced, non-switched configuration with
four K80s, and it performs better than A and B. This is partly because the PCIe switch in
configurations A and B introduces one extra hop during communication, which increases latency
compared to C.

Configurations D and E both have two K80s. Configuration D performs slightly better than E because D
is the balanced configuration. As mentioned previously, LAMMPS cannot take advantage of the extra CPU
in D.

An interesting observation is that moving from two K80s to four K80s (i.e., comparing configurations
D and C) almost quadruples the performance. In other words, each additional K80 (two GPUs per K80)
roughly doubles the performance. This can be partially attributed to the size of the dataset used.
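
The arithmetic behind this scaling observation can be sketched as follows. This is an illustrative calculation only: the inputs are normalized placeholders (1.0 for two K80s, 4.0 for four K80s) that idealize the "almost quadruples" result, not measured values from Figure 12, and the function names are hypothetical. The per-board factor is taken as the geometric mean of the total improvement over the boards added, which is one reasonable reading of "for each extra K80 the performance doubles."

```python
def per_board_factor(perf_small: float, boards_small: int,
                     perf_large: float, boards_large: int) -> float:
    """Geometric per-board multiplier: total ratio ** (1 / boards added)."""
    added = boards_large - boards_small
    if added <= 0:
        raise ValueError("boards_large must exceed boards_small")
    return (perf_large / perf_small) ** (1.0 / added)


def scaling_efficiency(perf_small: float, boards_small: int,
                       perf_large: float, boards_large: int) -> float:
    """Measured improvement divided by the ideal linear improvement."""
    ideal = boards_large / boards_small
    return (perf_large / perf_small) / ideal


if __name__ == "__main__":
    # Placeholder, normalized values: 2 boards -> 1.0, 4 boards -> 4.0.
    print(per_board_factor(1.0, 2, 4.0, 4))    # per-board multiplier
    print(scaling_efficiency(1.0, 2, 4.0, 4))  # > 1.0 means superlinear
```

Under these placeholder inputs the improvement exceeds what linear scaling in the number of boards would give (a factor of 2 for doubling the boards), which is why the text points to the dataset size as a contributing factor.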

[Figure 12 panels: Four K80 Boards (configurations A, B, and C); Two K80 Boards (configurations D and E)]