Ready Solutions Engineering Test Results
LAMMPS Four-Node Comparative Performance Analysis on Skylake Processors
Author: Joseph Stanfield
The purpose of this blog is to provide a comparative performance analysis of the Intel® Xeon® Gold 6150 processor (architecture code
named “Skylake”) and the previous-generation Xeon® E5-2697 v4 processor (code named “Broadwell”) using the LAMMPS benchmark.
The Xeon® Gold 6150 has 18 cores, or 36 logical cores with Hyper-Threading enabled. Intel quadrupled the L2 cache per core, from
256 KB on previous Xeon generations to 1 MB. The new processor also provides 24.75 MB of L3 cache and a six-channel DDR4 memory interface.
LAMMPS, the Large-scale Atomic/Molecular Massively Parallel Simulator, is an open-source molecular dynamics program originally
developed by Sandia National Laboratories, Temple University, and the United States Department of Energy. LAMMPS is primarily
used to model particles in a gaseous, liquid, or solid state.
Test cluster configuration:

              Dell EMC PowerEdge C6420                     Dell EMC PowerEdge C6320
CPU           2x Xeon® Gold 6150 (Skylake) 18c 2.7 GHz     2x Xeon® E5-2697 v4 (Broadwell) 16c 2.3 GHz
RAM           12x 16 GB @ 2666 MT/s                        8x 16 GB @ 2400 MT/s
Disk          1 TB SATA                                    1 TB SATA
OS            RHEL 7.3                                     RHEL 7.3
InfiniBand    EDR ConnectX-4                               EDR ConnectX-4

BIOS Options                 Settings
System Profile               Performance Optimized
Logical Processor            Disabled
Virtualization Technology    Disabled
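The memory rows of the configuration table can be turned into a theoretical peak DRAM bandwidth per node. The sketch below is a back-of-the-envelope estimate, not a measurement from this study. It assumes one DIMM per channel, so the 12 DIMMs in the C6420 fill 2 sockets x 6 channels and the 8 DIMMs in the C6320 fill 2 sockets x 4 channels, and it assumes the standard 64-bit (8-byte) data path per DDR4 channel. The function name peak_bandwidth_gbs is hypothetical.

```python
# Theoretical peak DRAM bandwidth per node, derived from the RAM rows above.
# Assumption: one DIMM per channel, so populated channels = DIMM count
# (C6420: 2 sockets x 6 channels = 12; C6320: 2 sockets x 4 channels = 8).

BYTES_PER_TRANSFER = 8  # each DDR4 channel has a 64-bit data bus


def peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int) -> float:
    """Peak bandwidth in decimal GB/s = channels x MT/s x 8 bytes / 1000."""
    return channels * mega_transfers_per_s * BYTES_PER_TRANSFER / 1000


skylake = peak_bandwidth_gbs(channels=12, mega_transfers_per_s=2666)
broadwell = peak_bandwidth_gbs(channels=8, mega_transfers_per_s=2400)

print(f"C6420 (Skylake):   {skylake:.1f} GB/s")   # 255.9 GB/s
print(f"C6320 (Broadwell): {broadwell:.1f} GB/s")  # 153.6 GB/s
print(f"ratio: {skylake / broadwell:.2f}x")        # 1.67x
```

Sustained bandwidth measured with a tool such as STREAM is always lower than this peak, so the ratio is only an upper-bound indication of the memory-subsystem difference between the two nodes.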
The LAMMPS release used for testing was lammps-6June-17, and the in.eam dataset was used for the analysis on both
configurations. In.eam simulates a metallic solid: a Cu EAM potential with a 4.95 Angstrom cutoff (45 neighbors per atom) and
NVE integration. The simulation ran for 100 timesteps with 32,000 atoms.
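The 32,000-atom figure follows from the lattice the input builds. The sketch below reproduces that count under stated assumptions: that in.eam, like the stock LAMMPS bench/in.eam, creates an fcc copper lattice with a lattice constant of 3.615 Angstrom and fills a 20 x 20 x 20 block of unit cells. The constant names and the helper fcc_atom_count are hypothetical, chosen for illustration.

```python
# Hypothetical reconstruction of how in.eam reaches 32,000 atoms.
# Assumption: fcc Cu lattice (a = 3.615 Angstrom) filling a 20 x 20 x 20
# block of unit cells, as in the stock LAMMPS bench/in.eam.

ATOMS_PER_FCC_CELL = 4       # 8 corners x 1/8 + 6 face centers x 1/2
CU_LATTICE_CONSTANT = 3.615  # Angstrom (assumed value)
CELLS_PER_EDGE = 20          # assumed box size in unit cells
TIMESTEPS = 100              # from the run description above


def fcc_atom_count(nx: int, ny: int, nz: int) -> int:
    """Number of atoms in an nx x ny x nz block of fcc unit cells."""
    return ATOMS_PER_FCC_CELL * nx * ny * nz


atoms = fcc_atom_count(CELLS_PER_EDGE, CELLS_PER_EDGE, CELLS_PER_EDGE)
box_edge = CELLS_PER_EDGE * CU_LATTICE_CONSTANT
atom_steps = atoms * TIMESTEPS

print(f"atoms: {atoms}")                     # 32000
print(f"box edge: {box_edge:.1f} Angstrom")  # 72.3
print(f"atom-steps per run: {atom_steps}")   # 3200000
```

Because the problem size is fixed at 32,000 atoms regardless of node count, the multi-node runs below are a strong-scaling test: each added node receives a smaller share of the same box.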
The first series of benchmarks measured performance in timesteps per second. The test environment consisted of four
servers interconnected with EDR InfiniBand, and LAMMPS was run three times on each configuration: a single node, two nodes, and
four nodes. Single-node runs averaged 106 timesteps per second, while two-node runs averaged 216 timesteps per second, slightly
more than double the single-node result. This trend held as the environment was scaled to four nodes, as seen in
Figure 1.
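The scaling claim can be checked with two standard metrics: speedup (throughput relative to a baseline) and parallel efficiency (speedup divided by the growth in node count, where 1.0 is ideal). The sketch below applies them to the one-node and two-node averages quoted above; the four-node average appears only in Figure 1, so it is deliberately not hard-coded. The helper names are hypothetical.

```python
# Scaling metrics from the averaged LAMMPS throughput reported above.
# Only the one-node (106) and two-node (216) averages appear in the text;
# the four-node value is shown only in Figure 1 and is omitted here.

ATOMS = 32_000  # in.eam problem size, fixed across node counts


def speedup(base_rate: float, rate: float) -> float:
    """Throughput ratio relative to the baseline run."""
    return rate / base_rate


def parallel_efficiency(base_rate: float, base_nodes: int,
                        rate: float, nodes: int) -> float:
    """Speedup divided by the growth in node count (1.0 = ideal scaling)."""
    return speedup(base_rate, rate) / (nodes / base_nodes)


one_node = 106.0   # timesteps/s, average of three runs
two_nodes = 216.0  # timesteps/s, average of three runs

s = speedup(one_node, two_nodes)
e = parallel_efficiency(one_node, 1, two_nodes, 2)

print(f"speedup: {s:.2f}x")    # 2.04x
print(f"efficiency: {e:.1%}")  # 101.9%
print(f"atom-steps/s on two nodes: {two_nodes * ATOMS:,.0f}")  # 6,912,000
```

An efficiency just above 100% means the step from one node to two was slightly superlinear; plugging the four-node average from Figure 1 into parallel_efficiency(one_node, 1, rate, 4) gives the corresponding figure for the full cluster.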
