Ready Solutions Engineering Test Results
14G with Skylake: how much better for HPC?
Dell EMC HPC Innovation Lab, September 2017
This document describes the features of the latest Dell EMC servers and compares performance to previous generation systems for a
variety of HPC applications.
New servers and Skylake
Dell EMC recently announced the 14th generation PowerEdge server portfolio (14G), which supports the latest generation Intel Xeon
Scalable Processor Family (the micro-architecture code named “Skylake”). In addition to the latest processor and accelerator/GPU
support, 14G includes several other technology additions. These servers have enhanced systems management and security features
via iDRAC9, can support up to 24 NVMe drives per server, include support for NVDIMM and future 3D XPoint memory technologies,
and allow options for direct contact liquid cooling within the server, among other features.
This document is focused on the performance gains available with the latest generation Intel Skylake CPUs in Dell EMC 14G platforms.
The Skylake CPU (SKL) supports six DDR4 memory channels per socket with memory modules that can run at up to 2667 MT/s. SKL
provides up to 28 cores per socket with TDP up to 205 W per socket. Intel has introduced AVX512 instructions on SKL, and this
doubles the floating point capabilities of SKL over previous generation Xeons. The processor now supports 512 bit registers, and the
Platinum 8100 and Gold 6100 SKL CPU models have two fused multiply-add (FMA) units, each of which can execute 8 double precision
calculations per cycle. With two floating point operations per FMA instruction, Skylake can execute 2 x 8 x 2 = 32 FLOP/cycle, double the previous
generation Xeon which was 16 FLOP/cycle. Note that some models of the SKL CPU like the Gold 5100, Silver 4100 and Bronze 3100
CPUs have one FMA unit, giving 16 FLOP/cycle. As before, the CPU frequency will depend on whether the code has a high density of
AVX2 or AVX512 instructions. An application will run at faster CPU clock speeds when running non-AVX codes, and will run slowest
with a high density of AVX512 instructions.
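The arithmetic in the paragraph above can be sketched in a few lines of Python. This is an illustrative back-of-the-envelope calculation, not a measurement: the function names are our own, the 8 bytes per transfer assumes a standard 64-bit DDR4 channel, and the 2.0 GHz clock used for the peak GFLOPS figure is a hypothetical placeholder, since the actual AVX512 frequency depends on the CPU model and the instruction mix.

```python
def flops_per_cycle(vector_bits, fma_units, element_bits=64):
    """Double precision FLOP per cycle per core.

    Each FMA unit processes vector_bits / element_bits lanes per cycle,
    and one fused multiply-add counts as two floating point operations.
    """
    lanes = vector_bits // element_bits
    return lanes * fma_units * 2


def peak_mem_bw_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak memory bandwidth per socket in GB/s.

    Assumes a 64-bit (8 byte) DDR4 channel; sustained bandwidth in
    practice is lower than this theoretical figure.
    """
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000.0


def peak_gflops(cores, ghz, flop_per_cycle):
    """Theoretical peak GFLOPS per socket at a given sustained clock."""
    return cores * ghz * flop_per_cycle


if __name__ == "__main__":
    # SKL Platinum 8100 / Gold 6100: 512-bit registers, two FMA units.
    print("SKL, 2 FMA units:", flops_per_cycle(512, 2), "FLOP/cycle")
    # SKL Gold 5100 / Silver 4100 / Bronze 3100: one FMA unit.
    print("SKL, 1 FMA unit: ", flops_per_cycle(512, 1), "FLOP/cycle")
    # Previous generation Xeon: 256-bit AVX2 registers, two FMA units.
    print("Previous gen:    ", flops_per_cycle(256, 2), "FLOP/cycle")
    # Six DDR4 channels per socket at 2667 MT/s.
    print("Memory bandwidth:", peak_mem_bw_gbs(6, 2667), "GB/s per socket")
    # Hypothetical 2.0 GHz sustained AVX512 clock on a 28-core part.
    print("Peak (assumed clock):", peak_gflops(28, 2.0, 32), "GFLOPS per socket")
```

Because the clock speed drops as the density of AVX512 instructions rises, the peak GFLOPS figure scales with whatever frequency the processor actually sustains for a given code, which is why the clock is a parameter here rather than a fixed value.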
Other changes in SKL include 48 PCIe lanes per socket vs. 40 lanes previously, and a new interconnect called UPI (Ultra Path
Interconnect) between the sockets, replacing the previous QPI interconnect. UPI can operate at up to 10.4 GT/s, faster than the 9.6
GT/s with QPI. There are other architectural changes like a larger L2 cache for the cores, a non-inclusive L3 cache, a new uncore
interconnect, distributed home agent, optimized turbo bins, per core P-states, etc. Architectural changes in the silicon lead to new
tuning options in the BIOS and one of these, Sub NUMA Clustering, is discussed in this blog.
We focus on measuring full system performance and compare 14G compute centric performance to the previous generation Dell EMC
platforms. Some of the performance improvements are due to faster memory, some due to AVX512, some due to additional cores and
some due to the combination of all the Intel micro-architecture enhancements. We show results for up to six generations of Intel
processors and four generations of Dell EMC servers. Storage and I/O are not a significant factor in these tests.
The shorthand used in the graphs below is explained here.
o 11G WSM: Dell EMC 11th generation servers with support for Intel Xeon 5600 series processors, micro-architecture
code named Westmere (WSM).
o 12G SB: Dell EMC 12th generation servers with support for Intel Xeon 2600 series processors, micro-architecture
code named Sandy Bridge (SB).
o 12G IVB: Dell EMC 12th generation servers with support for Intel Xeon 2600 v2 series processors, micro-architecture
code named Ivy Bridge (IVB).
o 13G HSW: Dell EMC 13th generation servers with support for Intel Xeon 2600 v3 series processors, micro-architecture
code named Haswell (HSW).
o 13G BDW: Dell EMC 13th generation servers with support for Intel Xeon 2600 v4 series processors, code named
Broadwell (BDW).
o 14G SKL: Dell EMC 14th generation servers with support for Intel Xeon Scalable Processor Family, micro-architecture
code named Skylake (SKL).
