System benchmark results on KNL: STREAM and HPL
By Garima Kochhar. December 2016. HPC Innovation Lab
The Intel Xeon Phi bootable processor (architecture codenamed “Knights Landing” – KNL) is ready for
prime time. The HPC Innovation Lab has had access to a few engineering test units, and this blog
presents the results of our initial benchmarking study. [We also published our results with Cryo-EM
workloads on these systems, and that study is available here.]
The KNL processor is from the Intel Xeon Phi product line but is a bootable processor, i.e., the system
does not need another processor in it to power on, just the KNL. Unlike the Xeon Phi coprocessors or the
NVIDIA K80 and P100 GPU cards, which are housed in a system that also has a Xeon processor, the KNL
is the only processor in the server. This necessitates a new server board design, and the PowerEdge
C6320p is the Dell EMC platform that supports the KNL line of processors. A C6320p server supports
one KNL processor and six DDR4 memory DIMMs. The network choices include Mellanox
InfiniBand EDR, Intel Omni-Path, or a choice of add-in 10GbE adapters. The platform has the
other standard components you’d expect from the PowerEdge line, including a 1GbE LOM, iDRAC and
systems management capabilities. Further information on C6320p is available here.
The KNL processor models include 16GB of on-package memory called MCDRAM. The MCDRAM can be
used in three modes: memory mode, cache mode or hybrid mode. In memory mode, the 16GB of MCDRAM
is visible to the OS as addressable memory and must be addressed explicitly by the application. In
cache mode, the MCDRAM is used as the last level cache of the processor. In hybrid mode, a portion
of the MCDRAM is available as memory and the remaining portion is used as cache. The default setting
is cache mode, as this is expected to benefit most applications. This setting is configurable in the
server BIOS.
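The way the three modes divide the 16GB of MCDRAM can be sketched in a few lines of Python. This is only an illustration of the description above: the function name `mcdram_split` and the `cache_fraction` parameter are hypothetical, and the hybrid splits a given system actually offers depend on its BIOS options.

```python
MCDRAM_GB = 16.0  # on-package MCDRAM capacity stated in the text


def mcdram_split(mode, cache_fraction=0.5):
    """Return (addressable_gb, cache_gb) for one MCDRAM mode.

    cache_fraction is a hypothetical parameter used only for hybrid mode;
    the splits actually available depend on the BIOS options.
    """
    if mode == "memory":
        # All MCDRAM is OS-visible memory that the application addresses explicitly.
        return (MCDRAM_GB, 0.0)
    if mode == "cache":
        # All MCDRAM acts as the last level cache; none is addressable.
        return (0.0, MCDRAM_GB)
    if mode == "hybrid":
        if not 0.0 < cache_fraction < 1.0:
            raise ValueError("hybrid mode needs a cache fraction between 0 and 1")
        cache_gb = MCDRAM_GB * cache_fraction
        return (MCDRAM_GB - cache_gb, cache_gb)
    raise ValueError("unknown MCDRAM mode: " + mode)


if __name__ == "__main__":
    for mode in ("memory", "cache", "hybrid"):
        addressable, cache = mcdram_split(mode)
        print(f"{mode:6s}: {addressable:4.1f} GB addressable, {cache:4.1f} GB cache")
```

Only the addressable portion needs explicit handling by the application; the cache portion is transparent to software.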
The architecture of the KNL processor allows the processor cores + cache and home agent directory +
memory to be organized into different clustering modes. These modes are called all2all, quadrant,
hemisphere, Sub-NUMA Clustering-2 and Sub-NUMA Clustering-4. They are described in this Intel article.
The default setting in the Dell EMC BIOS is quadrant mode, and it can be changed there. All
tests below use quadrant mode.
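One practical consequence of the clustering mode is how many NUMA domains the OS reports. The sketch below encodes our understanding, which is an assumption and not something stated in this study: all2all, hemisphere and quadrant present the cores as a single NUMA domain, the Sub-NUMA Clustering modes present two or four, and any addressable MCDRAM shows up as additional memory-only domains. The names `CORE_DOMAINS` and `numa_domains` are hypothetical; the output of `numactl --hardware` on a live system is the authoritative answer.

```python
# Assumed number of NUMA domains containing cores, per clustering mode.
CORE_DOMAINS = {
    "all2all": 1,
    "hemisphere": 1,
    "quadrant": 1,
    "snc2": 2,  # Sub-NUMA Clustering-2
    "snc4": 4,  # Sub-NUMA Clustering-4
}


def numa_domains(cluster_mode, mcdram_addressable):
    """Estimate how many NUMA domains the OS would report.

    mcdram_addressable is True when some MCDRAM is OS-visible memory
    (memory or hybrid mode) and False in cache mode. Assumption: each
    core domain then gets one extra memory-only domain for its MCDRAM.
    """
    core_domains = CORE_DOMAINS[cluster_mode]
    if mcdram_addressable:
        return core_domains * 2
    return core_domains


if __name__ == "__main__":
    for cluster_mode in CORE_DOMAINS:
        print(cluster_mode,
              numa_domains(cluster_mode, mcdram_addressable=False),
              numa_domains(cluster_mode, mcdram_addressable=True))
```

Under these assumptions, the configuration used for the tests here (quadrant clustering with the default cache mode) would appear to the OS as a single NUMA domain.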
The configuration of the systems used in this study is described in Table 1.
Table 1 - Test configuration
Server: 12 * Dell EMC PowerEdge C6320p
Processor: Intel Xeon Phi 7230. 64 cores @ 1.3 GHz, AVX base 1.1 GHz.
Memory: 96 GB at 2400 MT/s [16 GB * 6 DIMMs]
Interconnect: Intel Omni-Path and Mellanox EDR
Software
Operating System: Red Hat Enterprise Linux 7.2
Compilers: Intel 2017, 17.0.0.098 Build 20160721
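For context when reading HPL and STREAM numbers, the processor and memory rows of Table 1 are enough to estimate theoretical peaks. The sketch below is a back-of-the-envelope calculation, not a measured result. It assumes 32 double-precision FLOPs per cycle per core (two AVX-512 vector units, 8 doubles wide, with fused multiply-add), one DIMM per memory channel, and 8 bytes per DDR4 transfer; the function names are hypothetical.

```python
def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak double-precision rate in GFLOPS."""
    return cores * clock_ghz * flops_per_cycle


def ddr4_peak_gbs(channels, mega_transfers_per_s, bytes_per_transfer=8):
    """Theoretical peak DDR4 bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return channels * mega_transfers_per_s * bytes_per_transfer / 1000.0


# Values from Table 1; the AVX base clock applies to vector-heavy code such as HPL.
CORES = 64
AVX_BASE_GHZ = 1.1
DIMMS = 6
DDR4_MTS = 2400

# Assumption, not from the source: 2 vector units * 8 doubles * 2 (FMA).
FLOPS_PER_CYCLE = 32

if __name__ == "__main__":
    print(f"Peak compute: {peak_gflops(CORES, AVX_BASE_GHZ, FLOPS_PER_CYCLE):.1f} GFLOPS")
    print(f"Peak DDR4 bandwidth: {ddr4_peak_gbs(DIMMS, DDR4_MTS):.1f} GB/s")
```

With these inputs the estimates come to about 2252.8 GFLOPS per node and 115.2 GB/s of DDR4 bandwidth. MCDRAM bandwidth is not covered, since Table 1 gives no figures for it.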
