Available at http://dell.to/XVCU0c
BIOS tuning for HPC on 13th Generation Haswell servers
Garima Kochhar, September 2014
This blog discusses the performance and energy efficiency implications of the BIOS tuning options
available on the new Haswell-based servers for HPC workloads. Specifically, we looked at memory snoop
modes, performance profiles and Intel’s Hyper-Threading technology, and at their impact on HPC
applications. This blog is part two of a three-part series. Blog one provided some initial results on HPC
applications and performance comparisons between these new servers and previous generations. The
third blog in this series will compare performance and energy efficiency across different Haswell
processor models.
We’re familiar with performance profiles, including power management, Turbo Boost and C-states.
Hyper-Threading, or Logical Processor, is a known feature as well. The new servers introduce three
different memory snoop modes: Early Snoop, Home Snoop and Cluster On Die. Our interest was in
quantifying the performance and power consumed across these different BIOS options.
The “System Profile Settings” category in the BIOS combines several performance and power related
options into a “meta” option. Turbo Boost, C-states, C1E, CPU Power Management, Memory Frequency,
Memory Patrol Scrub, Memory Refresh Rate and Uncore Frequency are some of the sub-options that are
pre-set by this “meta” option. Four pre-configured profiles can be used: Performance Per Watt (DAPC),
Performance Per Watt (OS), Performance, and Dense Configuration. The DAPC and OS profiles balance
performance and energy efficiency options, aiming for good performance while controlling the power
consumption. With DAPC, power management is handled by the Dell iDRAC and system level
components. With the OS profile, the operating system controls power management. In Linux this
would be the cpuspeed service and the cpufreq governors. The Performance profile optimizes for
performance only; most power management options are turned off here. The Dense Configuration
profile is aimed at dense memory configurations: memory patrol scrub is more frequent, the memory
refresh rate is higher and Turbo Boost is disabled. Additionally, if the four pre-set profiles do not meet
the requirement, there is a fifth option, “Custom”, that allows each of the sub-options to be tuned
individually. In this study we focus only on the DAPC and Performance profiles. Past studies have shown
us that DAPC and OS perform similarly, and that Dense Configuration delivers lower performance for
HPC workloads.
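
As a rough illustration of the OS-controlled case, the cpufreq governor that Linux is using can be read from sysfs. The sketch below is not part of the study; it assumes a Linux host that exposes the standard /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor files, and the function names read_governors and summarize are hypothetical helpers chosen for this example. On systems where frequency scaling is not handed to the operating system, these files may be absent and the result is simply empty.

```python
from collections import Counter
from pathlib import Path


def read_governors(cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes a cpufreq governor.

    cpu_root is a parameter (an assumption of this sketch) so the function
    can be pointed at a test directory instead of the real sysfs tree.
    """
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(cpu_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def summarize(governors):
    """Count how many logical CPUs use each governor."""
    return Counter(governors.values())


if __name__ == "__main__":
    found = read_governors()
    if not found:
        print("No cpufreq governors exposed to the OS on this system.")
    for name, count in summarize(found).items():
        print(f"{name}: {count} logical CPUs")
```

Running the script on a node before and after changing the System Profile is one simple way to check whether the operating system is in control of frequency scaling.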
The Logical Processor feature is based on Intel® Hyper-Threading (HT) technology. HT-enabled systems
appear to the operating system as having twice as many processor cores as they actually do, by ascribing
two “logical” cores to each physical core. HT can improve performance by assigning threads to each
logical core; logical cores execute their threads by sharing the physical core’s resources.
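
To see the two-logical-cores-per-physical-core mapping from the operating system side, one option on Linux is to group logical CPUs by their thread siblings. This is a minimal sketch, assuming the kernel exposes /sys/devices/system/cpu/cpuN/topology/thread_siblings_list; the helper names parse_cpu_list and count_cores are hypothetical and were introduced only for this example.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0,16' or '0-1' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def count_cores(cpu_root="/sys/devices/system/cpu"):
    """Return (logical_cpus, physical_cores) derived from sysfs topology files.

    Logical CPUs that share a physical core report the same sibling list,
    so the number of distinct sibling sets is the number of physical cores.
    """
    logical = 0
    sibling_groups = set()
    pattern = "cpu[0-9]*/topology/thread_siblings_list"
    for sib_file in Path(cpu_root).glob(pattern):
        logical += 1
        sibling_groups.add(frozenset(parse_cpu_list(sib_file.read_text())))
    return logical, len(sibling_groups)


if __name__ == "__main__":
    logical, physical = count_cores()
    print(f"logical CPUs: {logical}, physical cores: {physical}")
```

If the first number is twice the second, each physical core is presenting two logical cores, which is what we would expect with Logical Processor enabled.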
Snoop Mode is a new category under Memory Setting. Coherence between sockets is maintained by
way of “snooping” the other sockets. There are two mechanisms for maintaining coherence between
sockets: snoop broadcast (Snoopy) modes, where the other sockets are snooped for every memory
transaction, and directory support, where some information is maintained in memory that gives
guidance on whether there is a need to snoop.
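
The difference between the two mechanisms can be shown with a deliberately simplified toy model. This is not how the hardware implements coherence and it is not a measurement from this study; it only counts snoop messages under the assumption that each memory transaction is tagged with whether another socket may hold the line. The function count_snoops and its parameters are hypothetical, and the model ignores the cost of maintaining and reading the directory information itself.

```python
def count_snoops(transactions, num_sockets=2):
    """Count snoop messages for a toy stream of memory transactions.

    transactions: iterable of bools, True when another socket may hold
                  a copy of the line being accessed (toy assumption).
    Returns (broadcast_snoops, directory_snoops).
    """
    other_sockets = num_sockets - 1
    broadcast = 0
    directory = 0
    for remote_may_hold in transactions:
        # Snoop broadcast: every other socket is snooped on every transaction.
        broadcast += other_sockets
        # Directory: snoop only when the stored information says it may be needed.
        if remote_may_hold:
            directory += other_sockets
    return broadcast, directory


if __name__ == "__main__":
    # Four transactions on a two-socket system; two touch lines that
    # the other socket may hold.
    stream = [True, False, False, True]
    print(count_snoops(stream))
```

In this toy model the directory approach sends fewer snoops whenever most accesses touch lines that no other socket holds, at the price of the extra bookkeeping that the model leaves out.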
