Available at http://dell.to/1w8xlKI
New 13G servers: what’s new and how much better are they for HPC?
Garima Kochhar, September 2014
It’s been an exciting week: Intel Haswell processors for two-socket servers, DDR4 memory and new Dell
servers were just released. We’ve had a busy few months leading up to this announcement. Our team
had access to early server units for the HPC lab and we spent time kicking the tires, running benchmarks,
and measuring performance. This blog describes our study and initial results and is part one of a
three-part series. The next blog will discuss the performance implications of some BIOS tuning options
available on the new servers, and a third blog will compare performance and energy efficiency across
different Haswell processor models.
Focusing on HPC applications, we ran two benchmarks and four applications on our server. Our interest
was in seeing how the server performed and specifically how it compared to the previous generations.
The server in question is part of Dell’s PowerEdge 13th generation (13G) server line-up. These servers
support DDR4 memory at up to 2133 MT/s and Intel’s latest Xeon® E5-2600 v3 Product Family
processors (based on the architecture code-named Haswell). Haswell (HSW) is a net new
micro-architecture when compared to the previous generation, Sandy Bridge/Ivy Bridge. HSW processors use
a 22 nm process technology, the same as Ivy Bridge, so there’s no process-shrink this time around. Note
the “v3” in the Intel product name: that is what distinguishes a processor as one based on the Haswell
micro-architecture. You’ll recall that “E5-2600 v2” processors are based on the Ivy Bridge
micro-architecture, and plain E5-2600 series processors with no explicit version are Sandy Bridge based.
Haswell based processors require a new server/new motherboard and DDR4 memory. The platform we used
is a standard dual-socket rack server with two Haswell-EP based processors. Each socket has four memory
channels and can support up to 3 DIMMs per channel (DPC). For our study we used 1 DPC for a total of
eight DDR4 DIMMs in the server.
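The memory configuration above is simple arithmetic, and a minimal Python sketch makes it easy to check. The socket, channel, DPC and MT/s figures come from the text. The 8 bytes per transfer per channel is our assumption (a standard 64-bit DDR4 channel), and the function names are hypothetical. The bandwidth figure is a theoretical ceiling, not a measured result.

```python
# Platform figures taken from the text.
SOCKETS = 2
CHANNELS_PER_SOCKET = 4
MAX_DIMMS_PER_CHANNEL = 3
MEMORY_SPEED_MTS = 2133  # DDR4 transfer rate, in mega-transfers per second

# Assumption (not stated in the text): a 64-bit DDR4 channel moves 8 bytes per transfer.
BYTES_PER_TRANSFER = 8


def dimm_count(dimms_per_channel: int) -> int:
    """Total DIMMs in the server for a given DIMMs-per-channel (DPC) setting."""
    if not 1 <= dimms_per_channel <= MAX_DIMMS_PER_CHANNEL:
        raise ValueError("DPC must be between 1 and 3 on this platform")
    return SOCKETS * CHANNELS_PER_SOCKET * dimms_per_channel


def peak_memory_bandwidth_gbs(speed_mts: float = MEMORY_SPEED_MTS) -> float:
    """Theoretical peak memory bandwidth in GB/s (10^9 bytes/s), all channels populated."""
    channels = SOCKETS * CHANNELS_PER_SOCKET
    return channels * speed_mts * 1e6 * BYTES_PER_TRANSFER / 1e9


if __name__ == "__main__":
    print("DIMMs at 1 DPC:", dimm_count(1))
    print("Theoretical peak bandwidth (GB/s):", round(peak_memory_bandwidth_gbs(), 1))
```

With 1 DPC every memory channel is populated, so the theoretical bandwidth ceiling does not depend on adding more DIMMs per channel.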
From an HPC point of view, one of the most interesting aspects is the Intel® AVX2 technology that allows
the processor to execute 16 FLOP per cycle. The processor supports 256-bit registers, allows
three-operand non-destructive operations (i.e. A = B+C vs. A = A+B), and provides a Fused Multiply-Add (FMA)
instruction (A = A*B+C). The processor has two FMA units, each of which can execute 4 double precision
calculations per cycle. With two floating point operations per FMA instruction, HSW can execute
2 x 4 x 2 = 16 FLOP/cycle. This value is double what was possible with Sandy Bridge/Ivy Bridge (SB/IVB)! There are
many more instructions introduced with HSW and Intel® AVX2, and these are described in detail in this
Intel programming reference or on other blogs.
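The FLOP/cycle figure can be reproduced with a short calculation. The sketch below is illustrative only: the function name and parameters are ours, and the SB/IVB case is modeled under the assumption of one 256-bit add unit plus one 256-bit multiply unit, each contributing one floating point operation per lane per cycle.

```python
def flop_per_cycle(units: int, vector_bits: int,
                   element_bits: int = 64, flop_per_op: int = 2) -> int:
    """Peak floating point operations per core per cycle.

    units        -- floating point execution units that can issue each cycle
    vector_bits  -- register width in bits (256 for AVX/AVX2)
    element_bits -- operand width in bits (64 for double precision)
    flop_per_op  -- operations per lane per instruction (2 for FMA: multiply + add)
    """
    lanes = vector_bits // element_bits  # 256 / 64 = 4 doubles per register
    return units * lanes * flop_per_op


# Haswell: two FMA units, 4 double precision lanes each, 2 FLOP per FMA.
hsw = flop_per_cycle(units=2, vector_bits=256)

# SB/IVB (assumed model): one add unit and one multiply unit, 1 FLOP per lane each.
snb_ivb = flop_per_cycle(units=2, vector_bits=256, flop_per_op=1)

if __name__ == "__main__":
    print("HSW FLOP/cycle:", hsw)
    print("SB/IVB FLOP/cycle:", snb_ivb)
    print("Ratio:", hsw / snb_ivb)
```

The doubling comes entirely from FMA counting as two operations per lane. The register width and the number of units issuing per cycle are the same in both cases.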
Double the FLOP/cycle - does this mean that HSW will have 2x the theoretical performance of an
equivalent IVB processor? Close but not quite - read on. In past generations, we’ve looked at the rated
base frequency of a processor and the available Turbo bins/max Turbo frequency. For example, the
Intel® Xeon® E5-2680 v2 has a base frequency of 2.8 GHz and a maximum of 300 MHz of turbo available
when all cores are active. HSW processors will consume more power when running the new Intel® AVX2
instructions than when running non-AVX instructions. And so, starting with the Haswell product family, there
will be two rated base frequencies provided. The first is the traditional base frequency, which is the
frequency at which one could expect to run non-AVX workloads. The second is the AVX base frequency, the
base frequency for workloads that are running AVX code. For example, the HSW Xeon® E5-2697
