
Executive Summary
It’s not very often that a disruptive technology changes the way enterprises operate.
In the same way that the microprocessor, the personal computer, and virtualization
changed the computing landscape, Apache Hadoop* and the MapReduce framework
have forever changed the way that enterprises capture, store, and analyze information.
But while yesterday’s hardware technology helped lay the foundation for big data
analysis, today’s technologies let enterprises analyze more data faster than ever before.
Intel is leading the big data infrastructure charge with advancements in microprocessors,
storage, and networking. These advancements can help increase the scalability and
performance of large Apache Hadoop clusters. In internal tests comparing servers equipped
with the previous-generation Intel® Xeon® processor E5 family against servers equipped
with the Intel® Xeon® processor E7 v2 family, the Intel Xeon processor E7 v2 family-based
servers delivered up to 3.5 times the performance across a spectrum of I/O- and
CPU-intensive workloads.¹
This paper highlights technologies available from Intel that enterprises can use to scale
up Apache Hadoop clusters to handle the increasing volume, variety, and velocity of
data. Enterprises can reduce the complexity and total cost of ownership (TCO) of their
clusters by using fewer, more powerful servers, which can reduce operational costs by up
to 37 percent over a four-year period.²
Apache Hadoop* Overview
Apache Hadoop is a distributed data storage and data processing platform that
enterprises can use to store and process large amounts of semi-structured or
unstructured data. Built on Java*, Apache Hadoop has open-source roots and enjoys the
support of a large, active user and developer community. Apache Hadoop also benefits
from the collaborative work of Java Virtual Machine (JVM) vendors and Intel engineers to
increase Java performance on the latest Intel platforms. Intel has worked with JVM
vendors for more than 10 years to optimize Java performance on Intel hardware, because
each new generation of Intel microarchitecture provides new features that can increase
software performance. These traits help make Apache Hadoop a cost-effective,
high-performance platform for gathering and analyzing data from such varied sources as
point-of-sale systems, credit card transactions, server log files, machine logs, and
scientific sensors. Together, these capabilities enable advanced analytics for a range of
tasks, from detecting credit card fraud to decoding the human genome.
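The MapReduce model that Apache Hadoop applies to data such as server log files can be sketched in plain Java. The example below is a hypothetical, single-process illustration and does not use the Hadoop API: it counts log events by severity level, assuming a simple "<timestamp> <LEVEL> <message>" line format. The class and method names are invented for illustration. A real Hadoop job would express the same map and reduce steps through Hadoop's Mapper and Reducer classes and run them in parallel across the nodes of a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process sketch of the MapReduce model (not the Hadoop API).
public class LogLevelCount {

    // Map phase: emit a (level, 1) pair for each log line.
    // Assumed line format: "<timestamp> <LEVEL> <message...>"
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        String[] fields = line.trim().split("\\s+");
        if (fields.length >= 2) {          // skip blank or malformed lines
            out.add(Map.entry(fields[1], 1));
        }
        return out;
    }

    // Reduce phase: sum every count that shares the same key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Shuffle phase: group intermediate values by key. A TreeMap keeps
        // keys sorted, loosely mirroring the sort step before reduce.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }
        // Apply reduce once per distinct key.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "10:00:01 INFO  request served",
            "10:00:02 WARN  slow disk",
            "10:00:03 INFO  request served",
            "10:00:04 ERROR node unreachable");
        System.out.println(run(lines)); // prints {ERROR=1, INFO=2, WARN=1}
    }
}
```

In a cluster, the shuffle step is where intermediate key/value pairs move between nodes over the network, which is one reason storage and network performance, and not only CPU speed, affect Hadoop workloads.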
An Apache Hadoop cluster can scale from a few servers to thousands. This exibility
makes Apache Hadoop an ideal platform across the data analysis spectrum, from
Accelerate Big Data Analysis with Intel® Technologies
White Paper
Intel® Xeon® processor E7 v2
Big Data Analysis
Tim Allen, Big Data Domain Expert, Intel Software & Services Group
Eric Kaczmarek, Big Data Performance Architect, Intel Software & Services Group
Frank Jensen, Performance Marketing Engineer, Intel Data Center Group Marketing

Summary of content (7 pages)