Intel iWARP Quantum ESPRESSO Performance Study

Introduction
RDMA enables direct, zero-copy data
transfer between RDMA-capable server
adapters and application memory,
removing the need in Ethernet networks
for data to be copied multiple times
to operating system data buffers.
The mechanism is highly efcient and
eliminates the associated processor-
intensive context switching between
kernel space and user space. HPC
applications can therefore reduce
latency and perform message transfer
very rapidly and consistently by directly
delivering data from application memory
to the network.
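
As a loose illustration of the copy-avoidance idea described above, the following Python sketch contrasts a path that stages the payload through intermediate buffers with one that hands out a view of the application's own memory. This is only a user-space analogy under stated assumptions: it does not call any RDMA or iWARP API, and the function names and the number of staging buffers are hypothetical, chosen purely for illustration.

```python
# Analogy only: plain Python buffers, not an RDMA or iWARP interface.
# All names below are hypothetical and exist only for this illustration.

def buffered_transfer(app_data, staging_buffers=2):
    """Model a conventional path: every staging buffer receives its own copy."""
    copies = 0
    current = app_data
    for _ in range(staging_buffers):
        current = bytearray(current)  # allocates a new buffer and copies the payload
        copies += 1
    return current, copies


def zero_copy_transfer(app_data):
    """Model direct placement: expose the application's memory without copying."""
    return memoryview(app_data), 0


if __name__ == "__main__":
    app_memory = bytearray(b"payload")
    staged, staged_copies = buffered_transfer(app_memory)
    view, view_copies = zero_copy_transfer(app_memory)
    print("copies on buffered path:", staged_copies)
    print("copies on zero-copy path:", view_copies)
    # The view aliases application memory; the staged buffer is independent of it.
    app_memory[0] = ord("P")
    print(bytes(staged), bytes(view))
```

The point of the sketch is only that the view aliases the original buffer, so no additional copy of the payload exists, whereas each staging step duplicates it. Real RDMA adapters achieve the corresponding effect in hardware and through registered memory regions, which this analogy does not model.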
Both iWARP and InniBand use RDMA
and a common API for HPC applications,
however iWARP enables the use of RDMA
over the familiar Ethernet fabric. Because
iWARP runs over Ethernet TCP/IP, it
enables both application and management
trafc to operate over a single wire.
This paper reports on Quantum ESPRESSO
performance testing performed
by the Research Computing and
Cyberinfrastructure unit of Information
Technology services at Penn State
to identify how well iWARP fabrics
support workloads on widely used high
performance computing applications
compared to InniBand.
EXECUTIVE SUMMARY
The Intel® iWARP Ethernet Performance Series of test results intends to help close the
gap between real user requirements and the micro-benchmarks promoted by other
RDMA vendors. Each paper in the series demonstrates the real-world performance of
Intel iWARP on an industry standard application.
This paper reports on Quantum ESPRESSO performance testing performed by the
Research Computing and Cyberinfrastructure unit of Information Technology services at
Penn State.
iWARP Features
Unlike InniBand, iWARP is an extension
of conventional Internet Protocol (IP),
so standard IT management tools and
processes can also be used to manage
the trafc and resources associated with
iWARP, which implements the following
key performance features:
Kernel-Bypass: Enabling applications
to interface directly to the Ethernet
adapter removes the latency of the OS
and the expensive CPU context switches
between kernel-space and user-space.
Direct Data Placement: Writing the
data directly into user space eliminates
the need for wasteful, intermediate
buffer copies, thus reducing processing
latency and improving memory
bandwidth.
Transport Acceleration: The TCP/IP and
iWARP protocols are accelerated in silicon
rather than processed in host software
stacks, thereby freeing up valuable CPU
cycles for application compute processing.
Julie Cummings
Intel Corporation
iWARP Benefits
HPC applications can use iWARP
technology with NetEffect™ Ethernet
Server Cluster Adapters from Intel
to provide a high-performance, low-
latency Ethernet-based solution. By
making Ethernet networks suitable
for these high-performance clustering
implementations, iWARP provides a
number of benets:
Fabric consolidation. With iWARP
technology, LAN and RDMA traffic
can pass over a single wire. Moreover,
application and management traffic can
be converged, reducing requirements for
cables, ports, and switches.
IP-based management. Network
administrators can use standard IP tools
to manage traffic in an iWARP network,
taking advantage of existing skill sets
and processes to reduce overall cost and
complexity.
Native routing capabilities. Because
iWARP uses Ethernet and the standard
IP stack, it can use standard equipment
and be routed across IP subnets using
existing network infrastructure.
Existing switches, appliances, and
cabling. The flexibility of using standard
TCP/IP Ethernet to carry iWARP traffic
means that no changes are required to
Ethernet-based network equipment.
WHITE PAPER
Internet Wide Area RDMA Protocol (iWARP)
NetEffect 10 Gbps Ethernet
Server Cluster Adapters
Technical and High-Performance Computing
