
Intel® Ethernet 10 Gigabit iWARP
LAMMPS Performance Study
Introduction
RDMA enables direct, zero-copy data
transfer between RDMA-capable server
adapters and application memory, removing
the need for data on Ethernet networks
to be copied multiple times into operating
system buffers. The mechanism is highly
efficient and eliminates the associated
processor-intensive context switching
between kernel space and user space.
HPC applications can therefore reduce
latency and transfer messages rapidly
and consistently by delivering data
directly from application memory to
the network.
Both iWARP (Internet Wide Area RDMA
Protocol) and InfiniBand use RDMA and a
common API for HPC applications; however,
iWARP enables the use of RDMA over the
familiar Ethernet fabric. Because iWARP
runs over Ethernet TCP/IP, it allows both
application and management traffic to
operate over a single wire.
This paper reports on LAMMPS performance
testing performed by the Research
Computing and Cyberinfrastructure unit
of Information Technology Services at
Penn State to identify how well iWARP
fabrics support workloads on widely used
high-performance computing applications
compared to InfiniBand.
EXECUTIVE SUMMARY
The Intel® iWARP Ethernet Performance Series of test results is intended to help close
the gap between real user requirements and the micro-benchmarks promoted by other
RDMA vendors. Each paper in the series demonstrates the real-world performance of
Intel iWARP on an industry-standard application.
This paper reports on LAMMPS performance testing performed by the Research
Computing and Cyberinfrastructure unit of Information Technology Services at Penn State.
iWARP Features
Unlike InniBand, iWARP is an extension
of conventional Internet Protocol (IP),
so standard IT management tools and
processes can also be used to manage
the trafc and resources associated with
iWARP, which implements the following
key performance features:
Kernel-Bypass: Enabling applications
to interface directly with the Ethernet
adapter removes the latency of the OS
and the expensive CPU context switches
between kernel space and user space.
Direct Data Placement: Writing the
data directly into user space eliminates
the need for wasteful, intermediate
buffer copies, thus reducing processing
latency and improving memory bandwidth.
Transport Acceleration: The TCP/IP and
iWARP protocols are accelerated in silicon
rather than in host software stacks,
thereby freeing up valuable CPU cycles
for application compute processing.
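The copy-elimination behind Direct Data Placement can be seen in miniature in the kernel's own zero-copy path. The sketch below is a hypothetical illustration, not from the study, and uses ordinary TCP/IP sockets rather than RDMA hardware: it contrasts a transfer that stages data through a user-space buffer with one that uses sendfile() to keep the data in kernel space. RDMA goes a step further by bypassing the kernel entirely and writing data straight into application memory.

```python
# Hypothetical illustration (not RDMA): contrast a copy-based transfer,
# which stages data in a user-space buffer, with the kernel's zero-copy
# sendfile() path. iWARP's Direct Data Placement removes still more copies.
import os
import socket
import tempfile

PAYLOAD = b"x" * 16384  # small enough to fit in the default socket buffer

def copy_based(src_fd, sock):
    # Each chunk crosses kernel -> user buffer -> kernel: two copies per chunk.
    os.lseek(src_fd, 0, os.SEEK_SET)
    while chunk := os.read(src_fd, 4096):
        sock.sendall(chunk)

def zero_copy(src_fd, sock):
    # sendfile() moves data file -> socket inside the kernel: no user copies.
    offset, remaining = 0, len(PAYLOAD)
    while remaining:
        sent = os.sendfile(sock.fileno(), src_fd, offset, remaining)
        offset += sent
        remaining -= sent

def recv_all(sock, n):
    buf = bytearray()
    while len(buf) < n:
        buf += sock.recv(n - len(buf))
    return bytes(buf)

with tempfile.TemporaryFile() as f:
    f.write(PAYLOAD)
    f.flush()
    a, b = socket.socketpair()
    copy_based(f.fileno(), a)
    assert recv_all(b, len(PAYLOAD)) == PAYLOAD
    zero_copy(f.fileno(), a)
    assert recv_all(b, len(PAYLOAD)) == PAYLOAD
    a.close()
    b.close()
print("both paths delivered identical data")
```

Both paths deliver identical bytes over the loopback pair; the difference is the number of buffer copies and kernel/user context switches per transfer, which is exactly what iWARP's kernel-bypass and direct data placement eliminate at the network-adapter level.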
Author: Julie Cummings, Intel Corporation
iWARP Benefits
HPC applications can use iWARP technology
with NetEffect™ Ethernet Server
Cluster Adapters from Intel to provide a
high-performance, low-latency Ethernet-based
solution. By making Ethernet
networks suitable for these high-performance
clustering implementations, iWARP
provides a number of benefits:
Fabric consolidation. With iWARP
technology, LAN and RDMA traffic can pass
over a single wire. Moreover, application
and management traffic can be
converged, reducing requirements for
cables, ports, and switches.
IP-based management. Network
administrators can use standard IP tools
to manage traffic in an iWARP network,
taking advantage of existing skill sets
and processes to reduce overall cost and
complexity.
Native routing capabilities. Because
iWARP uses Ethernet and the standard IP
stack, it can use standard equipment and
be routed across IP subnets using existing
network infrastructure.
Existing switches, appliances, and
cabling. The flexibility of using standard
TCP/IP Ethernet to carry iWARP traffic
means that no changes are required to
Ethernet-based network equipment.