Improving Network Performance in Multi-Core Systems

Multiple Descriptor Queues
Multiple transmit and receive queues in the controllers allow network
traffic streams to be distributed into queues. These queues
can be associated with specific processor cores, allowing distribution
of the workload and preventing data traffic processing from
overwhelming a single core. The packet queues can be accessed
by driver threads running on different processor cores, such that
multiple cores can process network packets in parallel.
Packets can be directed to individual queues in two ways:
• RSS, which has a table that maps queues to processor cores
• VMDq, which filters data into queues based on MAC address
  or VLAN tags
The Intel 82575 Gigabit Ethernet Controller supports four
transmit and four receive queues per port. The Intel 82598
10 Gigabit Ethernet Controller provides 32 transmit queues
and 64 receive queues per port, which can be mapped to a
maximum of 16 processor cores. This enables considerable
load balancing on servers with several multi-core processors.
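The association between queues and cores can be pictured with a minimal sketch. The source does not say how queues are assigned, so the round-robin policy below is an assumption, and the function name map_queues_to_cores is hypothetical. The default limit of 16 cores reflects the 82598 figure quoted above.

```python
from collections import Counter
from typing import Dict


def map_queues_to_cores(num_queues: int, num_cores: int,
                        max_cores: int = 16) -> Dict[int, int]:
    """Return a {queue: core} mapping (hypothetical round-robin policy).

    max_cores caps how many cores the queues are spread across; 16 is
    the limit quoted for the 82598 in the text.
    """
    usable_cores = min(num_cores, max_cores)
    if num_queues <= 0 or usable_cores <= 0:
        raise ValueError("need at least one queue and one core")
    # Queue q is served by core q modulo the number of usable cores.
    return {queue: queue % usable_cores for queue in range(num_queues)}


if __name__ == "__main__":
    # 64 receive queues spread over a 16-core server.
    mapping = map_queues_to_cores(num_queues=64, num_cores=16)
    print(Counter(mapping.values()))
```

Under this assumed policy, 64 receive queues on a 16-core system leave each core serving four queues.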
Receive-Side Scaling (RSS)
To determine which receive queue to use for incoming packets,
controllers residing on systems using Windows* Server 2003 or
Windows Vista* can use RSS. (On Linux* systems, this technology
is known as Scalable I/O.) RSS directs packets to different
queues without the need for reordering. Incoming packets are
first segregated into flows. The flow for a given packet is
determined by calculating a hash value from fields
in the packet header. The resulting hash value serves as an index
into a table that indicates the queue to which the
packet should be directed. The hash values are also used to select
a specific processor to handle the packet flow. Because every packet
of a flow reaches the same queue and processor, the packets of that
flow are handled in order.
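The hash-and-table lookup described above can be sketched as follows. This is an illustration under stated assumptions, not the controllers' implementation: CRC32 stands in for the hardware hash function, and the table size, queue count, and function names are hypothetical.

```python
import ipaddress
import struct
import zlib

NUM_QUEUES = 4    # assumed queue count for the example
TABLE_SIZE = 128  # assumed number of entries in the look-up table

# Look-up table: entry i names the receive queue for hash bucket i.
# Software could rewrite these entries to rebalance load.
INDIRECTION_TABLE = [i % NUM_QUEUES for i in range(TABLE_SIZE)]


def flow_hash(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    """Hash the header fields that identify a flow.

    CRC32 is a stand-in chosen for brevity; it is not the hash used by
    the hardware.
    """
    key = (ipaddress.IPv4Address(src_ip).packed
           + ipaddress.IPv4Address(dst_ip).packed
           + struct.pack("!HH", src_port, dst_port))
    return zlib.crc32(key)


def select_queue(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    """Use the hash value as an index into the table to pick a queue."""
    bucket = flow_hash(src_ip, dst_ip, src_port, dst_port) % TABLE_SIZE
    return INDIRECTION_TABLE[bucket]


if __name__ == "__main__":
    # Two packets of the same flow always select the same queue.
    first = select_queue("10.0.0.1", "10.0.0.2", 40000, 80)
    second = select_queue("10.0.0.1", "10.0.0.2", 40000, 80)
    print(first == second)
```

Because the hash depends only on header fields shared by every packet of a flow, the flow always maps to one queue, which is why no reordering is needed.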
RSS distributes packet processing on a per-flow basis and is also
programmable. Controllers with multiple
queues can therefore direct multiple TCP/IP streams to different
processor cores for handling. The Intel 82575 Gigabit Ethernet
Controller and the Intel 82598 10 Gigabit Ethernet Controller both
support RSS.
Virtual Machine Device Queues (VMDq)
The ability to direct streams to different cores is also an important
element in supporting virtualization. This design allows virtual
machines hosted by hypervisors that emulate network controllers
to rely on a dedicated network stream that is handled by a single
core. Hence, when multiple virtual machines (VMs) are in use, they
can share the controller ports while enjoying their own privately
processed packet stream, a solution that greatly improves
virtualized performance.
Intel’s VMDq technology provides multiple hardware queues
and offload features that can be used to reduce the software
overhead associated with sharing a single network controller
among multiple virtual machines. Prior to the advent of
this technology, a network switch emulated in software by the
virtualization platform sorted and routed the packets individually
to the running VMs. This process introduced significant delays in
the network packet processing. With VMDq, individual hardware
queues are associated with the simulated network interfaces of
the running VMs, so the controller itself performs the routing of
received packets, thereby substantially lowering the overhead.
This technology is also used on outbound VM packets to provide
transmit fairness and to avoid a single VM blocking access to the
controller.
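The receive-side sorting that VMDq moves into the controller can be sketched as a filter table keyed on destination MAC address or VLAN tag. The class and method names below are hypothetical, and the precedence of MAC filters over VLAN filters, along with the default queue for unmatched frames, is an assumption made for the example.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Frame:
    dst_mac: str
    vlan: Optional[int] = None  # None means the frame is untagged


class VmdqSorter:
    """Hypothetical model of sorting received frames into per-VM queues."""

    def __init__(self, default_queue: int = 0) -> None:
        self.mac_filters: Dict[str, int] = {}
        self.vlan_filters: Dict[int, int] = {}
        self.default_queue = default_queue

    def add_mac_filter(self, mac: str, queue: int) -> None:
        self.mac_filters[mac.lower()] = queue

    def add_vlan_filter(self, vlan: int, queue: int) -> None:
        self.vlan_filters[vlan] = queue

    def classify(self, frame: Frame) -> int:
        # Assumed precedence: MAC match first, then VLAN, then default.
        queue = self.mac_filters.get(frame.dst_mac.lower())
        if queue is not None:
            return queue
        if frame.vlan is not None and frame.vlan in self.vlan_filters:
            return self.vlan_filters[frame.vlan]
        return self.default_queue


if __name__ == "__main__":
    sorter = VmdqSorter(default_queue=0)
    sorter.add_mac_filter("00:1B:21:00:00:01", queue=1)  # interface of VM 1
    sorter.add_vlan_filter(100, queue=2)                 # VLAN used by VM 2
    print(sorter.classify(Frame("00:1b:21:00:00:01")))
    print(sorter.classify(Frame("00:1b:21:00:00:09", vlan=100)))
```

In the software-switch model described above, the virtualization platform performs this classification for every packet; with VMDq the equivalent lookup is done by the controller before the packet reaches a queue.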
Extended Message-Signaled Interrupts (MSI-X)
Efficient communication between queues and
particular processor cores is handled by MSI-X. MSI-X is the next
generation of MSI, which passes interrupts to a single processor
core. In contrast, MSI-X provides multiple interrupt vectors, which
allow multiple interrupts to be handled simultaneously and
load-balanced across multiple cores. This capability helps improve
CPU utilization and lower latency.
The Intel 82575 Gigabit Ethernet Controller and the Intel 82598
10 Gigabit Ethernet Controller give each queue its own set of
MSI-X controllable interrupt vectors, which permits efficient
packet management and fine tuning of the processor load. With
an interrupt vector for each queue, the controller can handle
multiple interrupts simultaneously, preventing the bottlenecks
associated with funneling all interrupts through a single vector.
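The difference between a single vector and per-queue vectors can be illustrated with a small simulation. It is a sketch only: the function name deliver_interrupts and the affinity tables are hypothetical, and the one-vector-per-queue, one-core-per-vector layout is an assumed configuration.

```python
from collections import Counter
from typing import Dict, Iterable


def deliver_interrupts(queue_events: Iterable[int],
                       vector_affinity: Dict[int, int]) -> Counter:
    """Count how many interrupts each core services.

    queue_events lists the queue that raised each interrupt;
    vector_affinity maps a queue's vector to the core that handles it.
    """
    per_core: Counter = Counter()
    for queue in queue_events:
        per_core[vector_affinity[queue]] += 1
    return per_core


if __name__ == "__main__":
    events = [0, 1, 2, 3, 0, 1, 2, 3]  # interrupts raised by four queues

    # Single vector: every queue's interrupt lands on core 0.
    single_vector = {queue: 0 for queue in range(4)}
    # One vector per queue, each steered to its own core (assumed layout).
    per_queue_vectors = {queue: queue for queue in range(4)}

    print(deliver_interrupts(events, single_vector))
    print(deliver_interrupts(events, per_queue_vectors))
```

With the single vector, one core services all eight interrupts; with a vector per queue, the same eight interrupts are split evenly, two per core.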
Figure 2 shows how these features work together to distribute
Ethernet traffic across CPU cores in a multi-core system.