HP-UX TCP/IP Performance White Paper, March 2008

3 Advanced Out of the Box Scalability and Performance Features
The HP-UX networking stack has been engineered for scalability and performance on high-end servers.
It scales gracefully from a few processors to 256 processors, and from 10BASE-T to 10 Gigabit
Ethernet. Because different types of workloads on high-end servers have different configuration
requirements, HP-UX provides the following advanced performance features for a highly scalable TCP/IP stack:
TOPS
NOSYNC
Protection from Packet Storm
Interrupt Binding
3.1 TOPS
Thread-Optimized Packet Scheduling (TOPS) increases the scalability and performance of TCP and UDP
socket applications sharing a high-bandwidth network interface on multiprocessor systems. The goal is to
move inbound packet processing to the same processor that runs the receiving user application.
IP networking stacks, such as the one implemented on HP-UX, operate as multiplexers that route packets
between network interface cards (NICs) and a set of user endpoints. HP-UX achieves excellent scalability
by scheduling multiple applications across a set of processors, and for outbound data, applications scale
well when sharing a NIC. However, for inbound data, the configuration of each NIC determines which
processor it interrupts. For most NICs, a single processor is interrupted as packets come in from the
network. In the absence of TOPS, this processor will do the protocol processing for each incoming packet.
Since a single high-speed NIC can process incoming data for many connections, the processor interrupted
by this NIC can easily become a bottleneck. This prevents the maximum network throughput or packet rate
from being realized. In order to improve scalability in this case, the TOPS mechanism allows the driver to
quickly hand off packets to the processor where the application is most likely running, and return to
processing packets coming from the wire. In most cases, a single processor will then perform all memory
accesses to the application data inside each packet. This leads to a more efficient use of memory and
cache subsystems.
The TOPS mechanism is used by all TCP and UDP sockets without application modification or
recompilation.
In most cases, TOPS has the additional benefit that only a single processor handles the application data
coming in from the network, which leads to more efficient use of the memory and cache subsystems.
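The hand-off described above can be illustrated with a small model. The following Python sketch is not HP-UX code: the function name process_inbound, the socket identifiers, and the CPU numbers are all hypothetical, and it only counts which processor would perform the protocol processing for each inbound packet, assuming the stack knows where each receiving application last ran.

```python
from collections import Counter

def process_inbound(packets, socket_cpu, interrupt_cpu, tops_enabled):
    """Count protocol-processing work per CPU for a stream of inbound packets.

    packets       -- iterable of socket ids, one entry per inbound packet
    socket_cpu    -- hypothetical map: socket id -> CPU where the application runs
    interrupt_cpu -- the single CPU that the NIC interrupts
    """
    work = Counter()
    for sock in packets:
        if tops_enabled:
            # The driver hands the packet off to the CPU where the receiving
            # application is most likely running (fallback: interrupted CPU).
            target = socket_cpu.get(sock, interrupt_cpu)
        else:
            # Without TOPS, the interrupted CPU processes every packet itself.
            target = interrupt_cpu
        work[target] += 1
    return work

# Assumed scenario: four sockets whose applications run on CPUs 1-4,
# all sharing one NIC that interrupts CPU 0.
socket_cpu = {"s1": 1, "s2": 2, "s3": 3, "s4": 4}
packets = ["s1", "s2", "s3", "s4"] * 250

without_tops = process_inbound(packets, socket_cpu, 0, tops_enabled=False)
with_tops = process_inbound(packets, socket_cpu, 0, tops_enabled=True)

print(dict(without_tops))  # {0: 1000}
print(dict(with_tops))     # {1: 250, 2: 250, 3: 250, 4: 250}
```

In this model the interrupted processor carries all 1000 packets without TOPS, while with TOPS each packet is processed on the processor that also consumes its data. The model ignores the small cost of the hand-off itself on the interrupted processor.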
3.1.1 Configuration Scenario for TOPS
TOPS is most beneficial for system configurations where the number of CPUs is much greater than the
number of NICs, such as a 16-way system with one or two Gigabit cards. Inbound packet processing is
spread among the CPUs based on where the socket application processes are scheduled, leading to a
more even distribution of the processing load in MP-scalable and network-intensive applications.
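A rough way to see why the CPU-to-NIC ratio matters is to estimate the share of inbound protocol processing that lands on the busiest processor. The Python sketch below is an idealized model, not a measurement: the function peak_share is hypothetical, and it assumes traffic is spread evenly over the NICs, each NIC interrupts a distinct processor, and socket applications are spread evenly over all processors.

```python
def peak_share(num_cpus, num_nics, tops_enabled):
    """Fraction of inbound protocol processing done by the busiest CPU.

    Idealized assumptions (not taken from HP-UX measurements):
      - traffic is evenly divided among the NICs
      - each NIC interrupts a different CPU
      - receiving applications are evenly spread over all CPUs
    """
    if tops_enabled:
        # Processing follows the applications, so it spreads over all CPUs.
        return 1.0 / num_cpus
    # Processing stays on the interrupted CPUs, at most one per NIC.
    return 1.0 / min(num_nics, num_cpus)

# 16-way system with one or two NICs, as in the scenario above.
for nics in (1, 2):
    print(nics, peak_share(16, nics, False), peak_share(16, nics, True))

# When CPUs and NICs are equal in number, the model shows no difference.
print(peak_share(2, 2, False) == peak_share(2, 2, True))
```

Under these assumptions, a 16-way system with one NIC moves from all inbound processing on one processor to one sixteenth per processor, whereas a system with as many NICs as processors sees no change in the peak share.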
3.1.2 socket_enable_tops Tunable
TOPS is enabled by default on HP-UX 11i, and requires no action on the part of an application to take
advantage of this feature. On the more recent patches of 11i v1 and 11i v2, the ndd tunable
socket_enable_tops is available to turn off or alter the behavior of TOPS. In 11i v3, the equivalent