tunable will be provided in a future patch. This may be useful in cases described below where specific
conditions make the TOPS default less than optimal. Refer to Table 2 (at the end of Appendix B) for the
patch level information for the ndd tunable socket_enable_tops.
It should not be necessary to disable TOPS. However, there are cases where the scalability issue
addressed by TOPS does not exist. When there are multiple NICs on a system, it is possible that no NIC
interrupt will become a processing bottleneck even with TOPS disabled (socket_enable_tops = 0). In
these cases, there may be some efficiency gained by avoiding the overhead of TOPS, and allowing more of
the processing to be done in the NIC interrupt context before switching to the processor running the
application. In the most efficient, highest-performing case of the application and NIC being assigned to the
same processor, however, there is no need for TOPS to switch processors, and therefore the TOPS tunable
setting will have no effect on performance.
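For example, TOPS can be examined or disabled at run time with ndd. A minimal sketch, assuming the tunable is published under /dev/sockets (verify with ndd -h supported on your release):

    # Display the current TOPS setting.
    ndd -get /dev/sockets socket_enable_tops

    # Disable TOPS, e.g. on a multi-NIC system where no single NIC
    # interrupt is a processing bottleneck.
    ndd -set /dev/sockets socket_enable_tops 0
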
Another consideration for TOPS tuning is whether the NIC is configured for checksum offload (CKO) on
inbound data. If CKO is enabled, TOPS will provide less benefit for the memory cache, as there will not be
a need to read the payload data during the inbound TCP/UDP processing.
As an application is rescheduled over time between different processors, or in the cases where threads
executing on different processors may share a socket, TOPS may not operate optimally in determining
which processor to switch to in order to match where the system call will execute to receive the data. In
most cases, the default TOPS setting for 11i v3 (socket_enable_tops = 2) will work best in following
the application to its current CPU. In cases where sockets are being opened and closed at a high rate, it
may be possible to gain some efficiency by fixing the processor assigned to each connection by TOPS
using the ndd setting socket_enable_tops = 1, which is the default for 11i v1 and 11i v2. However,
these cases may be rare, and can only be determined by experimentation, or by detailed measurement and
analysis of the performance of the HP-UX kernel. As a result, changing from the default setting to
socket_enable_tops = 2 on 11i v1 and 11i v2 will provide equal or better performance in the
majority of cases.
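To carry a non-default setting across reboots, it can be recorded in /etc/rc.config.d/nddconf. A sketch, assuming the transport name sockets and that index 0 is the next free entry on the system:

    # /etc/rc.config.d/nddconf
    TRANSPORT_NAME[0]=sockets
    NDD_NAME[0]=socket_enable_tops
    NDD_VALUE[0]=2
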
3.2 STREAMS NOSYNC Level Synchronization
Previously the STREAMS framework supported execution of only one instance of the put procedure at a time
for a given STREAMS queue. For multiple requests to the same queue, STREAMS synchronized the requests
depending on the synchronization level of a module. Synchronization ensured that only one request was
executed at a time. With high-speed I/O, the synchronization limits imposed by STREAMS could easily
lead to a performance bottleneck.
The restriction imposed by these earlier synchronization methods has been removed by a new
synchronization level, NOSYNC, available in 11i v3 and in the latest patches for 11i v1 and 11i v2.
If a module uses NOSYNC level synchronization, the STREAMS framework can concurrently execute
multiple instances of its queue's put procedure and a single instance of the same queue's service procedure.
This requires the modules to protect any module-specific data that is shared between multiple instances of
put procedures, or between the put and service procedures.
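To illustrate that requirement, the following minimal C sketch shows a read-side put procedure that guards module-private statistics; the module name, the statistics structure, and the lock declarations and LOCK/UNLOCK calls are hypothetical placeholders rather than the actual HP-UX kernel locking interfaces:

    /* Hypothetical NOSYNC-safe put procedure.  mod_lock, mod_stats, and
     * LOCK/UNLOCK are illustrative placeholders; a real module would use
     * the kernel's own locking primitives. */
    static void *mod_lock;       /* placeholder lock object */
    static struct {
        unsigned long pkts_in;   /* shared across concurrent put instances */
        unsigned long bytes_in;  /* and with the service procedure */
    } mod_stats;

    static int
    mod_rput(queue_t *q, mblk_t *mp)
    {
        /* Under NOSYNC, several instances of this routine may run in
         * parallel for the same queue, so shared data must be locked. */
        LOCK(mod_lock);
        mod_stats.pkts_in++;
        mod_stats.bytes_in += msgdsize(mp);
        UNLOCK(mod_lock);

        putnext(q, mp);          /* per-message work needs no lock */
        return 0;
    }
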
3.2.1 IP NOSYNC synchronization
With NOSYNC level synchronization, the IP module can handle requests simultaneously when multiple
requests arrive on the same queue. This feature significantly improves network throughput, reaching near
link speed for high-speed network interfaces such as multi-port Gigabit cards in an Auto Port Aggregation
(APA) configuration or 10 Gigabit cards.
To realize the performance gain from this feature, all modules (e.g., DLPI, IPFilter) in the networking stack
between the IP layer and the LAN driver must have NOSYNC enabled. HP recommends that providers of