HP Fabric Clustering System for InfiniBand™ Interconnect Performance on HP-UX 11iv2
Confidential Page 3 1/28/2005
itperf
itperf is a user-space program built on the ICSC-defined IT-API v1.0. It provides two types of tests, each described
briefly below:
• Latency tests
itperf supports two kinds of latency tests:
o Send/Receive programming model
o RDMA programming model
The following gives a high-level flow of the Send/Receive programming model:
o Transmit/receive buffers share the same physical memory.
o Posts the same buffer repeatedly for the whole test.
o Requests Send Work Request completion notification once every SQ size.
o Waits for send and receive completions with a timeout of 0 seconds.
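The completion-notification policy in the flow above can be illustrated with a small simulation. This is a minimal sketch, not itperf source code, and it makes no IT-API calls. The names `signaled_sends`, `iterations`, and `sq_size` are hypothetical, and the sketch assumes that "once every SQ size" means every `sq_size`-th posted send requests a completion notification.

```python
def signaled_sends(iterations, sq_size):
    """Return the 1-based indices of sends that request a completion notification.

    Assumption: one notification is requested for every sq_size-th send;
    all other sends are posted without requesting a notification.
    """
    if sq_size <= 0:
        raise ValueError("sq_size must be positive")
    return [i for i in range(1, iterations + 1) if i % sq_size == 0]


if __name__ == "__main__":
    # Hypothetical values: a send queue of 4 entries and 10 sends.
    print(signaled_sends(10, 4))  # prints [4, 8]
```

Requesting fewer notifications reduces the completion-handling work per message, which is presumably why a latency test would use this policy.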
In the RDMA programming model, itperf latency tests use RDMA writes, so there is no receive-side completion. The
application polls on the data buffer instead.
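Polling on the data buffer can be sketched as follows. This is only an illustration under stated assumptions: it does not use the IT-API or any InfiniBand hardware, a Python thread stands in for the remote RDMA write, and the names `poll_for_marker` and `simulated_rdma_write` are hypothetical. The sketch also assumes the last byte of the buffer is written last, so that seeing the expected marker there means the whole message has arrived.

```python
import threading
import time


def poll_for_marker(buffer, expected, timeout_s=1.0):
    """Spin on the last byte of buffer until it equals expected.

    Returns True if the marker was seen before the timeout, False otherwise.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if buffer[-1] == expected:
            return True
    return False


def simulated_rdma_write(buffer, payload):
    """Stand-in for a remote adapter placing data directly into local memory."""
    buffer[:len(payload)] = payload


if __name__ == "__main__":
    buf = bytearray(8)                      # receive buffer, initially all zero
    msg = bytes([0, 0, 0, 0, 0, 0, 0, 1])   # last byte carries the marker
    writer = threading.Timer(0.01, simulated_rdma_write, args=(buf, msg))
    writer.start()
    print(poll_for_marker(buf, expected=1))
    writer.join()
```

The receiver never waits on a completion event; it detects arrival purely by observing a change in its own memory.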
• Bandwidth tests
o Transmit/receive buffers share the same physical memory.
o Maintains a window of 16 buffers.
o Registers the buffers during the test setup.
o Requests Send Work Queue completion notification once per half window (every 8 buffers).
o Waits for send and receive completion notifications.
o Replenishes RQ on receipt of every message.
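The sender-side windowing described above can be illustrated with a short simulation. This is a sketch under stated assumptions, not the itperf implementation: the function name `run_sender` and its parameters are hypothetical, no IT-API calls are made, and the sketch assumes that reaping one signaled completion retires every send up to and including the signaled one (half a window at a time).

```python
WINDOW = 16  # number of buffers in the window, as stated in the text


def run_sender(total_messages, window=WINDOW):
    """Simulate posting sends with one completion request per half window.

    Returns (completion_waits, max_outstanding), where completion_waits is
    the number of times the sender had to wait because the window was full.
    """
    signal_interval = window // 2
    outstanding = 0        # sends posted but not yet retired
    completion_waits = 0
    max_outstanding = 0

    for _ in range(total_messages):
        if outstanding == window:
            # Window is full: wait for the oldest signaled completion,
            # which (by assumption) retires half a window of sends.
            completion_waits += 1
            outstanding -= signal_interval
        outstanding += 1
        max_outstanding = max(max_outstanding, outstanding)
    return completion_waits, max_outstanding


if __name__ == "__main__":
    print(run_sender(64))  # prints (6, 16)
```

In this model, 64 messages need only 6 completion waits, and the number of sends in flight never exceeds the 16-buffer window.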
HP Integrity Servers
The following configurations of HP Integrity servers were used to conduct the performance tests.
rx2600:
• 2 CPUs (IPF 1.5 GHz)
• 4 GB RAM
• HP-UX 11iv2
rx4640:
• 4 CPUs (IPF 1.5 GHz) / 3 CPUs (IPF 1.5 GHz)
• 2 GB RAM
• HP-UX 11iv2
Performance Considerations
PCI-X is the primary I/O interface for the rx2600 and rx4640 HP Integrity servers. PCI-X slot 4 on the rx2600 and PCI-X
slots 7 and 8 on the rx4640 are dual-rope slots.
NOTE: A rope is defined as a high-speed, point-to-point data bus.
To achieve the best performance with the HP-UX Fabric Clustering System product, it is recommended that the AB286A
2-port 4X Host Channel Adapter (HCA) be plugged into one of the dual-rope slots available on the HP Integrity
server. Unless mentioned otherwise, all the results listed in this whitepaper refer to that configuration.
The HP-UX Fabric Clustering System software stack and the HCA provide better performance when large physical
pages are used for the data transfer buffers. For the bandwidth test results reported here, the itperf program was
set to request large physical pages from the operating system.
Use chatr +pd L <application name> to request the largest physical page size available on the host at the time the
test is run.
NOTE: Refer to the chatr(1) man page for additional details.
Latency-sensitive applications can use an RDMA programming model that eliminates receive-side completion
processing. An application may also choose to poll the EVD for data instead of blocking for completions.