HP Fabric Clustering System for InfiniBand™ Interconnect Performance on HP-UX 11iv2

[Figure: Point-to-point Latency (Send/Receive programming model) on rx2600 (2 CPUs/1.5 GHz/4 GB RAM) and rx4640 (4 CPUs/1.5 GHz/2 GB RAM). X-axis: Message Length (bytes), 1 to 100000; Y-axis: Latency (usec), 0 to 55. Series: point-to-point latency on rx2600; point-to-point latency on rx4640.]
The latency values remain virtually unaffected when the measurements are taken over two independent pairs of HCAs plugged into rx4640 servers.
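For context, point-to-point latency in the Send/Receive model is typically obtained with a ping-pong test: one side sends a message, the peer echoes it back, and the one-way latency is reported as half of the averaged round-trip time. The sketch below illustrates only that timing structure; it is not the HP Fabric test harness. It uses ordinary TCP sockets as a stand-in for the InfiniBand send/receive path, and the port number, iteration count, and message sizes are illustrative assumptions.

/*
 * Illustrative ping-pong latency sketch (not the HP Fabric test harness):
 * TCP sockets stand in for the InfiniBand send/receive path, and one-way
 * latency is reported as half of the averaged round-trip time.
 * Usage:  pingpong server                  (on one node)
 *         pingpong client <server-host>    (on the other node)
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <netdb.h>

#define PORT       5555      /* illustrative port number */
#define ITERATIONS 1000      /* round trips averaged per message size */
#define MAX_MSG    100000    /* largest message length, as in the chart */

static char buf[MAX_MSG];

/* send() may transfer fewer bytes than requested; loop until done */
static void send_all(int sock, size_t len)
{
    char *p = buf;
    while (len > 0) {
        ssize_t n = send(sock, p, len, 0);
        if (n <= 0) { perror("send"); exit(1); }
        p += n;
        len -= (size_t)n;
    }
}

static void pingpong(int sock, int initiator)
{
    size_t len;

    for (len = 1; len <= MAX_MSG; len *= 10) {
        struct timeval t0, t1;
        int i;

        gettimeofday(&t0, NULL);
        for (i = 0; i < ITERATIONS; i++) {
            if (initiator) {         /* send, then wait for the echo */
                send_all(sock, len);
                recv(sock, buf, len, MSG_WAITALL);
            } else {                 /* wait for a message, echo it back */
                recv(sock, buf, len, MSG_WAITALL);
                send_all(sock, len);
            }
        }
        gettimeofday(&t1, NULL);

        if (initiator) {
            double usec = (t1.tv_sec - t0.tv_sec) * 1e6 +
                          (t1.tv_usec - t0.tv_usec);
            /* one-way latency = half of the average round-trip time */
            printf("%7lu bytes: %.2f usec\n",
                   (unsigned long)len, usec / ITERATIONS / 2.0);
        }
    }
}

int main(int argc, char **argv)
{
    struct sockaddr_in addr;
    int one = 1, sock;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(PORT);

    if (argc == 2 && strcmp(argv[1], "server") == 0) {
        int lsock = socket(AF_INET, SOCK_STREAM, 0);
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        setsockopt(lsock, SOL_SOCKET, SO_REUSEADDR, &one, sizeof one);
        bind(lsock, (struct sockaddr *)&addr, sizeof addr);
        listen(lsock, 1);
        sock = accept(lsock, NULL, NULL);   /* wait for the initiator */
        setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
        pingpong(sock, 0);
    } else if (argc == 3 && strcmp(argv[1], "client") == 0) {
        struct hostent *h = gethostbyname(argv[2]);
        if (h == NULL) { fprintf(stderr, "unknown host\n"); return 1; }
        memcpy(&addr.sin_addr, h->h_addr_list[0], h->h_length);
        sock = socket(AF_INET, SOCK_STREAM, 0);
        connect(sock, (struct sockaddr *)&addr, sizeof addr);
        setsockopt(sock, IPPROTO_TCP, TCP_NODELAY, &one, sizeof one);
        pingpong(sock, 1);
    } else {
        fprintf(stderr, "usage: %s server | client <host>\n", argv[0]);
        return 1;
    }
    close(sock);
    return 0;
}

Half the round trip is reported, rather than the full round trip, so that the numbers are directly comparable to the one-way latency values in the charts in this section.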
As the number of parallel latency streams sharing the same HCA increases, latency degrades, and the degradation grows with message size. The standard deviation of the individual stream latencies from the mean latency depends on the number of processors on the server. The chart below illustrates the impact of three parallel streams on mean latency and standard deviation.
[Figure: Parallel Latency Streams: Mean Latency and Standard Deviation (rx4640/3 CPUs/1.5 GHz/2 GB RAM), Send/Receive programming model. X-axis: Message Length (bytes), 1 to 100000; left Y-axis: Latency (usec), 0 to 140; right Y-axis: Standard Deviation from mean latency, 0 to 0.16. Series: mean latency of 3 parallel latency streams; single stream latency; standard deviation from the mean latency.]
Once the number of parallel latency streams running on the same HCA of a server exceeds the number of available processors on that server, process scheduling issues adversely impact the mean latency and standard deviation values. Process scheduling issues likewise affect parallel latency streams running on different HCAs once the number of such streams exceeds the number of processors on the server.
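The mean latency and standard deviation plotted above are simple aggregates over the per-stream results for each message size. As a rough illustration only, the aggregation can be sketched as below; the latency values in the example are placeholders rather than measured data, a population standard deviation is assumed, and the exact normalization used for the standard-deviation axis in the chart is not specified here.

/* Aggregate per-stream latencies (usec) for one message size into a
 * mean and a standard deviation.  The three values in main() are
 * placeholders, not measured results.  Compile with: cc stats.c -lm */
#include <math.h>
#include <stdio.h>

static void aggregate(const double *lat, int nstreams)
{
    double sum = 0.0, var = 0.0, mean;
    int i;

    for (i = 0; i < nstreams; i++)
        sum += lat[i];
    mean = sum / nstreams;

    for (i = 0; i < nstreams; i++)
        var += (lat[i] - mean) * (lat[i] - mean);
    var /= nstreams;                    /* population variance */

    printf("%d streams: mean = %.2f usec, std dev = %.2f usec\n",
           nstreams, mean, sqrt(var));
}

int main(void)
{
    double lat[3] = { 20.1, 20.4, 20.9 };  /* hypothetical per-stream latencies */
    aggregate(lat, 3);
    return 0;
}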