Parallel Programming Guide for HP-UX Systems

Troubleshooting
False cache line sharing
a cache line boundary. Array entries that fall on cache line boundaries
are in shaded cells, noted by hashmarks.
By default, HP compilers give each thread about the same number of
iterations, assigning one extra iteration to some threads if necessary,
until every iteration is assigned to a thread. Table 9-1 shows the
default distribution of the 100 iterations of the I loop across 8 threads.
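The arithmetic behind Table 9-1 can be sketched as follows. This is only a model: the rule used here (thread t ends at floor((t+1)*N/T)) is an assumption inferred from the ranges in Table 9-1, not a documented description of how HP compilers schedule iterations, and the function name default_distribution is hypothetical.

```python
def default_distribution(n_iters, n_threads):
    """Return one (first, last) 1-based iteration range per thread.

    Assumption: thread t ends at floor((t + 1) * n_iters / n_threads).
    This rule was chosen because it reproduces Table 9-1; it is not
    taken from HP compiler documentation.
    """
    ranges = []
    first = 1
    for tid in range(n_threads):
        last = (tid + 1) * n_iters // n_threads
        ranges.append((first, last))
        first = last + 1
    return ranges


# Print thread ID, iteration range, and iteration count for the I loop.
for tid, (first, last) in enumerate(default_distribution(100, 8)):
    print(tid, f"{first}-{last}", last - first + 1)
```

With 100 iterations and 8 threads, each thread gets 12 iterations and the remaining 4 are spread over threads 1, 3, 5, and 7, which matches the ranges listed in Table 9-1.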
This distribution of iterations causes threads to share cache lines. For
example, thread 0 assigns elements B(9:12,1) and thread 1 assigns
elements B(13:16,1), which lie in the same cache line. In fact, every
thread shares cache lines with at least one other thread, and most share
cache lines with two other threads. This type of sharing is called false
because it results from the data layout and the compiler’s distribution
of iterations; it is not inherent in the algorithm itself. Therefore, it
can be reduced or even removed by:
1. Restructuring the data layout by aligning data on cache line
boundaries.
2. Controlling the iteration distribution.
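The two remedies above can be modeled with a short sketch. It rests on several assumptions that are not stated in this section: a cache line holds 8 array elements (inferred from B(9:16,1) sharing one line), B(1,1) sits on a cache line boundary, B is stored column by column with 100 columns, and the data layout is "restructured" by padding each column from 100 to 104 elements. The function names are hypothetical, and the code models memory addresses only; it does not use any HP directive or compiler feature.

```python
ELEMS_PER_LINE = 8  # assumption: B(9:16,1) share a line, so 8 elements per line


def aligned_distribution(n_iters, n_threads, elems_per_line=ELEMS_PER_LINE):
    """Hand out iterations in whole cache-line-sized blocks.

    Every range starts on a multiple of elems_per_line (plus 1), so no
    two threads own rows from the same 8-row block.
    """
    n_lines = -(-n_iters // elems_per_line)  # ceiling division
    ranges = []
    first_line = 0
    for tid in range(n_threads):
        last_line = (tid + 1) * n_lines // n_threads
        first = first_line * elems_per_line + 1
        last = min(last_line * elems_per_line, n_iters)
        ranges.append((first, last))
        first_line = last_line
    return ranges


def shared_lines(ranges, leading_dim, n_cols, elems_per_line=ELEMS_PER_LINE):
    """Return the set of cache lines written by more than one thread.

    Element (row, col) of a column-major array is modeled at offset
    col * leading_dim + (row - 1), with col counted from 0.
    """
    owners = {}
    shared = set()
    for tid, (lo, hi) in enumerate(ranges):
        for col in range(n_cols):
            for row in range(lo, hi + 1):
                line = (col * leading_dim + row - 1) // elems_per_line
                if owners.setdefault(line, tid) != tid:
                    shared.add(line)
    return shared


ranges = aligned_distribution(100, 8)
print(ranges)
# Columns padded to 104 elements: every column starts on a line boundary.
print(len(shared_lines(ranges, leading_dim=104, n_cols=100)))
# Columns left at 100 elements: later columns start mid-line.
print(len(shared_lines(ranges, leading_dim=100, n_cols=100)))
```

In this model, the padded layout leaves no cache line shared between threads, while the unpadded layout still shares lines even though the iteration ranges are aligned, because 100 is not a multiple of 8 and every second column starts in the middle of a line. The aligned distribution also gives threads unequal work (8, 12, or 16 iterations each), which is the price paid for avoiding the sharing.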
Table 9-1   Default distribution of the I loop

Thread ID   Iteration range   Number of iterations
0           1-12              12
1           13-25             13
2           26-37             12
3           38-50             13
4           51-62             12
5           63-75             13
6           76-87             12
7           88-100            13
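The sharing described in this section can be checked against the ranges in Table 9-1 with a small model. It assumes that a cache line holds 8 array elements and that B(1,1) lies on a cache line boundary (both inferred from the statement that B(9:12,1) and B(13:16,1) share a line), and it looks only at the first column of B. The function name sharing_partners is hypothetical.

```python
from collections import defaultdict

# Iteration ranges copied from Table 9-1 (thread ID is the list index).
TABLE_9_1 = [(1, 12), (13, 25), (26, 37), (38, 50),
             (51, 62), (63, 75), (76, 87), (88, 100)]


def sharing_partners(ranges, elems_per_line=8):
    """Map each thread to the set of threads it shares a cache line with.

    Assumption: rows 1-8 of the column occupy line 0, rows 9-16 line 1,
    and so on. Only one column is modeled.
    """
    line_owners = defaultdict(set)
    for tid, (lo, hi) in enumerate(ranges):
        for row in range(lo, hi + 1):
            line_owners[(row - 1) // elems_per_line].add(tid)

    partners = {tid: set() for tid in range(len(ranges))}
    for owners in line_owners.values():
        for tid in owners:
            partners[tid] |= owners - {tid}
    return partners


for tid, others in sharing_partners(TABLE_9_1).items():
    print(tid, sorted(others))
```

Within the first column, threads 0 and 7 each share a line with one neighbor and threads 1 through 6 each share lines with two, which agrees with the statement that every thread shares cache lines with at least one other thread and most share with two.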