Parallel Programming Guide for HP-UX Systems

Chapter 9: Troubleshooting
False cache line sharing
False cache line sharing is a form of cache thrashing. It occurs whenever
two or more threads in a parallel program assign to different data items
that reside in the same cache line. This section discusses how to avoid
false cache line sharing by restructuring the data layout and controlling
the distribution of loop iterations among threads.
Consider the following Fortran code:
      REAL*4 A(8)
      DO I = 1, 8
        A(I) = ...
        .
        .
        .
      ENDDO
Assume there are eight threads, each executing one of the above
iterations, and that A(1) is on a processor cache line boundary (a 32-byte
boundary on V2250 servers), so that all eight elements are in the same
cache line. Only one thread at a time can “own” the cache line. As a
result, the loop in effect runs serially, and every assignment by a thread
also requires invalidating the line in the cache of its previous “owner.”
These problems would likely eliminate any benefit of parallelization.
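The arithmetic behind this claim can be checked with a short program. The following is only a minimal sketch: it takes the 32-byte line size and the alignment of A(1) from the text above, and the names LINE_BYTES, ELEM_BYTES, OFFSET, and LINE are hypothetical, chosen for illustration. It computes the byte offset of each element from A(1) and the index of the cache line that offset falls in.

```fortran
PROGRAM CACHE_LINE_OF_A
  IMPLICIT NONE
  ! Assumptions taken from the text: 32-byte cache lines, 4-byte
  ! REAL*4 elements, and A(1) aligned on a cache line boundary.
  INTEGER, PARAMETER :: LINE_BYTES = 32
  INTEGER, PARAMETER :: ELEM_BYTES = 4
  INTEGER :: I, OFFSET, LINE

  DO I = 1, 8
     OFFSET = (I - 1) * ELEM_BYTES   ! byte offset of A(I) from A(1)
     LINE   = OFFSET / LINE_BYTES    ! integer division gives the line index
     PRINT '(A,I1,A,I2,A,I1)', 'A(', I, ') offset ', OFFSET, ' line ', LINE
  ENDDO
END PROGRAM CACHE_LINE_OF_A
```

The largest offset is 28 bytes, which is less than 32, so every element reports line 0. All eight threads therefore write to the same cache line.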
With the above in mind, consider the following code:
      REAL*4 B(100,100)
      DO I = 1, 100
        DO J = 1, 100
          B(I,J) = ...B(I,J-1)...
        ENDDO
      ENDDO
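Fortran stores arrays in column-major order, so for a fixed J the elements B(1,J), B(2,J), and so on are adjacent in memory. The following minimal sketch shows how the first few rows of column 1 fall into cache lines. It assumes the 32-byte line size mentioned earlier and that B(1,1) is on a cache line boundary; the names N, LINE_BYTES, ELEM_BYTES, OFFSET, and LINE are hypothetical.

```fortran
PROGRAM CACHE_LINE_OF_B
  IMPLICIT NONE
  ! Assumptions: 32-byte cache lines, 4-byte REAL*4 elements, and
  ! B(1,1) aligned on a cache line boundary.
  INTEGER, PARAMETER :: N = 100
  INTEGER, PARAMETER :: LINE_BYTES = 32
  INTEGER, PARAMETER :: ELEM_BYTES = 4
  INTEGER :: I, J, OFFSET, LINE

  J = 1
  DO I = 1, 17
     ! Column-major order: the row index I varies fastest in memory.
     OFFSET = ((J - 1) * N + (I - 1)) * ELEM_BYTES
     LINE   = OFFSET / LINE_BYTES
     PRINT '(A,I2,A,I1,A,I3,A,I2)', 'B(', I, ',', J, ') offset ', OFFSET, ' line ', LINE
  ENDDO
END PROGRAM CACHE_LINE_OF_B
```

Under these assumptions, rows 1 through 8 of column 1 share one line, rows 9 through 16 share the next, and row 17 starts a third. Any two values of I that fall in the same line and are assigned to different threads can contend for that line. Because a column occupies 400 bytes, which is not a multiple of 32, later columns do not start on a cache line boundary.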
Assume there are eight threads working on the I loop in parallel.
The J loop cannot be parallelized because of the dependence. HP
compilers, by default, give each thread about the same number of
iterations, assigning (if necessary) one extra iteration to some threads.
This happens until all iterations are assigned to a thread. Table 9-1
shows the default distribution of the I loop across 8 threads. The table
on page 175 shows how the array maps to cache lines, assuming that B(1,1) is on