Parallel Programming Guide for HP-UX Systems

Troubleshooting

Chapter 9, page 174

False cache line sharing

False cache line sharing is a form of cache thrashing. It occurs whenever
two or more threads in a parallel program assign to different data items
that reside in the same cache line. This section discusses how to avoid
false cache line sharing by restructuring the data layout and controlling
the distribution of loop iterations among threads.

Consider the following Fortran code:

      REAL*4 A(8)
      DO I = 1, 8
        A(I) = ...
      ENDDO

Assume there are eight threads, each executing one of the above
iterations, and that A(1) is on a processor cache line boundary (a 32-byte
boundary on V2250 servers). Because eight REAL*4 elements occupy exactly
32 bytes, all eight elements are then in the same cache line. Only one
thread at a time can “own” the cache line, so not only is the above loop,
in effect, run serially, but every assignment by a thread also requires an
invalidation of the line in the cache of its previous “owner.” These
problems would likely eliminate any benefit of parallelization.
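
The arithmetic behind this example can be checked with a short sketch. The module below is illustrative only: the names false_sharing_demo, line_of, LINE_BYTES, and ELEM_BYTES are hypothetical, and it assumes 4-byte elements, a 32-byte cache line, and A(1) on a cache line boundary, as stated above.

```fortran
MODULE false_sharing_demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: LINE_BYTES = 32  ! assumed cache line size (V2250)
  INTEGER, PARAMETER :: ELEM_BYTES = 4   ! size of one REAL*4 element
CONTAINS
  ! Zero-based index of the cache line that holds element i of a
  ! one-dimensional REAL*4 array whose first element is line-aligned.
  PURE INTEGER FUNCTION line_of(i)
    INTEGER, INTENT(IN) :: i
    ! Byte offset of element i, divided (truncating) by the line size.
    line_of = ((i - 1) * ELEM_BYTES) / LINE_BYTES
  END FUNCTION line_of
END MODULE false_sharing_demo
```

Under these assumptions, line_of(1) through line_of(8) all evaluate to 0, so the eight threads all write to the same line. A ninth element, if the array had one, would be the first to fall in line 1.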

With the above in mind, consider the following code:

      REAL*4 B(100,100)
      DO I = 1, 100
        DO J = 1, 100
          B(I,J) = ...B(I,J-1)...
        ENDDO
      ENDDO
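
Because Fortran stores arrays in column-major order, consecutive values of I within one column of B are adjacent in memory. The sketch below computes which cache line holds a given element. It is a minimal illustration, not part of the original example: the names b_layout_demo, line_of_b, LINE_BYTES, ELEM_BYTES, and NROWS are hypothetical, and it assumes a 32-byte cache line with B(1,1) on a cache line boundary.

```fortran
MODULE b_layout_demo
  IMPLICIT NONE
  INTEGER, PARAMETER :: LINE_BYTES = 32  ! assumed cache line size
  INTEGER, PARAMETER :: ELEM_BYTES = 4   ! size of one REAL*4 element
  INTEGER, PARAMETER :: NROWS = 100      ! first extent of B(100,100)
CONTAINS
  ! Zero-based index of the cache line that holds B(i,j), assuming
  ! column-major storage and B(1,1) on a cache line boundary.
  PURE INTEGER FUNCTION line_of_b(i, j)
    INTEGER, INTENT(IN) :: i, j
    ! Elements before B(i,j): (j-1) full columns plus (i-1) in column j.
    line_of_b = (((j - 1) * NROWS + (i - 1)) * ELEM_BYTES) / LINE_BYTES
  END FUNCTION line_of_b
END MODULE b_layout_demo
```

Under these assumptions, B(1,1) through B(8,1) share line 0 and B(9,1) starts line 1. A column occupies 400 bytes, which is not a multiple of 32, so line 12 holds B(97,1) through B(100,1) together with B(1,2) through B(4,2). Threads that are assigned neighboring values of I can therefore end up writing to the same cache line.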

Assume there are eight threads working on the I loop in parallel.
The J loop cannot be parallelized because of the dependence. HP
compilers, by default, give each thread about the same number of
iterations, assigning (if necessary) one extra iteration to some threads
until all iterations are assigned to a thread. Table 9-1 shows the default
distribution of the I loop across 8 threads. The illustration on page 175
shows how the array maps to cache lines, assuming that B(1,1) is on