Parallel Programming Guide for HP-UX Systems

Troubleshooting
False cache line sharing
Chapter 9
where x times the data size (in bytes) is an integral multiple of 32,
eliminates false cache line sharing. This holds only if the following two
conditions are met:
• The arrays are already properly aligned (as discussed earlier in this
  section).
• The first iteration accesses the first element of each array being
  assigned. For example, in a loop DO I = 2, N, because the loop starts
  at I = 2, the first iteration does not access the first element of the
  array. Consequently, the iteration distribution does not match the
  cache line alignment.
The number 32 is used because the cache line size is 32 bytes for V2250
servers.
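The effect of the second condition can be checked with a little arithmetic. The following sketch is only an illustration under stated assumptions: x is taken to be the number of consecutive iterations given to each thread, the loop body accesses A(I) with unit stride, the array is REAL*8 (8 bytes per element), and the array begins on a cache line boundary. The names CHUNKS, OFFSET, and ALIGNED are hypothetical and are not part of any HP library.

```fortran
      PROGRAM CHUNKS
      IMPLICIT NONE
      ! All values below are assumptions chosen for illustration.
      INTEGER, PARAMETER :: LINE = 32   ! cache line size in bytes
      INTEGER, PARAMETER :: DSIZE = 8   ! bytes per REAL*8 element
      INTEGER, PARAMETER :: X = 4       ! iterations per chunk
      INTEGER :: LB, K

      ! X*DSIZE = 32, an integral multiple of the cache line size.
      DO LB = 1, 2                      ! loop lower bound: 1, then 2
        DO K = 0, 2                     ! first three chunks
          PRINT *, LB, K, OFFSET(LB, K), ALIGNED(LB, K)
        ENDDO
      ENDDO

      CONTAINS

      ! Byte offset, from the start of the array, of the first
      ! element accessed by chunk K (0-based) of DO I = LB, N.
      INTEGER FUNCTION OFFSET(LB, K)
      INTEGER, INTENT(IN) :: LB, K
      OFFSET = (LB + K*X - 1)*DSIZE
      END FUNCTION OFFSET

      ! True if that chunk begins on a cache line boundary.
      LOGICAL FUNCTION ALIGNED(LB, K)
      INTEGER, INTENT(IN) :: LB, K
      ALIGNED = MOD(OFFSET(LB, K), LINE) .EQ. 0
      END FUNCTION ALIGNED

      END PROGRAM CHUNKS
```

With a lower bound of 1, the chunks begin at byte offsets 0, 32, and 64, so each chunk covers exactly one cache line. With a lower bound of 2, they begin at offsets 8, 40, and 72, so every chunk straddles a cache line boundary and adjacent chunks share a line, even though x times the data size is still a multiple of 32.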
Thread-specific array elements
Sometimes a parallel loop has each thread update a unique element of a
shared array, which is further processed by thread 0 outside the loop.
Consider the following Fortran code in which false sharing occurs:
      REAL*4 S(8)
C$DIR LOOP_PARALLEL
      DO I = 1, N
        .
        .
        .
        S(MY_THREAD()+1) = ...   ! EACH THREAD ASSIGNS ONE ELEMENT OF S
        .
        .
        .
      ENDDO
C$DIR NO_PARALLEL
      DO J = 1, NUM_THREADS()
        ... = ...S(J)            ! THREAD 0 POST-PROCESSES S
      ENDDO
The problem here is that S holds 8 elements of 4 bytes each, 32 bytes in
total, so potentially all the elements of S are in a single cache line.
Assignments by different threads to different elements then cause false
sharing. One approach is to change the code to force the unique elements
into different cache lines, as indicated in the following code: