Parallel Programming Guide for HP-UX Systems

Troubleshooting
False cache line sharing
.
ENDDO
C$DIR END_TASKS
Working with unaligned arrays
The most common cache-thrashing complication using arrays and loops
occurs when arrays assigned within a loop are unaligned with each other.
There are several possible causes for this:
• Arrays that are local to a routine are allocated on the stack.
• Array dummy arguments might be passed an element other than the
  first in the actual argument.
• Array elements might be assigned with different offset indexes.
Consider the following Fortran code:
COMMON /OKAY/ X(112,100)
...
CALL UNALIGNED (X(I,J))
...
SUBROUTINE UNALIGNED (Y)
REAL*4 Y(*)
! Y(1) PROBABLY NOT ON A CACHE LINE BOUNDARY
The address of Y(1) is unknown. However, if elements of Y are heavily
assigned in this routine, it may be worthwhile to compute an alignment,
given by the following formula:
LREM = LSIZE - ( ( MOD ( LOC(Y(1))-4, LSIZE*x) + 4) /x)
where
LSIZE is the appropriate cache line size in words
x is the data size, in bytes, for elements of Y
For this case, the cache line size on V2250 servers is 32 bytes, so LSIZE
is 8 single-precision (4-byte) words. Note that:
( ( MOD ( LOC(Y(1))-4, LSIZE*4) + 4) /4)
returns a value in the set 1, 2, 3, ..., LSIZE, so LREM is in the range 0 to 7.
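The arithmetic above can be checked with a small standalone sketch. It is an illustration only, not part of the original example: the names ALIGN_SKETCH, LREM_WORDS, and ALIGN_DEMO are hypothetical, and the byte address is passed as a plain integer (ADDR) in place of LOC(Y(1)), because LOC is a compiler extension. The starting address 4096 is an assumed value chosen only because it is a multiple of 32.

```fortran
! Sketch only: ALIGN_SKETCH, LREM_WORDS, and ALIGN_DEMO are
! hypothetical names used for illustration.
MODULE ALIGN_SKETCH
  IMPLICIT NONE
  ! Integer kind wide enough to hold a byte address.
  INTEGER, PARAMETER :: I8 = SELECTED_INT_KIND(18)
CONTAINS
  ! Evaluates LREM = LSIZE - ((MOD(ADDR-4, LSIZE*X) + 4) / X),
  ! where ADDR stands in for LOC(Y(1)).
  INTEGER FUNCTION LREM_WORDS(ADDR, LSIZE, X)
    INTEGER(I8), INTENT(IN) :: ADDR   ! byte address of Y(1)
    INTEGER, INTENT(IN) :: LSIZE      ! cache line size in words
    INTEGER, INTENT(IN) :: X          ! element size in bytes
    LREM_WORDS = LSIZE - INT((MOD(ADDR - 4, INT(LSIZE*X, I8)) + 4) / X)
  END FUNCTION LREM_WORDS
END MODULE ALIGN_SKETCH

PROGRAM ALIGN_DEMO
  USE ALIGN_SKETCH
  IMPLICIT NONE
  INTEGER :: K
  INTEGER(I8) :: ADDR
  ! Step one REAL*4 word at a time across one 32-byte line,
  ! starting from an assumed line-aligned address.
  DO K = 0, 8
    ADDR = 4096_I8 + 4*K
    PRINT *, ADDR, LREM_WORDS(ADDR, 8, 4)
  END DO
END PROGRAM ALIGN_DEMO
```

With LSIZE = 8 and x = 4, an address on a 32-byte boundary gives LREM = 0 and an address one word past a boundary gives LREM = 7. In other words, LREM counts the elements of Y that precede the next cache line boundary.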
Then a loop such as: