Parallel Programming Guide for HP-UX Systems

Troubleshooting
Triangular loops
Because the work in a triangular loop decreases with each iteration of
the outer loop, there are eventually fewer cache lines left to compute
than there are threads. Threads then drop out one by one until a single
thread computes the iterations associated with the last cache line.
Compare this distribution to the default distribution, which causes
false cache line sharing and consequent thrashing when all threads
attempt to write data into a few cache lines. See “False cache line
sharing” on page 174 in this chapter.