Parallel Programming Guide for HP-UX Systems

Parallel synchronization
Synchronizing code
Chapter 8
C$DIR ORDERED_SECTION(ORDGATE)
A(I,J) = A(I+1,J)
C$DIR END_ORDERED_SECTION
.
.
.
ENDDO
ENDDO
Recall that once a given thread has passed through an ordered section, it
cannot reenter it until all other threads have passed through in order.
In the given example, this is only possible if the number of available
threads evenly divides 99 (the I loop limit). If it does not, deadlock results.
To better understand this:
Assume 6 threads, numbered 0 through 5, are running the parallel I
loop.
• For I = 1, J = 1, thread 0 passes through the ordered section and
loops back through J, stopping when it reaches the ordered section
again for I = 1, J = 2. It cannot enter until threads 1 through 5
(which are executing I = 2 through 6, J = 1, respectively) pass
through in sequence. This is not a problem, and the loop proceeds
through I = 96 in this fashion in parallel.
• For I > 96, not all 6 threads are needed. In a single loop nest
this would not pose a problem, as the leftover 3 iterations would be
handled by threads 0 through 2. When thread 2 exited the ordered
section, it would reach the ENDDO and the I loop would terminate
normally.
But in this example, the J loop isolates the ordered section from the I
loop, so thread 0 executes J = 1 for I = 97, loops through J and waits
during J = 2 at the ordered section for thread 5, which has gone idle,
to complete. Threads 1 and 2 similarly execute J = 1 for I = 98 and
I = 99, and similarly wait after incrementing J to 2. The entire J loop
must terminate before the I loop can terminate, but the J loop can
never terminate because the idle threads 3, 4, and 5 never pass
through the ordered section. As a result, deadlock occurs.
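The walkthrough above can be checked with a small model. The following C sketch is not HP compiler or runtime code. It assumes a simplified gate that admits threads in strict rotation (0, 1, ..., n_threads - 1, then 0 again), and it assumes I iterations are handed out in rounds of n_threads, one per thread. The function name ordered_gate_deadlocks and its parameters are hypothetical, and the J loop limit is passed in as n_j because it is not shown in the fragment above.

```c
/*
 * Simplified model of the ordered-section gate described above.
 * Assumptions (not taken from the HP runtime):
 *   - I iterations are dealt out in rounds of n_threads, one per thread.
 *   - The gate admits threads in strict rotation: 0, 1, ..., n_threads-1, 0, ...
 *   - Each thread that holds an I iteration must pass the gate n_j times
 *     (once per J iteration) before its I iteration is finished.
 * Returns 1 if the gate ends up waiting on an idle thread (deadlock),
 * 0 if every pass completes.
 */
int ordered_gate_deadlocks(int n_i, int n_j, int n_threads)
{
    int turn = 0;   /* thread the gate will admit next */
    int first;      /* first I value handled in the current round */

    if (n_threads < 1 || n_j < 1)
        return 0;   /* nothing to model */

    for (first = 1; first <= n_i; first += n_threads) {
        /* Threads 0 .. active-1 hold an I iteration in this round. */
        int active = n_i - first + 1;
        int remaining;

        if (active > n_threads)
            active = n_threads;

        /* Total gate passes still owed in this round. */
        remaining = active * n_j;

        while (remaining > 0) {
            if (turn >= active)
                return 1;   /* gate is waiting on a thread with no work */
            remaining--;
            turn = (turn + 1) % n_threads;
        }
    }
    return 0;
}
```

Under these assumptions, ordered_gate_deadlocks(99, 2, 6) reports a deadlock, because the last round has only 3 active threads and the gate stalls waiting for thread 3. With 9 threads, which divide 99 evenly, it reports none. With n_j set to 1 the model behaves like the single loop nest case and also completes.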
To handle this problem, you can expand the ordered section to include
the entire j loop, as shown in the following C example: