Parallel Programming Guide for HP-UX Systems

Parallel synchronization
Synchronizing code
#pragma _CNX loop_parallel(ordered,ivar=i)
for(i=0;i<99;i++) {
#pragma _CNX ordered_section(ordgate)
    for(j=0;j<m;j++) {
        .
        .
        .
        a[i][j] = a[i+1][j];
        .
        .
        .
    }
#pragma _CNX end_ordered_section
}
In this approach, each thread executes the entire j loop each time it
enters the ordered section, allowing the i loop to terminate normally
regardless of the number of threads available.
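The need for ordering in this loop can be illustrated with a small serial sketch in standard C. It is not HP-specific code and uses no _CNX pragmas. The array size (4 by 3) and the function names fill, shift_ascending, and shift_descending are assumptions made for illustration. Iteration i reads row i+1, which iteration i+1 later overwrites, so the result depends on the iterations running in ascending order of i.

```c
#define ROWS 4   /* hypothetical size; the guide's loop covers 100 rows */
#define COLS 3   /* hypothetical stand-in for m */

/* Give every element a distinct value: a[i][j] = 10*i + j. */
void fill(int a[ROWS][COLS])
{
    for (int i = 0; i < ROWS; i++)
        for (int j = 0; j < COLS; j++)
            a[i][j] = 10 * i + j;
}

/* The assignments from the example, executed in ascending order of i.
   This is the order an ordered section preserves. */
void shift_ascending(int a[ROWS][COLS])
{
    for (int i = 0; i < ROWS - 1; i++)
        for (int j = 0; j < COLS; j++)
            a[i][j] = a[i + 1][j];
}

/* The same assignments in descending order of i, standing in for one
   possible outcome if iterations were allowed to run out of order. */
void shift_descending(int a[ROWS][COLS])
{
    for (int i = ROWS - 2; i >= 0; i--)
        for (int j = 0; j < COLS; j++)
            a[i][j] = a[i + 1][j];
}
```

After fill(a), calling shift_ascending(a) leaves each row holding the old contents of the row below it, while shift_descending(a) leaves every row holding the old last row. The two results differ, which is why the assignment sits inside the ordered section.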
Another approach is to manually interchange the i and j loops, as shown
in the following Fortran example:
      DO J = 1, M
        LOCK = UNLOCK_GATE(ORDGATE)
C$DIR LOOP_PARALLEL(ORDERED)
        DO I = 1, 99
          .
          .
          .
C$DIR ORDERED_SECTION(ORDGATE)
          A(I,J) = A(I+1,J)
C$DIR END_ORDERED_SECTION
          .
          .
          .
        ENDDO
      ENDDO
Here, the I loop is parallelized on every iteration of the J loop. The
ordered section is not isolated from its parent loop, so the loop can
terminate normally. This approach has an added benefit: elements of A are
accessed more efficiently.
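The efficiency gain presumably comes from Fortran's column-major array storage: with I as the inner loop, consecutive iterations touch adjacent memory locations. The following C sketch works through that address arithmetic. The function names and the leading dimension of 100 used in the note below are assumptions for illustration, since the declaration of A is not shown.

```c
#include <stddef.h>

/* Offset, in elements, of A(i,j) from A(1,1) for a Fortran array
   declared with ld rows (column-major storage, 1-based indices). */
size_t fortran_offset(size_t i, size_t j, size_t ld)
{
    return (i - 1) + (j - 1) * ld;
}

/* Distance between consecutive accesses when I is the inner loop. */
size_t stride_inner_i(size_t ld)
{
    return fortran_offset(2, 1, ld) - fortran_offset(1, 1, ld);
}

/* Distance between consecutive accesses when J is the inner loop. */
size_t stride_inner_j(size_t ld)
{
    return fortran_offset(1, 2, ld) - fortran_offset(1, 1, ld);
}
```

With an assumed leading dimension of 100, stride_inner_i(100) is 1 element and stride_inner_j(100) is 100 elements, so the interchanged loop nest walks memory contiguously.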