Parallel Programming Guide for HP-UX Systems

Parallel synchronization
Synchronizing code
Chapter 8
#pragma _CNX loop_parallel(ivar=i)
for(i=0;i<n;i++) {
    a[i] = b[i] + c[i];
    /* Both updates of absum are guarded by the same named gate, gate1 */
    #pragma _CNX critical_section(gate1)
    absum = absum + a[i];
    #pragma _CNX end_critical_section
    if(adjb[i]) {
        b[i] = c[i] + d[i];
        #pragma _CNX critical_section(gate1)
        absum = absum + b[i];
        #pragma _CNX end_critical_section
    }
    /* ... */
}
lock = free_gate(&gate1);    /* release the gate once the loop is done */
The shared variable absum must be updated after a[i] is assigned and
again if b[i] is assigned. Both accesses to absum must be guarded by the
same gate so that two threads never update it at the same time. The
critical sections protecting the updates to absum must therefore name
this gate explicitly. Otherwise, the compiler assigns a unique gate to each
section, so one thread could execute the first update while another
executes the second, potentially producing incorrect answers. There must
also be a substantial amount of parallelizable code outside of these
critical sections to make parallelizing this loop cost-effective.
Using ordered sections
Like critical sections, ordered sections lock and unlock a specified gate to
isolate a section of code in a loop. In addition, they ensure that the
enclosed section of code executes in the same order as the iterations of
the ordered parallel loop that contains it.
Once a given thread passes through an ordered section, it cannot enter it
again until all other threads have passed through in order. This ordering
is difficult to implement without the ordered section directives or
pragmas.
You must use a loop_parallel(ordered) directive or pragma to
parallelize any loop containing an ordered section. See
"loop_parallel(ordered)" on page 144 for a description of this directive.
Example 8-6 Ordered sections
The following Fortran example contains a backward loop-carried
dependence on the array A that would normally inhibit parallelization.