Technical data

Optimizing with MACRO-32
Combine vectorization with decomposition.
Consider a solution at a higher level of the hierarchy.
Retest the program, compiling with /VECTOR (or with /VECTOR
combined with a parallel qualifier), and return to the start
of step 3.
3.1.2 Combining Decomposition with Vectorization
To produce code that executes in parallel and on vector processors, compile
with /VECTOR and a parallel qualifier. Table 3–1 lists the qualifier
combinations and their interpretations.
Table 3–1 Qualifier Combinations for Parallel Vector Processing
Combination
    Interpretation

/VECTOR/PARALLEL=AUTOMATIC
    Performs a dependence analysis on suitable loops and optimizes
    them for parallel-vector processing; chooses loops and prepares
    them for vector or parallel processing based on whether they
    will execute efficiently and produce correct results. In a nested
    structure, decomposition and vectorization may occur for multiple
    loops, but no loop is decomposed inside a decomposed loop
    and no loop is vectorized inside a vectorized loop.

/VECTOR/PARALLEL=MANUAL
    Performs dependence analysis, optimization, and vectorization
    only; disqualifies from vectorization any loops preceded by the
    CPAR$ DO_PARALLEL directive; in these loops, parses the
    user-supplied directives.

/VECTOR/PARALLEL
    Same as /VECTOR/PARALLEL=MANUAL.

/VECTOR/PARALLEL=(MANUAL,AUTOMATIC)
    Same as /VECTOR/PARALLEL=AUTOMATIC, except that it disqualifies
    loops preceded by CPAR$ DO_PARALLEL. In those loops,
    only user-supplied directives are parsed. Any loops contained
    in a manually decomposed loop are disqualified from
    autodecomposition but not from vectorization.
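
The manual case in Table 3–1 can be illustrated with a short sketch. The
program name VPDEMO, the file name VPDEMO.FOR, the arrays, and the loop
bounds are hypothetical and are not taken from this manual. The build
command in the leading comment is an assumed use of one combination from
the table. The sketch also assumes that the index of the directed loop
needs no additional context-sharing directive; check the directive
reference before relying on that. Whether the undirected loop is actually
vectorized depends on the compiler's own analysis.

```fortran
C     Minimal sketch (hypothetical program VPDEMO, file VPDEMO.FOR).
C     Assumed build command, using a combination from Table 3-1:
C       $ FORTRAN/VECTOR/PARALLEL=MANUAL VPDEMO.FOR
      PROGRAM VPDEMO
      INTEGER N
      PARAMETER (N = 1000)
      REAL A(N), B(N), C(N)
      INTEGER I
C     Plain loop with no directive: it remains a candidate for
C     vectorization under /VECTOR.
      DO 10 I = 1, N
         B(I) = REAL(I)
         C(I) = 2.0 * REAL(I)
   10 CONTINUE
C     Loop preceded by the directive: per Table 3-1 it is
C     disqualified from vectorization, and the user-supplied
C     directive is parsed instead. Iterations are independent.
C     Assumption: index I needs no extra sharing directive.
CPAR$ DO_PARALLEL
      DO 20 I = 1, N
         A(I) = B(I) + C(I)
   20 CONTINUE
      PRINT *, A(1), A(N)
      END
```

Because the directive line begins with C in column 1, a compiler that
does not recognize it treats it as a comment, so the same source can be
compiled and checked serially before either qualifier is applied.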
Parallel processing and vector processing each involve tradeoffs that
affect the aggregate speedup when the two are combined. The speedup from
combined vector-parallel processing will be somewhat less than the