HP Compilers for HP Integrity Servers (September 2011)

the newer 64-bit data model where longs and pointers are 64 bits wide. The traditional
32-bit data model is appropriate for many legacy applications that may not be 64-bit
clean. Many other compilers require the application to comply with the 64-bit data model,
which usually requires a separate 64-bit migration step for legacy applications.
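The difference between the two data models, and what "64-bit clean" means in practice, can be sketched in a few lines of C. This is an illustrative sketch only: the names ILP32 and LP64 are the conventional industry labels for the two models rather than terms used in this paper, and the helper functions below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Conventional labels for the two data models (assumed names, not from this paper). */
enum data_model { MODEL_ILP32, MODEL_LP64, MODEL_OTHER };

/* Hypothetical helper: classify a data model from the byte widths of long and pointers. */
static enum data_model classify_data_model(size_t long_bytes, size_t ptr_bytes)
{
    if (long_bytes == 4 && ptr_bytes == 4) return MODEL_ILP32; /* 32-bit model */
    if (long_bytes == 8 && ptr_bytes == 8) return MODEL_LP64;  /* 64-bit model */
    return MODEL_OTHER;
}

/* Report the model the current translation unit was compiled for. */
static enum data_model current_data_model(void)
{
    return classify_data_model(sizeof(long), sizeof(void *));
}

/*
 * A 64-bit-clean way to hold a pointer in an integer: uintptr_t is wide
 * enough in either model. Storing the pointer in an int instead would
 * truncate it when pointers are 64 bits wide, which is a typical reason
 * legacy code is not 64-bit clean.
 */
static int pointer_survives_round_trip(const void *p)
{
    assert(sizeof(uintptr_t) >= sizeof(void *));
    uintptr_t bits = (uintptr_t)p;
    return (const void *)bits == p;
}
```

Code that only uses idioms like the last function behaves the same under either model, so it needs no separate migration step.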
To extend the lifetime of new applications for Integrity servers, HP compilers provide
several code scheduling options. These options allow software providers to target a
specific processor model or to use a blended model that is suitable for all members of
the processor family.
Faster development and debug
Traditionally, compilers perform minimal optimization by default and no optimization
when debugging is specified. This approach is inappropriate for Itanium-based systems,
where unoptimized programs generally run about two to three times slower than when
optimized at +O1 and four to five times slower than when optimized at +O2.
Some optimization is also required for a debug build because 30% to 50% of the instructions
in an unoptimized code sequence are no-op instructions. This relatively large number
of no-op instructions is due to the need to form three-instruction bundles and the limited
number of bundle templates available. With optimization, the compiler is able to make
much more effective use of the bundle templates.
HP has significantly enhanced the performance of code compiled for debugging by providing
the +O1 level of optimization by default. Optimizations performed at +O1 include common
sub-expression elimination, constant propagation, load store elimination, copy elimination,
register allocation, restricted basic block scheduling, and simple data prefetching. Care
has been taken to ensure that the program can still be debugged correctly; that is, that
breakpoints are at expected places and variables have expected values at breakpoints
corresponding to source lines.
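As a rough illustration of three of the optimizations listed above (common sub-expression elimination, constant propagation, and copy elimination), the following hand-written C sketch shows a function as a programmer might write it and an equivalent form with the redundant work removed. It is not actual compiler output; the function names are hypothetical and the transformed version only approximates what an optimizer might produce.

```c
#include <assert.h>

/* As written: w * h is computed twice and scale is a known constant. */
static int area_sum_source(int w, int h, int d)
{
    int scale = 4;
    int front = (w * h) * scale;
    int back  = (w * h) * scale + d;
    return front + back;
}

/*
 * Hand-transformed equivalent:
 *  - constant propagation replaces scale with 4,
 *  - common sub-expression elimination computes w * h * 4 once,
 *  - copy elimination removes the front/back temporaries.
 */
static int area_sum_transformed(int w, int h, int d)
{
    int t = w * h * 4;
    return t + t + d;
}

/* Both forms must agree for any inputs that do not overflow int. */
static void check_equivalence(int w, int h, int d)
{
    assert(area_sum_source(w, h, d) == area_sum_transformed(w, h, d));
}
```

The debugging constraint described above is what makes this harder for a compiler than for a hand rewrite: in the transformed form, `front` and `back` no longer exist as separate values, yet a debugger stopped at the corresponding source lines is still expected to show them.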
Advanced low-level optimization
At optimization level 2 (option +O2), HP’s low-level optimizer takes full advantage of the
key features of the architecture. In addition to the local optimizations applied at +O1,
the optimizer applies Static Single Assignment (SSA)–based global value numbering (see
“Reference 6” (page 35)), global code motion, value congruent instruction elimination
to reduce the static and dynamic number of instructions, aliased scalar promotion (see
“Reference 7” (page 35)), a fast version of interprocedural inlining using “tuned-down”
heuristics, and SSA-based partial redundancy elimination. The loop optimizer performs
data prefetching, sum reduction, scalar replacement, strength reduction, post-increment
synthesis, and loop unrolling. Data prefetching is automatically performed on loops where
the optimizer is able to discern an array reference pattern or linked-list traversal.
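Several of the loop transformations named above can be shown by hand on a small strided sum. The sketch below is illustrative only: the function names are hypothetical, the unroll factor of two is an arbitrary choice, and the transformed loop is a manual approximation rather than what the HP optimizer actually emits.

```c
#include <assert.h>
#include <stddef.h>

/* As written: one multiply per iteration and a single accumulator. */
static long strided_sum_source(const int *a, size_t n, size_t stride)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i * stride];
    return sum;
}

/*
 * Hand-transformed equivalent:
 *  - strength reduction: the multiply i * stride becomes a running index
 *    that is advanced by an add,
 *  - loop unrolling: two elements are processed per iteration,
 *  - sum reduction: two independent accumulators break the serial
 *    dependence on a single sum.
 */
static long strided_sum_transformed(const int *a, size_t n, size_t stride)
{
    long s0 = 0, s1 = 0;
    size_t idx = 0;   /* always equals i * stride */
    size_t i = 0;

    for (; i + 2 <= n; i += 2) {
        s0 += a[idx];
        s1 += a[idx + stride];
        idx += 2 * stride;
    }
    if (i < n)        /* leftover element when n is odd */
        s0 += a[idx];
    return s0 + s1;
}

/* Integer addition is associative, so both forms must give the same result. */
static void check_equivalence(const int *a, size_t n, size_t stride)
{
    assert(a != NULL || n == 0);
    assert(strided_sum_source(a, n, stride) == strided_sum_transformed(a, n, stride));
}
```

The regular `idx` progression in the transformed loop is also the kind of array reference pattern that makes automatic data prefetching possible.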
HP compilers divide application code into regions, which form the unit of operation for
instruction scheduling. The instruction scheduler employs control speculation, data
speculation, and predication to schedule the region as efficiently as possible, maximizing
instruction-level parallelism (see “Reference 3” (page 35)). Where possible, given