HP Compilers for HP Integrity Servers (September 2011)

the newer 64-bit data model, in which longs and pointers are 64 bits wide. The traditional
32-bit data model is appropriate for many legacy applications that may not be 64-bit
clean. Many other compilers require the application to comply with the 64-bit data model,
which usually requires a separate 64-bit migration step for legacy applications.
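To make "64-bit clean" concrete, the following minimal C sketch contrasts a pointer round trip that works under either data model with one that silently truncates when pointers are wider than int. The function names are hypothetical and chosen only for illustration; they are not part of any HP library.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when built under a data model in which both long and
 * pointers are 64 bits wide, 0 otherwise (for example, a 32-bit model). */
int is_64bit_data_model(void) {
    return sizeof(long) == 8 && sizeof(void *) == 8;
}

/* 64-bit clean: uintptr_t is wide enough to hold a pointer in either
 * data model, so the round trip always yields the original pointer. */
int roundtrip_clean(const void *p) {
    uintptr_t bits = (uintptr_t)p;
    return (const void *)bits == p;
}

/* Not 64-bit clean: the pointer is squeezed through unsigned int.
 * This happens to work when pointers are 32 bits wide, but it can drop
 * the upper bits (and return 0) when pointers are 64 bits wide. */
int roundtrip_through_int(const void *p) {
    unsigned int bits = (unsigned int)(uintptr_t)p;
    return (const void *)(uintptr_t)bits == p;
}
```

Code written in the style of roundtrip_through_int is the kind that typically needs attention in a 64-bit migration step, while it continues to behave as intended under the 32-bit data model.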

To extend the lifetime of new applications for Integrity servers, HP compilers provide
several code scheduling options. These options allow software providers to target a
specific processor model or to use a blended model that is suitable for all members of
the processor family.

Faster development and debug

Traditionally, compilers perform minimal optimization by default and no optimization
when debugging is specified. This approach is inappropriate for Itanium-based systems,
where unoptimized programs generally run about two to three times slower than when
optimized at +O1 and four to five times slower than when optimized at +O2.

Some optimizations are also required for a debug build since 30 to 50% of the instructions
in an unoptimized code sequence are no-op instructions. This relatively large number
of no-op instructions is due to the need to form three-instruction bundles, and the limited
number of bundle templates available. With optimization, the compiler is able to make
much more effective use of the bundle templates.

HP has significantly enhanced performance of code compiled for debugging by providing
+O1 level of optimization by default. Optimizations performed at +O1 include common
sub-expression elimination, constant propagation, load/store elimination, and copy
elimination. Care has been taken to ensure that the program can still be debugged
correctly; that is, that breakpoints are at expected places and variables have expected
values at breakpoints corresponding to source lines.
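As an illustration of two of the local optimizations named above, the sketch below shows a small C function as a programmer might write it, followed by a hand-transformed equivalent with common sub-expression elimination and constant propagation applied. This is only a source-level approximation for explanation; the compiler performs these transformations on its internal representation, and the function names are hypothetical.

```c
/* As written: w * h appears twice and scale is a known constant. */
int area_sum_original(int w, int h) {
    const int scale = 4;
    int a = (w * h) * scale;      /* first use of w * h  */
    int b = (w * h) + scale * 2;  /* second use of w * h */
    return a + b;
}

/* Hand-applied equivalent of what a local optimizer might produce. */
int area_sum_transformed(int w, int h) {
    int wh = w * h;   /* common sub-expression computed once   */
    int a = wh * 4;   /* scale propagated as the constant 4    */
    int b = wh + 8;   /* scale * 2 propagated and folded to 8  */
    return a + b;
}
```

Both functions return the same value for any inputs that do not overflow; for example, with w = 3 and h = 5 each returns 83. Every source line still corresponds to a distinct computation, which is consistent with the goal of keeping breakpoints and variable values meaningful.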

Advanced low-level optimization

At optimization level 2 (option +O2), HP’s low-level optimizer takes full advantage of the
key features of the architecture. In addition to the local optimizations applied at +O1,
the optimizer applies Static Single Assignment (SSA)-based global value numbering (see
“Reference 6” (page 35)), global code motion, value congruent instruction elimination
to reduce the static and dynamic number of instructions, aliased scalar promotion (see
“Reference 7” (page 35)), a fast version of interprocedural inlining using “tuned-down”
heuristics, and SSA-based partial redundancy elimination. The loop optimizer performs
data prefetching, sum reduction, scalar replacement, strength reduction, post-increment
synthesis, and loop unrolling. Data prefetching is automatically performed on loops where
the optimizer is able to discern an array reference pattern or linked-list traversal.
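Two of the loop transformations listed above, strength reduction and loop unrolling, can be sketched at the source level as follows. This is a hand-written illustration under the assumption of a row-major matrix stored in a flat array; it is not the output of the HP optimizer, and the function names are hypothetical. The use of two partial sums in the unrolled loop is one possible way to expose independent additions and is an assumption of this sketch.

```c
#include <stddef.h>

/* As written: the index r * cols + col costs a multiply per iteration. */
unsigned long sum_column_original(const unsigned long *m, size_t rows,
                                  size_t cols, size_t col) {
    unsigned long sum = 0;
    for (size_t r = 0; r < rows; r++)
        sum += m[r * cols + col];
    return sum;
}

/* Strength reduction: the multiply is replaced by a running index
 * that advances by cols on each iteration. */
unsigned long sum_column_reduced(const unsigned long *m, size_t rows,
                                 size_t cols, size_t col) {
    unsigned long sum = 0;
    size_t idx = col;
    for (size_t r = 0; r < rows; r++) {
        sum += m[idx];
        idx += cols;
    }
    return sum;
}

/* Unrolled by two with separate partial sums, plus a remainder step
 * for an odd number of rows. */
unsigned long sum_column_unrolled(const unsigned long *m, size_t rows,
                                  size_t cols, size_t col) {
    unsigned long s0 = 0, s1 = 0;
    size_t idx = col;
    size_t r = 0;
    for (; r + 1 < rows; r += 2) {
        s0 += m[idx];
        s1 += m[idx + cols];
        idx += 2 * cols;
    }
    if (r < rows)          /* leftover row when rows is odd */
        s0 += m[idx];
    return s0 + s1;
}
```

All three functions compute the same column sum. The regular stride of idx is also the kind of array reference pattern that a prefetching optimizer can recognize.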

HP compilers divide application code into regions, which form the unit of operation for
instruction scheduling. The instruction scheduler employs control speculation, data
speculation, and predication to schedule the region as efficiently as possible, maximizing
instruction-level parallelism (see “Reference 3” (page 35)). Where possible, given
