HP Compilers for HP Integrity Servers (September 2011)

Recognition of global, static, and local variables that are assigned but never used
allows the optimizer to remove the dead code that assigns them (which may in turn
expose additional dead variables).
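The cascading effect can be seen in a small, hypothetical C function. The second
version shows what the optimizer may effectively reduce the first one to; it is a
source-level illustration, not actual compiler output.

```c
/* Hypothetical before/after illustration of dead-variable removal.
   compute_after shows what the optimizer may effectively produce from
   compute_before; it is not actual compiler output. */

int compute_before(int x)
{
    int a = x * 2;   /* only read to compute b */
    int b = a + 1;   /* assigned but never read: a dead store */
    return x + 1;
}

/* Removing the dead store to b leaves a with no remaining readers,
   so a becomes dead as well and is removed in turn. */
int compute_after(int x)
{
    return x + 1;
}
```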
Recognition of global variables that are referenced only within a single module allows
the high-level optimizer to convert each such symbol to a private symbol, guaranteeing
that it can be accessed only from within that module. This gives the low-level
optimizer greater freedom in optimizing references to the variable.
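In C source terms, a global that nothing outside its module references behaves as if
it had been declared static. The module below is hypothetical; the compiler performs
this conversion on its internal symbols, so the source itself is never rewritten.

```c
/* Hypothetical module. Written as a plain global, hit_count would have
   external linkage:
       int hit_count = 0;
   If whole-program analysis (-ipo) proves that no other module references
   it, the optimizer can treat it as though it had been declared static,
   i.e. private to this module, as written here. */
static int hit_count = 0;

void record_hit(void)
{
    hit_count++;
}

int hits(void)
{
    /* Because no other module can access hit_count, the optimizer does
       not have to assume that calls into other modules modify it. */
    return hit_count;
}
```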
Dead function removal (functions that are never called) and redundant function
removal (for example, duplicate template instantiations) help reduce compile time
and improve the effectiveness of cross-module inlining by reducing the working set.
Additionally, as the application's total code size decreases, it incurs fewer cache
and page misses, which can improve performance.
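Dead function removal can be pictured as a reachability search over the call graph:
any function that cannot be reached from the program entry point is dead. The sketch
below uses a hypothetical five-function call graph stored as an adjacency matrix. It
deliberately ignores functions whose address is taken (for example, through function
pointers), which a real compiler must also treat as live.

```c
#include <stdbool.h>

/* Hypothetical call graph with five functions, indexed 0..4.
   calls[i][j] is true when function i calls function j.
   Function 0 is the program entry point (main). */
#define NFUNC 5

static const bool calls[NFUNC][NFUNC] = {
    /* 0 main   */ {false, true,  false, false, false},
    /* 1 parse  */ {false, false, true,  false, false},
    /* 2 lex    */ {false, false, false, false, false},
    /* 3 unused */ {false, false, false, false, true },
    /* 4 helper */ {false, false, false, false, false},
};

/* Depth-first search that marks every function reachable from f. */
static void mark_reachable(int f, bool live[NFUNC])
{
    if (live[f])
        return;
    live[f] = true;
    for (int g = 0; g < NFUNC; g++)
        if (calls[f][g])
            mark_reachable(g, live);
}

/* Fills live[] and returns the number of dead (unreachable) functions. */
int find_dead_functions(bool live[NFUNC])
{
    for (int f = 0; f < NFUNC; f++)
        live[f] = false;
    mark_reachable(0, live);

    int dead = 0;
    for (int f = 0; f < NFUNC; f++)
        if (!live[f])
            dead++;
    return dead;
}
```

Here helper is called, but only from unused, so it is removed along with it. This is
why the search starts from the entry point rather than simply looking for functions
that have no callers.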
Short data optimizations. Global and static data allocated in the short data area
can be accessed with a more efficient access sequence. In whole program mode
(-ipo), the compiler can analyze precisely whether all global and static data fits
into the short data area and, if so, allocate it there. If the data does not fit,
the compiler can determine the best safe short data size threshold, so that the
maximum number of data items can be addressed efficiently.
This is an advantage over +O2 alone (without -ipo). At optimization level +O2, the
same optimization can be enabled with the option +Oshortdata or
+Oshortdata=<threshold>. However, this approach typically does not adapt as the
application changes and evolves.
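The threshold search can be modeled as follows. This is a simplified sketch, not HP's
actual algorithm: the capacity of the short data area is a hypothetical parameter in
bytes, alignment padding is ignored, and a threshold t is taken to mean "place every
item whose size is at most t in the short data area." Sorting the item sizes makes the
space needed grow monotonically with t, so the largest safe t can be found in one pass.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort comparator for ascending size_t values. */
static int cmp_size(const void *a, const void *b)
{
    size_t x = *(const size_t *)a, y = *(const size_t *)b;
    return (x > y) - (x < y);
}

/* Simplified model (assumption): returns the largest threshold t, taken
   from the item sizes, such that every item of size <= t fits in a short
   data area of `capacity` bytes. Alignment padding is ignored. Returns 0
   if not even the smallest items fit. Sorts sizes[] in place. */
size_t choose_threshold(size_t *sizes, size_t n, size_t capacity)
{
    qsort(sizes, n, sizeof sizes[0], cmp_size);

    size_t used = 0, threshold = 0, i = 0;
    while (i < n) {
        size_t s = sizes[i];
        size_t group = 0;
        /* Items of equal size share one threshold, so add them together. */
        while (i < n && sizes[i] == s) {
            group += s;
            i++;
        }
        if (used + group > capacity)
            break;          /* raising the threshold to s would overflow */
        used += group;
        threshold = s;
    }
    return threshold;
}
```

For item sizes {8, 4, 16, 4, 64, 8} and a capacity of 40 bytes, the function returns
16: the items of 4, 8, and 16 bytes use exactly 40 bytes, and adding the 64-byte item
would overflow. If everything fits, the result is the largest item size, which
corresponds to placing all of the data in the short data area.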
For calls to external functions (functions not defined in the binary itself, such as
those in shared libraries), the linker introduces a small call stub. If the compiler
knows that a call targets an external function, it can inline the call stub,
resulting in better performance.
The HP compilers support a mechanism for annotating function prototypes with a
pragma (#pragma extern) that marks those functions as external functions.
When this is used with the compiler option -minshared (see “Choosing the link mode”
(page 25)), the compiler can perform call stub inlining.
None of this is necessary with -ipo in whole program mode. In this mode the
compiler knows which functions the application defines and which are external,
and it automatically marks functions accordingly.
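In whole program mode the classification itself is straightforward: a called function
that no module of the program defines must be external. The sketch below uses
hypothetical function names and a linear search for clarity; a compiler would work on
its own symbol tables, typically with a hash-based lookup.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Returns true if `callee` is not defined by any module of the program.
   Such a function must come from outside the binary (for example, a shared
   library), so calls to it go through a linker call stub that the compiler
   can inline. `defined` lists every function the program defines. */
bool is_external(const char *callee, const char *const defined[], size_t ndefined)
{
    for (size_t i = 0; i < ndefined; i++)
        if (strcmp(defined[i], callee) == 0)
            return false;   /* defined by the application itself */
    return true;
}
```

For example, with defined = {"main", "parse", "lex"}, a call to printf is classified
as external, while a call to parse is not.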
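Interprocedural constant propagation carries constant values across function
boundaries (for example, an argument that is the same constant at every call site),
allowing the called function to be simplified and enabling more efficient code.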
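A hypothetical illustration in C: if every call site passes the same constant, the
optimizer may effectively compile the callee with that constant substituted. The
specialized version is written out explicitly here for comparison; it is not actual
compiler output.

```c
/* Hypothetical example: every call to sum_strided in the program
   passes the constant 2 for stride. */
static int sum_strided(const int *v, int n, int stride)
{
    int s = 0;
    for (int i = 0; i < n; i += stride)
        s += v[i];
    return s;
}

int sum_even_slots(const int *v, int n)
{
    return sum_strided(v, n, 2);   /* the only call site */
}

/* What the optimizer may effectively produce after propagating
   stride == 2 into the callee: the loop step is now a compile-time
   constant, which can enable further optimizations such as unrolling. */
int sum_strided_2(const int *v, int n)
{
    int s = 0;
    for (int i = 0; i < n; i += 2)
        s += v[i];
    return s;
}
```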
Data layout optimizations, including structure splitting and dead field removal, can
help reduce the working set of an application and thereby improve data cache
behavior. Within its framework for interprocedural data layout optimizations, if the
compiler can determine that a given structure type can be modified safely, it may
split that structure type into hot and cold parts, with the goal of reducing cache
and TLB penalties. This optimization has been greatly improved in the current