HP Compilers for HP Integrity Servers (September 2011)

allowing the high-level optimizer to transform the indirect call into a test and a direct
call.
The inliner framework has been designed to scale to very large applications. It uses a
novel and fast underlying algorithm and employs an elaborate set of heuristics to guide
its inlining decisions.
The inlining engine is also employed at +O2 for intra-module inlining. At this optimization
level, the inliner uses tuned-down heuristics to guarantee fast compile times.
Application performance benefits from interprocedural optimization in the following
ways:
Interprocedural data prefetches are inserted before call sites for data that is accessed
through dereferences of a pointer parameter to the call.
Interprocedural analysis of memory references and function arguments enables and
improves many optimizations; for example, it yields additional opportunities for
register promotion.
Consider this example:
void foo( int *x, int *y )
{
    ... = *x;   // load 1
    *y = ...;   // store 1
    ... = *x;   // load 2
}
Without any additional knowledge about the properties of the pointers x and y, the
compiler has to issue a second load instruction (load 2), since the store (store 1)
may overwrite the memory location that x points to.
If, as a result of interprocedural analysis, the compiler is able to determine that
x and y never alias (never point to the same memory location), it can promote
the value of *x into a register and reuse this register in place of load 2.
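The effect of the no-alias fact can be sketched in standard C. The function names below are hypothetical and do not come from the compiler documentation. The C99 restrict qualifier is used here only to state by hand the same property that interprocedural analysis may be able to prove automatically; whether a particular compiler exploits it is an assumption.

```c
/* Hypothetical illustration; these function names are assumptions. */

/* x and y may alias, so the value of *x must be reloaded after the store. */
int sum_twice_may_alias(int *x, int *y)
{
    int a = *x;      /* load 1 */
    *y = a + 1;      /* store 1: may modify *x if y == x */
    int b = *x;      /* load 2: required */
    return a + b;
}

/* restrict promises that x and y refer to different objects, so the
   compiler is free to keep *x in a register across the store. */
int sum_twice_no_alias(int *restrict x, int *restrict y)
{
    int a = *x;      /* load 1 */
    *y = a + 1;      /* store 1: cannot modify *x */
    int b = *x;      /* load 2: may be replaced by the register holding a */
    return a + b;
}
```

Calling sum_twice_may_alias with the same address for both arguments returns a different result than calling it with two distinct objects, which is why the second load cannot be dropped without the no-alias guarantee.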
Function inlining provides the traditional benefits, such as reduced call overhead,
improved locality of the executing code, and fewer branches. More importantly, though,
inlining exposes additional optimization opportunities because of the widened scope,
which also enables better instruction scheduling.
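As a minimal sketch of how the widened scope helps, consider a small callee invoked with a constant argument. The functions scale, area_with_call, and area_after_inlining are hypothetical names chosen for illustration; the second variant is written by hand to show what an optimizer could derive once the call is inlined, not what any specific compiler emits.

```c
/* Hypothetical helper, small enough to be an inlining candidate. */
static int scale(int value, int factor)
{
    if (factor == 0)        /* branch evaluated on every call */
        return 0;
    return value * factor;
}

/* Before inlining: one call plus the branch inside scale(). */
int area_with_call(int width)
{
    return scale(width, 4);
}

/* After inlining scale(width, 4), factor is known to be 4, so the
   factor == 0 test folds away and only a multiply by a constant remains. */
int area_after_inlining(int width)
{
    return width * 4;
}
```

Both variants compute the same result; the second one has no call, no branch, and a constant operand that later optimization phases can exploit.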
The whole call graph is constructed, enabling indirect call promotion, where an
indirect call is converted to a test and a direct call. Depending on the application
characteristics, and in the presence of PBO data, this can result in significant
application speedups (we have observed improvements of up to 20% for certain
applications).
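The transformation into a test and a direct call can be sketched by hand as follows. All names (handler_fn, handle_common, handle_rare, dispatch_indirect, dispatch_promoted) are hypothetical, and the choice of handle_common as the likely target stands in for what profile data would indicate; this is an illustration of the idea, not the compiler's actual output.

```c
typedef int (*handler_fn)(int);

/* Hypothetical call targets. */
int handle_common(int v) { return v + 1; }
int handle_rare(int v)   { return v * 2; }

/* Original form: the target is unknown at the call site. */
int dispatch_indirect(handler_fn h, int v)
{
    return h(v);
}

/* Promoted form: test for the likely target, then call it directly.
   The direct call becomes a candidate for inlining; any other target
   still goes through the indirect call. */
int dispatch_promoted(handler_fn h, int v)
{
    if (h == handle_common)
        return handle_common(v);   /* direct call */
    return h(v);                   /* fallback: indirect call */
}
```

The promoted form behaves identically for every target, so the test only pays off when the predicted target is in fact the common one.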
Dead variable removal allows the high-level optimizer to reduce the total memory
requirements of the application by removing global and static variables that are
never referenced.