HP Compilers for HP Integrity Servers (September 2011)

Tuning with profile-based optimization
Profile-based optimization (PBO) is likely to be worthwhile if:
• The application contains a large number of control-flow branches.
• The application contains a large number of indirect branches (for example, C++
  virtual function calls) and the -ipo option is used.
• HP Caliper data indicates high branch misprediction rates, high numbers of case
  statement layout optimization opportunities, or large numbers of if-convert
  opportunities for hot branches.
• Representative input sets are readily available or the available profile data can be
  translated into PBO options and pragmas.
• HP Caliper data indicates poor data cache performance, and the application contains
  linked-list traversals and/or large numbers of global or static variables.
• The application contains loops that tend to iterate only a few times.
For integer code, PBO can be expected to achieve a 5–40% improvement in application
performance; floating-point code will generally see more modest improvements.
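To make the criteria above concrete, the following C sketch shows two of the code shapes named in the list: a heavily biased branch and an inner loop that usually runs only once or twice. The type record_t, its fields, and the function sum_records are hypothetical names invented for this illustration; they are not taken from any HP documentation. The sketch only shows the kind of code where profile data can tell the compiler something it cannot infer statically.

```c
#include <stddef.h>

/* Hypothetical record type, used only for this illustration. */
typedef struct {
    int    kind;        /* assumed: 0 is by far the most common value */
    int    payload[4];  /* small fixed-size payload */
    size_t count;       /* number of valid payload entries, assumed to be 1 or 2 in practice */
} record_t;

/*
 * Sums the valid payload entries of all records of kind 0 and
 * subtracts 1 for every record of any other kind.
 */
long sum_records(const record_t *recs, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++) {
        /* Biased branch: a profile would show this path is rarely taken. */
        if (recs[i].kind != 0) {
            total -= 1;
            continue;
        }
        /* Short trip count: a profile would show only a few iterations. */
        for (size_t j = 0; j < recs[i].count; j++)
            total += recs[i].payload[j];
    }
    return total;
}
```

Without a profile, the compiler has to guess which side of the branch is hot and how many times the inner loop runs; a run on representative input supplies both facts.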

Tuning across program modules
The compiler option -ipo requests cross-module optimization (optionally in conjunction
with PBO). If HP Caliper data collected after using -ipo shows an increase in instruction
cache and/or TLB misses, this probably indicates that too much inlining was performed.
In this case, the option +inline_level can be used to limit inlining.

Tuning floating-point numerical code
The performance strategies already mentioned can improve floating-point performance.
For example, optimization at +O2 or with PBO will speed up most floating-point code.
Optimization at +O3 further speeds up some code and can dramatically speed up
loop-intensive code. With option +O3, the compiler performs additional optimizations
such as loop transformations (interchange, fusion, distribution, and so on) and more
inlining of math library routines into user code. In particular, if HP Caliper indicates high
data cache or TLB miss rates, the optimizations performed at +O3 can be highly beneficial.
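As an illustration of one of the loop transformations named above, the following C sketch shows loop interchange written out by hand. It is not the compiler's output; the function names and the array dimensions ROWS and COLS are assumptions chosen for this example. Both functions compute the same result, but the second walks memory with unit stride, which is the access pattern that tends to reduce data cache and TLB misses for row-major C arrays.

```c
#include <stddef.h>

#define ROWS 64   /* illustrative sizes, chosen arbitrarily */
#define COLS 64

/*
 * Before interchange: the inner loop varies the row index, so
 * consecutive accesses are COLS * sizeof(double) bytes apart.
 */
void scale_strided(double dst[ROWS][COLS], double src[ROWS][COLS])
{
    for (size_t j = 0; j < COLS; j++)
        for (size_t i = 0; i < ROWS; i++)
            dst[i][j] = 2.0 * src[i][j];
}

/*
 * After interchange: the inner loop varies the column index, so
 * consecutive accesses touch adjacent memory locations.
 */
void scale_unit_stride(double dst[ROWS][COLS], double src[ROWS][COLS])
{
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            dst[i][j] = 2.0 * src[i][j];
}
```

Each iteration here is independent of the others, so swapping the loops cannot change the result. Loops that carry a dependence or a floating-point reduction need more care before they can be reordered.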
At optimization levels +O2 and higher, the compiler inlines the more commonly used
math functions, including log, exp, sin, and cos, provided that the proper header
files are included. This can substantially improve performance, particularly for calls in
loops, and does not affect function behavior.