HP Compilers for HP Integrity Servers (September 2011)

The option +Ofaster is an alias for +Ofast +O4 and is therefore ideally suited for
cross-module optimizations.
Because both +Ofast and +Ofaster imply +Ofltacc=relaxed, neither is appropriate on
its own for tuning floating-point code that requires more rigorous floating-point
behavior. However, they can be made appropriate by taking advantage of the compiler’s
general left-to-right option processing. For example, in the command-line options
+Ofast +Ofltacc=strict, the later +Ofltacc=strict overrides the earlier
+Ofltacc=relaxed setting implied by +Ofast. Likewise, +Ofast +FPd re-enables the
default gradual underflow mode.
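As an illustration of code that needs the more rigorous behavior, the following minimal sketch shows Kahan (compensated) summation. The function name kahan_sum and the file name in the comment are hypothetical. The correction term is algebraically zero, so a compiler that is allowed to reassociate floating-point expressions may simplify it away; whether a particular relaxed setting actually does so is an assumption here, not something this document states.

```c
#include <stddef.h>

/*
 * Kahan (compensated) summation.
 *
 * Illustrative only. A possible build line, assuming the source is saved
 * as kahan.c (hypothetical file name):
 *     cc +Ofast +Ofltacc=strict kahan.c
 * The later +Ofltacc=strict overrides the +Ofltacc=relaxed implied by +Ofast.
 */
double kahan_sum(const double *values, size_t count)
{
    double sum = 0.0;
    double compensation = 0.0;  /* running estimate of lost low-order bits */
    size_t i;

    for (i = 0; i < count; ++i) {
        double y = values[i] - compensation;  /* apply the previous correction */
        double t = sum + y;                   /* low-order bits of y may be lost */
        compensation = (t - sum) - y;         /* zero in exact math, nonzero in FP */
        sum = t;
    }
    return sum;
}
```

For an input of 1.0 followed by one thousand values of 1e-16, a plain accumulation loop in which every addition is rounded to IEEE double precision returns exactly 1.0, while kahan_sum returns a value close to 1.0000000000001. That difference depends entirely on the correction term being evaluated as written.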

Using inline assembly
HP C and C++ inline assembly support allows the user to directly exploit powerful
assembly-level instructions that would otherwise be difficult for the compiler to generate
from source-level constructs. Inline assembly is implemented as an extension to C/C++.
Apart from including an additional header file, no changes are needed to use
inline assembly. For certain applications, inline assembly can improve
performance or provide access to key functionality beyond what the
compiler alone can provide:
• The performance of multimedia applications can be significantly enhanced with
  inline assembly because the compiler cannot directly generate many of the most
  beneficial multimedia instructions.
• The HP-UX compilers and libraries make most architectural floating-point
  features available through standard language features and natural extensions,
  including the 80-bit extended type, the fma() functions, and inquiry macros such
  as isinf and isunordered. However, writers of low-level floating-point code can
  still benefit from judicious use of inline assembly to access architectural
  features such as the frcpa and frsqrta instructions, the 82-bit registers, and
  the alternate status fields.
For more information about inline assembly, see “Reference 17” (page 36).
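Before reaching for inline assembly, it can help to see what the standard route mentioned above looks like. The following minimal sketch uses only the C99 inquiry macros isunordered and isinf from <math.h>. The names compare_doubles, overflowed, and the order enumeration are hypothetical and chosen for illustration. It does not demonstrate inline assembly itself.

```c
#include <math.h>

/* Result of a three-way comparison that also reports NaN operands. */
enum order {
    ORDER_LESS = -1,
    ORDER_EQUAL = 0,
    ORDER_GREATER = 1,
    ORDER_UNORDERED = 2
};

/*
 * Compare two doubles. isunordered() is true when either operand is a NaN,
 * in which case none of <, ==, > holds and the operands have no ordering.
 */
enum order compare_doubles(double a, double b)
{
    if (isunordered(a, b))
        return ORDER_UNORDERED;
    if (a < b)
        return ORDER_LESS;
    if (a > b)
        return ORDER_GREATER;
    return ORDER_EQUAL;
}

/* Returns 1 if x is +infinity or -infinity (for example, after an overflow). */
int overflowed(double x)
{
    return isinf(x) != 0;
}
```

A caller might use compare_doubles to keep NaN values out of a sort or a minimum search, and overflowed to check an intermediate result before continuing a computation.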

Troubleshooting optimization problems
Occasionally, optimization can expose defects in an application that were hidden when
the application was compiled without optimization. Here are some representative
examples:
• Expressions that perform pointer arithmetic beyond the boundary of an object are
undefined according to the language standards. Use of such non-standard pointer
arithmetic to access data can result in failures in 32-bit mode due to the compiler’s
use of addp4 (add pointer) instructions. The add pointer instruction computes
addresses by adding an offset to a 32-bit pointer, which must point to the same
address region as the resulting pointer. If an application uses non-standard pointer
arithmetic, however, the compiler might not be able to enforce this condition, resulting