user manual

AMD Athlon Processor x86 Code Optimization
22007E/0 November 1999
Always Inline Functions if Called from One Site
A function should always be inlined if it can be established that
it is called from just one site in the code. For the C language,
determination of this characteristic is made easier if functions
are explicitly declared static unless they require external
linkage. This case occurs quite frequently, as functionality that
could be concentrated in a single large function is split across
multiple small functions for improved maintainability and
readability.
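
As a minimal sketch of the situation described above, assume a hypothetical helper square() that is needed only by a hypothetical externally visible function sum_of_squares(). Neither name comes from this guide. Declaring the helper static limits its visibility to the translation unit, so a compiler can establish that there is exactly one call site and inline the helper there. Whether it actually does so depends on the compiler and its optimization settings.

```c
#include <stddef.h>

/* Hypothetical helper. Because it is declared static, the compiler
   can see every call to it within this file; here there is exactly
   one, so the function is a candidate for inlining at that site. */
static int square(int x)
{
    return x * x;
}

/* Hypothetical entry point that requires external linkage. */
int sum_of_squares(const int *values, size_t count)
{
    size_t i;
    int total = 0;

    for (i = 0; i < count; i++) {
        total += square(values[i]);   /* the single call site */
    }
    return total;
}
```

Had square() been left with external linkage, the compiler would have had to assume that other files might call it, which makes the single-call-site property harder to prove.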
Always Inline Functions with Fewer than 25 Machine Instructions
In addition, functions that create fewer than 25 machine
instructions once inlined should always be inlined because it is
likely that the function call overhead is close to or more than
the time spent executing the function body. For large functions,
the benefits of reduced function call overhead give
diminishing returns. Therefore, a function that results in the
insertion of more than 500 machine instructions at the call site
should probably not be inlined. Some larger functions might
consist of multiple, relatively short paths that are negatively
affected by function call overhead. In such a case, it can be
advantageous to inline larger functions. Profiling information is
the best guide in determining whether to inline such large
functions.
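
The small-function case can be illustrated with a short sketch. The names clamp_int() and clamp_all() are hypothetical, and the use of the C99 inline keyword is an assumption; compilers of this guide's era typically offered a vendor-specific equivalent. The helper compiles to only a few machine instructions, so the cost of a call and return would be comparable to the work it performs.

```c
/* Hypothetical small helper: two compares and a few moves or
   branches, far below the 25-instruction guideline. The inline
   keyword is only a request; the compiler makes the final decision. */
static inline int clamp_int(int value, int low, int high)
{
    if (value < low)
        return low;
    if (value > high)
        return high;
    return value;
}

/* Hypothetical caller that invokes the helper once per element. */
void clamp_all(int *values, int count, int low, int high)
{
    int i;

    for (i = 0; i < count; i++) {
        values[i] = clamp_int(values[i], low, high);
    }
}
```

For a function at the other end of the scale, the same keyword is unlikely to help, and a profile of the application remains the better guide.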
Avoid Address Generation Interlocks
Loads and stores are scheduled by the AMD Athlon processor to
access the data cache in program order. Newer loads and stores
with their addresses calculated can be blocked by older loads
and stores whose addresses are not yet calculated; this is
known as an address generation interlock. Therefore, it is
advantageous to schedule loads and stores that can calculate
their addresses quickly, ahead of loads and stores that require
the resolution of a long dependency chain in order to generate
their addresses. Consider the following code examples.