User's Guide

Improving the Performance of Ported Code
4.1 Aligning Data
For data in internal or privileged interfaces, do not automatically make
changes to improve data alignment. First consider how frequently the data
structure is accessed, how much work realigning the structure involves, and
the risk of introducing errors. In judging the amount of work involved, make
sure you know all accesses to the data; do not merely guess. If all accesses
are in code for which you are responsible and you are changing the module
(or modules) anyway, then it is safe to fix the alignment problem.
Do not routinely unpack byte and word data into longwords or quadwords.
The time to do this is when you are fixing an alignment problem (for example,
a word that is not on a word boundary), subject to the cautions and
constraints described above, or when you know that data granularity is a
problem.
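The alignment problem described above (a word not on a word boundary) can be
sketched in C rather than VAX MACRO. The record layouts, field names, and the
is_naturally_aligned helper below are hypothetical and chosen only for
illustration; the sketch assumes a compiler that accepts #pragma pack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record packed on byte boundaries. */
#pragma pack(push, 1)
struct packed_rec {
    uint8_t  flags;   /* offset 0 */
    uint16_t count;   /* offset 1: word not on a word boundary */
    uint32_t length;  /* offset 3: longword not on a longword boundary */
};
#pragma pack(pop)

/* Same fields reordered, largest first, with explicit padding. */
struct aligned_rec {
    uint32_t length;  /* offset 0 */
    uint16_t count;   /* offset 4 */
    uint8_t  flags;   /* offset 6 */
    uint8_t  pad;     /* offset 7: keeps the size a multiple of 4 */
};

/* Nonzero if a field of the given size sits on its natural boundary. */
static int is_naturally_aligned(size_t offset, size_t size)
{
    return size != 0 && offset % size == 0;
}

/* Usage sketch: compare the two layouts. */
static void layout_demo(void)
{
    assert(!is_naturally_aligned(offsetof(struct packed_rec, count),
                                 sizeof(uint16_t)));
    assert(is_naturally_aligned(offsetof(struct aligned_rec, count),
                                sizeof(uint16_t)));
    assert(is_naturally_aligned(offsetof(struct aligned_rec, length),
                                sizeof(uint32_t)));
}
```

Note that realigning changes both the field offsets and the record size (7
bytes to 8 in this sketch), which is why every access to the structure must be
found before the layout is changed.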
If you do not own all the accesses to the data, there may still be
circumstances under which fixing alignment is appropriate. If the data is
frequently accessed, if performance is a real issue, and if you must
rearrange the data structure anyway, it makes sense to align the structure
at the same time.
It is important that you notify other programmers whose code may be
affected. Do not assume in such cases that all related modules will be
recompiled or that program documentation will help others detect incorrect
assumptions about the separation of data cells. Always assume that changes
like this will expose irregular programming practices and will not go
smoothly.
4.2 Code Flow and Branch Prediction
The Alpha and Itanium architectures are pipelined, which means that before
completing the current instruction, they start to execute several instructions
beyond it. By tailoring the code to keep the pipeline filled, you can make the code
run significantly faster.
On each conditional branch, the Alpha and Itanium architectures attempt
to predict whether or not the branch is taken so that they can correctly fill
the instruction pipeline with the next instruction to be executed. The Alpha
architecture predicts that forward conditional branches will not be taken and
backward conditional branches will be taken. The Itanium architecture has
branch-prediction hints in the branch instructions. A mispredicted branch costs
extra time because the pipeline must be flushed, and, in addition, the instruction
at the branch destination may not be in the instruction cache.
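The static rule described above for Alpha (backward conditional branches
predicted taken, forward ones predicted not taken) can be expressed as a toy
model in C. This is only a sketch of the rule as stated in the text, not a
description of the hardware; the function names and the addresses are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Static rule from the text: a backward branch (target at or before the
 * branch) is predicted taken; a forward branch is predicted not taken. */
static int predict_taken(uintptr_t branch_addr, uintptr_t target_addr)
{
    return target_addr <= branch_addr;
}

/* Count the outcomes that disagree with a fixed prediction. */
static size_t count_mispredictions(int predicted_taken,
                                   const int *taken, size_t n)
{
    size_t misses = 0;
    for (size_t i = 0; i < n; i++) {
        if ((taken[i] != 0) != (predicted_taken != 0))
            misses++;
    }
    return misses;
}

/* Usage sketch with made-up addresses: a loop-closing branch at 0x1040
 * that jumps back to 0x1000 three times and then falls through. */
static void static_prediction_demo(void)
{
    const int loop_outcomes[] = {1, 1, 1, 0};
    int prediction = predict_taken(0x1040, 0x1000);

    assert(prediction == 1);
    /* Only the final fall-through disagrees with the prediction. */
    assert(count_mispredictions(prediction, loop_outcomes, 4) == 1);
}
```

Under this model a loop-closing branch is mispredicted only on loop exit,
whereas a forward branch that is usually taken is mispredicted on most
executions.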
The compiler tries to follow the flow of the VAX MACRO code to generate Alpha
and Itanium code that has the most common code path in a contiguous block,
to allow the pipelined Alpha and Itanium architectures to process the code with
the greatest efficiency. However, in some situations, the compiler’s default rules
do not generate the most efficient code. In performance-sensitive code sections,
you can often improve the efficiency of the generated code by giving the compiler
information about which code paths will most likely be taken.
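As an analogy only, the following C sketch shows the same idea of telling a
compiler which path is expected. It assumes the GCC or Clang __builtin_expect
extension and falls back to a plain expression on other compilers. The
LIKELY and UNLIKELY macros and the sum_nonnegative routine are hypothetical;
the MACRO compiler's own mechanism is not shown here.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__GNUC__) || defined(__clang__)
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LIKELY(x)   (x)   /* no hint available: plain condition */
#define UNLIKELY(x) (x)
#endif

/* Hypothetical routine: sums the nonnegative entries of a buffer.
 * Negative entries are assumed to be rare errors, so that branch is
 * marked unlikely and the common path stays in a contiguous block. */
static long sum_nonnegative(const int *values, size_t count, int *error_seen)
{
    long total = 0;

    assert(error_seen != NULL);
    *error_seen = 0;

    for (size_t i = 0; i < count; i++) {
        if (UNLIKELY(values[i] < 0)) {
            *error_seen = 1;   /* rare path: record the error and skip */
            continue;
        }
        total += values[i];    /* common path */
    }
    return total;
}
```

The hint affects only how the generated code is laid out, not its result; if
the assumption about the common path is wrong, the code still runs correctly
but more slowly.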