User's guide

Improving the Performance of Ported Code

4.1 Aligning Data

• For data in internal or privileged interfaces, do not automatically make changes to improve data alignment. Consider how frequently the data structure is accessed, how much work is involved in realigning it, and the risk that things might go wrong. In judging the amount of work involved, identify every access to the data; do not merely guess. If all accesses to the data are in code for which you are responsible, and you are making changes in the module (or modules) anyway, then it is safe to fix the alignment problem.

• Do not routinely unpack byte and word data into longwords or quadwords. Do this only when you are fixing an alignment problem (for example, a word that is not on a word boundary), subject to the cautions and constraints above, or when you know that data granularity is a problem.

• If you do not own all the accesses to the data, there may still be circumstances in which fixing alignment is appropriate. If the data is accessed frequently, performance is a real concern, and you must reorganize the data structure anyway, it makes sense to align the structure at the same time.

In such cases, it is important to notify other programmers whose code may be affected. Do not assume that all related modules will be recompiled, or that program documentation will help others find code that makes incorrect assumptions about the spacing between data cells. Always assume that changes like this will expose irregular programming practices and will not go smoothly.

4.2 Code Flow and Branch Prediction

The Alpha and Itanium architectures are pipelined: they begin executing several subsequent instructions before the current instruction completes. By tailoring the code to keep the pipeline filled, you can make it run significantly faster.

On each conditional branch, the Alpha and Itanium architectures attempt to predict whether the branch will be taken, so that they can fill the instruction pipeline with the instructions that will actually be executed next. The Alpha architecture predicts that forward conditional branches will not be taken and that backward conditional branches will be taken. The Itanium architecture provides branch-prediction hints in the branch instructions themselves. A mispredicted branch costs extra time because the pipeline must be flushed; in addition, the instruction at the branch destination may not be in the instruction cache.

The compiler follows the flow of the VAX MACRO code and tries to generate Alpha and Itanium code in which the most common code path forms a contiguous block, so that the pipelined processor can execute it with the greatest efficiency. In some situations, however, the compiler's default rules do not generate the most efficient code. In performance-sensitive code sections, you can often make the generated code more efficient by telling the compiler which code paths are most likely to be taken.
