user manual

AMD Athlon Processor x86 Code Optimization
22007E/0 November 1999
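Example 1 (Avoid):
FLD QWORD PTR [foo]    ;load double-precision foo
FIMUL DWORD PTR [bar]  ;multiply by integer bar from memory
FIADD DWORD PTR [baz]  ;add integer baz from memory
Example 2 (Preferred):
FILD DWORD PTR [bar]   ;ST(0) = bar
FILD DWORD PTR [baz]   ;ST(0) = baz, ST(1) = bar
FLD QWORD PTR [foo]    ;ST(0) = foo, ST(1) = baz, ST(2) = bar
FMULP ST(2), ST        ;ST(2) = bar * foo, pop: ST(0) = baz, ST(1) = bar * foo
FADDP ST(1), ST        ;ST(1) = bar * foo + baz, pop: ST(0) = result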
Align Branch Targets in Program Hot Spots
In program hot spots (i.e., innermost loops in the absence of
profiling data), place branch targets at or near the beginning of
16-byte aligned code windows. This technique helps to
maximize the number of instructions that are filled into the
instruction-byte queue while preserving I-cache space in
branch-intensive code.
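The arithmetic behind this advice can be sketched in a few lines of Python. This is only an illustration, assuming the 16-byte window size stated above. The function names and the sample addresses are hypothetical and do not come from the manual.

```python
WINDOW = 16  # size in bytes of the aligned code window named in the text


def padding_to_align(address: int, window: int = WINDOW) -> int:
    """Bytes of padding needed before `address` to reach the next window boundary."""
    return (-address) % window


def bytes_left_in_window(address: int, window: int = WINDOW) -> int:
    """Bytes from `address` to the end of the aligned window that contains it."""
    return window - (address % window)


if __name__ == "__main__":
    target = 0x100D  # hypothetical address of a loop label
    print(bytes_left_in_window(target))  # bytes of the window usable from the target
    print(padding_to_align(target))      # padding that would move it to a boundary
```

For the hypothetical target at 0x100D, only 3 bytes of its window lie at or after the label, and 3 bytes of padding would move it to 0x1010. In assembly source the padding is normally requested with an alignment directive, such as ALIGN 16 in MASM, rather than computed by hand. Because the padding itself occupies I-cache space, the advice above applies only to hot spots.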
Use Short Instruction Lengths
Assemblers and compilers should generate the tightest code
possible to optimize use of the I-cache and increase average
decode rate. Wherever possible, use instructions with shorter
lengths. Using shorter instructions increases the number of
instructions that can fit into the instruction-byte queue. For
example, use 8-bit displacements as opposed to 32-bit
displacements. In addition, use the single-byte format of simple
integer instructions whenever possible, as opposed to the
2-byte opcode ModR/M format.
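As a rough illustration of how much the choice of form matters, the Python sketch below encodes `add reg, imm` for a 32-bit register in either the general form or the shortest available form. It is a sketch under stated assumptions, not an assembler. The function name and the `shortest` flag are hypothetical. The opcode bytes (81 /0 id, 83 /0 ib, and the EAX-only 05 id) are the standard x86 encodings of ADD, and of these the text here shows only the 81 form.

```python
import struct

# Register numbers used in the ModR/M byte for 32-bit general registers.
REG32 = {"eax": 0, "ecx": 1, "edx": 2, "ebx": 3,
         "esp": 4, "ebp": 5, "esi": 6, "edi": 7}


def encode_add_imm(reg: str, imm: int, shortest: bool = True) -> bytes:
    """Encode `add reg, imm` for a 32-bit register (illustrative helper).

    shortest=False: always use the general form 81 /0 id (6 bytes).
    shortest=True:  prefer 83 /0 ib (3 bytes) when the immediate fits in a
                    signed byte, then the EAX-only form 05 id (5 bytes).
    Negative immediates must be passed as negative Python integers.
    """
    r = REG32[reg]
    modrm = 0xC0 | r  # mod=11 (register direct), reg field=/0 (ADD), rm=r
    imm32 = struct.pack("<I", imm & 0xFFFFFFFF)  # little-endian 32-bit immediate
    if shortest and -128 <= imm <= 127:
        return bytes([0x83, modrm]) + struct.pack("<b", imm)
    if shortest and reg == "eax":
        return bytes([0x05]) + imm32
    return bytes([0x81, modrm]) + imm32


if __name__ == "__main__":
    for reg, imm in (("eax", 0x12345678), ("ebx", -5)):
        long_form = encode_add_imm(reg, imm, shortest=False)
        short_form = encode_add_imm(reg, imm)
        print(reg, imm, long_form.hex(" "), "->", short_form.hex(" "))
```

With `shortest=False` the helper reproduces the first two byte sequences of the Example 1 (Avoid) listing. With the default setting, the same two instructions shrink from 6 bytes to 5 and 3 bytes respectively.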
Example 1 (Avoid):
81 C0 78 56 34 12  add eax, 12345678h  ;uses 2-byte opcode form (with ModR/M)
81 C3 FB FF FF FF  add ebx, -5         ;uses 32-bit immediate
0F 84 05 00 00 00  jz $label1          ;uses 2-byte opcode, 32-bit immediate