Zynq-7000 AP SoC Technical Reference Manual www.xilinx.com 66

UG585 (v1.11) September 27, 2016

Chapter 3: Application Processing Unit

loops, and increases the pipeline utilization by removing data dependencies between adjacent

instructions, which also indirectly reduces interrupt latency.

In the Cortex-A9 CPU, dependent load-store instructions can be forwarded for resolution within the

memory system to further reduce pipeline stalls. The core supports up to four data cache line fill

requests, which can be issued through either automatic or user-driven pre-fetching.
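
As an illustration of user-driven pre-fetching, the sketch below issues software prefetch hints while walking an array. It relies on the GCC/Clang `__builtin_prefetch` builtin, which compilers typically lower to a preload (PLD) instruction on ARM targets. The function name `sum_with_prefetch` and the distance `PREFETCH_AHEAD` are hypothetical and not taken from this manual; a suitable distance has to be found by measurement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical prefetch distance in elements; tune by measurement. */
#define PREFETCH_AHEAD 16u

/* Sums an array while hinting the memory system about upcoming reads. */
uint32_t sum_with_prefetch(const uint32_t *data, size_t n)
{
    assert(data != NULL || n == 0);

    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* Arguments: address, 0 = prefetch for read, 3 = keep in cache. */
            __builtin_prefetch(&data[i + PREFETCH_AHEAD], 0, 3);
        }
        sum += data[i];
    }
    return sum;
}
```

The hint does not change the result of the loop; it only asks the memory system to start a line fill early, so the function returns the same sum with or without it.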

A key feature of this CPU is the out-of-order write back of instructions that enables the pipeline

resources to be released independent of the order in which the system provides the required data.

Load/store instructions can be issued speculatively before the condition of the instruction or a

preceding branch has been resolved, or before the data to be written has become available. If the

condition required for the execution of the load/store fails, any side effects, such as the action to

modify registers, are flushed.

Branch Prediction

To minimize the branch penalty in its highly pipelined CPU, the Cortex-A9 implements both static

and dynamic branch prediction. Static branch prediction is provided by the instructions and is

decided during compilation. Dynamic branch prediction uses the outcome of the previous

executions of a specific branch to determine whether the branch should be taken or not. The

dynamic branch prediction logic employs a global branch history buffer (GHB), which is a 4,096-entry

table holding 2-bit prediction information for specific branches and is updated every time a branch

is executed.
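
The behavior of a table of 2-bit prediction entries can be sketched in software as follows. This is a simplified model, not the Cortex-A9 implementation: the text states only the table size and the entry width, so the saturating-counter encoding and the address-based index function `ghb_index` are assumptions made for illustration, and the global history that the real hardware uses is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GHB_ENTRIES 4096u          /* table size stated in the text */

static uint8_t ghb[GHB_ENTRIES];   /* each entry holds a 2-bit counter, 0..3 */

/* Assumed index function: low bits of the word-aligned branch address. */
static uint32_t ghb_index(uint32_t branch_addr)
{
    return (branch_addr >> 2) % GHB_ENTRIES;
}

/* Counter values 2 and 3 predict "taken"; 0 and 1 predict "not taken". */
bool ghb_predict(uint32_t branch_addr)
{
    return ghb[ghb_index(branch_addr)] >= 2u;
}

/* Update the entry once the outcome is known, saturating at 0 and 3. */
void ghb_update(uint32_t branch_addr, bool taken)
{
    uint32_t idx = ghb_index(branch_addr);
    assert(idx < GHB_ENTRIES);

    if (taken) {
        if (ghb[idx] < 3u) {
            ghb[idx]++;
        }
    } else {
        if (ghb[idx] > 0u) {
            ghb[idx]--;
        }
    }
}
```

With this encoding, a single mispredicted iteration (for example, a loop exit) moves a strongly biased counter by one step only, so the prediction for the next execution of the same branch is unchanged.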

The branch execution and the overall instruction throughput also benefit greatly from the

implementation of a branch target address cache (BTAC), which holds the target addresses of

recent branches. This 512-entry address cache is organized as 2-way × 256 entries and provides the

target address for a specific branch to the pre-fetch unit before the actual target address is

generated based on the calculation of the effective address and its translation to the physical

address. Additionally, if an instruction loop fits in four BTAC entries, instruction cache accesses are

turned off to lower power consumption.
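
The 2-way × 256 organization described above can be modeled as a small set-associative lookup table. The sketch below is illustrative only: the set index function, the use of the full branch address as the tag, and the round-robin replacement policy are assumptions, since the text gives only the number of ways and entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BTAC_SETS 256u   /* entries per way, from the text */
#define BTAC_WAYS 2u     /* associativity, from the text */

typedef struct {
    bool     valid;
    uint32_t tag;        /* full branch address used as the tag (assumption) */
    uint32_t target;     /* cached branch target address */
} btac_entry;

static btac_entry btac[BTAC_SETS][BTAC_WAYS];
static uint8_t    btac_next_way[BTAC_SETS];  /* round-robin victim (assumed policy) */

/* Assumed set index: low bits of the word-aligned branch address. */
static uint32_t btac_set(uint32_t branch_addr)
{
    return (branch_addr >> 2) % BTAC_SETS;
}

/* Returns true and writes *target when the branch address hits in the table. */
bool btac_lookup(uint32_t branch_addr, uint32_t *target)
{
    uint32_t set = btac_set(branch_addr);
    for (uint32_t way = 0; way < BTAC_WAYS; way++) {
        const btac_entry *e = &btac[set][way];
        if (e->valid && e->tag == branch_addr) {
            *target = e->target;
            return true;
        }
    }
    return false;
}

/* Records a resolved branch, updating a matching entry or replacing a victim. */
void btac_insert(uint32_t branch_addr, uint32_t target)
{
    uint32_t set = btac_set(branch_addr);
    assert(set < BTAC_SETS);

    for (uint32_t way = 0; way < BTAC_WAYS; way++) {
        btac_entry *e = &btac[set][way];
        if (e->valid && e->tag == branch_addr) {
            e->target = target;
            return;
        }
    }

    uint32_t victim = btac_next_way[set];
    btac[set][victim] = (btac_entry){ true, branch_addr, target };
    btac_next_way[set] = (uint8_t)((victim + 1u) % BTAC_WAYS);
}
```

In this model, two branches that map to the same set can coexist, one per way; a third branch mapping to that set evicts the older entry.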

Note: Both GHB and BTAC RAMs implement parity for protection; however, this support has limited

diagnostic value. Corruption in GHB data or BTAC data does not generate functional errors in the

Cortex-A9 processor. Corruption in GHB data or BTAC data results in faulty branch prediction that is

detected and corrected when the branch gets executed.

The Cortex-A9 CPU can predict conditional branches, unconditional branches, indirect branches,

PC-destination data-processing operations, and branches that switch between ARM and Thumb

states. However, the following branch instructions are not predicted:

• Branches that switch between states (except ARM to Thumb transitions, and Thumb to ARM

transitions)

• Instructions with the S suffix, which are typically used to return from exceptions and have

side effects that can change privilege mode and security state

• All mode-changing instructions

Users can enable program flow prediction by setting the Z bit in the CP15 c1 Control register to 1.

Refer to the System Control Register in the ARM Cortex-A9 Technical Reference Manual (see

Appendix A, Additional Resources). Before switching program flow prediction on, a BTAC flush

operation must be performed, which has the additional effect of setting the GHB into a known state.
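
The enable sequence described above might look like the following bare-metal sketch for a GCC-style toolchain. It assumes that the BTAC flush corresponds to the ARMv7-A "invalidate all branch predictors" operation (BPIALL, CP15 c7, c5, 6) and that the Z bit is bit 11 of the System Control Register; both should be checked against the ARM Cortex-A9 Technical Reference Manual. The function names are hypothetical, and the hardware routine must run in a privileged mode.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_Z_BIT (1u << 11)   /* Z bit: program flow prediction enable */

/* Pure helper: returns the SCTLR value with the Z bit set. */
uint32_t sctlr_with_prediction(uint32_t sctlr)
{
    uint32_t updated = sctlr | SCTLR_Z_BIT;
    assert((updated & SCTLR_Z_BIT) != 0u);   /* post-condition: Z is set */
    return updated;
}

#if defined(__GNUC__) && defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
/* Flushes the branch predictor, then enables program flow prediction. */
void enable_branch_prediction(void)
{
    uint32_t sctlr;
    uint32_t zero = 0;

    /* BPIALL: invalidate all branch predictor entries (assumed BTAC flush). */
    __asm__ volatile("mcr p15, 0, %0, c7, c5, 6" : : "r"(zero) : "memory");
    __asm__ volatile("dsb" : : : "memory");
    __asm__ volatile("isb" : : : "memory");

    /* Read-modify-write the CP15 c1 System Control Register. */
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(sctlr));
    sctlr = sctlr_with_prediction(sctlr);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0" : : "r"(sctlr) : "memory");
    __asm__ volatile("isb" : : : "memory");
}
#endif
```

Keeping the bit manipulation in a separate pure function allows it to be checked on a host machine, while the CP15 accesses are compiled only for an ARMv7 target.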