Zynq-7000 AP SoC Technical Reference Manual www.xilinx.com 66
UG585 (v1.11) September 27, 2016
Chapter 3: Application Processing Unit
loops, and increases the pipeline utilization by removing data dependencies between adjacent
instructions, which also indirectly reduces interrupt latency.
In the Cortex-A9 CPU, dependent load-store instructions can be forwarded for resolution within the
memory system to further reduce pipeline stalls. The core supports up to four data cache line fill
requests, which can be generated by either automatic or user-driven pre-fetching.
A key feature of this CPU is the out-of-order write back of instructions, which enables the pipeline
resources to be released independent of the order in which the system provides the required data.
Load/store instructions can be issued speculatively before the condition of the instruction or a
preceding branch has been resolved, or before the data to be written has become available. If the
condition required for the execution of the load/store fails, any side effects, such as the action to
modify registers, are flushed.
Branch Prediction
To minimize the branch penalty in its highly pipelined CPU, the Cortex-A9 implements both static
and dynamic branch prediction. Static branch prediction is provided by the instructions and is
decided during compilation. Dynamic branch prediction uses the outcome of the previous
executions of a specific branch to determine whether the branch should be taken or not. The
dynamic branch prediction logic employs a global branch history buffer (GHB), which is a 4,096-entry
table holding 2-bit prediction information for specific branches and is updated every time a branch
is executed.
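
The following C sketch illustrates one common way a table of 2-bit prediction entries can work. It is a model for illustration only, not the Cortex-A9 implementation: the saturating-counter encoding, the names ghb_t, ghb_index, ghb_predict, and ghb_update, and the use of low-order branch address bits as the table index are all assumptions. Only the table size (4,096 entries of 2 bits) comes from the text above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define GHB_ENTRIES 4096u  /* from the text: 4,096 entries, 2 bits each */

/* Assumed encoding of each 2-bit entry (a saturating counter):
 * 0 = strongly not taken, 1 = weakly not taken,
 * 2 = weakly taken,       3 = strongly taken. */
typedef struct {
    uint8_t counter[GHB_ENTRIES];
} ghb_t;

/* Hypothetical index function: low-order bits of the word-aligned
 * branch address. The real hardware indexing is not described here. */
static unsigned ghb_index(uint32_t branch_addr)
{
    return (branch_addr >> 2) % GHB_ENTRIES;
}

/* Predict "taken" when the counter is in one of the two taken states. */
static bool ghb_predict(const ghb_t *g, uint32_t branch_addr)
{
    return g->counter[ghb_index(branch_addr)] >= 2u;
}

/* Update the entry with the actual outcome once the branch executes. */
static void ghb_update(ghb_t *g, uint32_t branch_addr, bool taken)
{
    uint8_t *c = &g->counter[ghb_index(branch_addr)];

    assert(*c <= 3u);          /* entry must be a valid 2-bit value */
    if (taken) {
        if (*c < 3u) {
            (*c)++;            /* saturate at strongly taken */
        }
    } else {
        if (*c > 0u) {
            (*c)--;            /* saturate at strongly not taken */
        }
    }
}
```

With this encoding, a single mispredicted iteration (for example, a loop exit) moves a strongly taken entry only to weakly taken, so the next execution of the same branch is still predicted taken.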
The branch execution and the overall instruction throughput also benefit greatly from the
implementation of a branch target address cache (BTAC) which holds the target addresses of the
recent branches. This 512-entry address cache is organized as 2-way × 256 entries and provides the
target address for a specific branch to the pre-fetch unit before the actual target address is
generated based on the calculation of the effective address and its translation to the physical
address. Additionally, if an instruction loop fits in four BTAC entries, instruction cache accesses are
turned off to lower power consumption.
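
As an illustration of the 2-way × 256 organization described above, the following C sketch models a small set-associative target cache. It is a simplified model under stated assumptions, not the hardware design: the set index and tag derivation from the branch address, the round-robin replacement policy, and all names (btac_t, btac_lookup, btac_insert, btac_flush) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BTAC_WAYS 2u    /* from the text: 2-way */
#define BTAC_SETS 256u  /* from the text: 256 entries per way */

typedef struct {
    bool     valid;
    uint32_t tag;       /* assumed: branch address bits above the set index */
    uint32_t target;    /* cached branch target address */
} btac_entry_t;

typedef struct {
    btac_entry_t way[BTAC_SETS][BTAC_WAYS];
    uint8_t      next_victim[BTAC_SETS];  /* assumed round-robin replacement */
} btac_t;

/* Assumed address split for word-aligned branch addresses. */
static unsigned btac_set(uint32_t addr) { return (addr >> 2) % BTAC_SETS; }
static uint32_t btac_tag(uint32_t addr) { return (addr >> 2) / BTAC_SETS; }

/* Invalidate every entry, leaving the model in a known state. */
static void btac_flush(btac_t *b)
{
    memset(b, 0, sizeof *b);
}

/* Return true and write *target when the branch address hits. */
static bool btac_lookup(const btac_t *b, uint32_t branch_addr, uint32_t *target)
{
    const btac_entry_t *set = b->way[btac_set(branch_addr)];

    assert(target != NULL);
    for (unsigned w = 0; w < BTAC_WAYS; w++) {
        if (set[w].valid && set[w].tag == btac_tag(branch_addr)) {
            *target = set[w].target;
            return true;
        }
    }
    return false;
}

/* Record the target of an executed branch. */
static void btac_insert(btac_t *b, uint32_t branch_addr, uint32_t target)
{
    unsigned      s   = btac_set(branch_addr);
    uint32_t      tag = btac_tag(branch_addr);
    btac_entry_t *set = b->way[s];
    unsigned      w;

    /* Reuse the entry if this branch is already cached. */
    for (w = 0; w < BTAC_WAYS; w++) {
        if (set[w].valid && set[w].tag == tag) {
            break;
        }
    }
    /* Otherwise prefer an invalid way. */
    if (w == BTAC_WAYS) {
        for (w = 0; w < BTAC_WAYS; w++) {
            if (!set[w].valid) {
                break;
            }
        }
    }
    /* Otherwise evict the round-robin victim. */
    if (w == BTAC_WAYS) {
        w = b->next_victim[s];
        b->next_victim[s] = (uint8_t)((w + 1u) % BTAC_WAYS);
    }
    set[w].valid  = true;
    set[w].tag    = tag;
    set[w].target = target;
}
```

In this model, two branches whose addresses map to the same set can be cached at the same time; a third branch mapping to that set evicts one of them.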
Note: Both GHB and BTAC RAMs implement parity for protection; however, this support has limited
diagnostic value. Corruption in GHB data or BTAC data does not generate functional errors in the
Cortex-A9 processor. Corruption in GHB data or BTAC data results in faulty branch prediction that is
detected and corrected when the branch gets executed.
The Cortex-A9 CPU can predict conditional branches, unconditional branches, indirect branches,
PC-destination data-processing operations, and branches that switch between ARM and Thumb
states. However, the following branch instructions are not predicted:
• Branches that switch between states (except ARM to Thumb transitions, and Thumb to ARM
  transitions)
• Instructions with the S suffix are not predicted, as they are typically used to return from
  exceptions and have side effects that can change privilege mode and security state.
• All mode-changing instructions
Users can enable program flow prediction by setting the Z bit in the CP15 c1 Control register to 1.
Refer to the System Control Register in the ARM Cortex-A9 Technical Reference Manual (see
Appendix A, Additional Resources). Before switching the program flow prediction on, a BTAC flush
operation must be performed, which has the additional effect of setting the GHB into a known state.
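
The sequence above (flush first, then set the Z bit) might be sketched in C as follows. This is a minimal sketch under several assumptions, not reference code from this manual: it assumes GCC-style inline assembly, execution in a privileged mode, and that the ARMv7-A "branch predictor invalidate all" operation (BPIALL) serves as the BTAC flush. The function names are hypothetical, and the #else branch provides host-side stand-ins (sim_sctlr, sim_flush_count) so that the ordering logic can be exercised off-target.

```c
#include <assert.h>
#include <stdint.h>

#define SCTLR_Z (1u << 11)  /* Z bit of the CP15 c1 Control register */

#if defined(__arm__)
/* Target build: access CP15 directly (privileged mode assumed). */
static inline uint32_t sctlr_read(void)
{
    uint32_t v;
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(v));
    return v;
}

static inline void sctlr_write(uint32_t v)
{
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0\n\t"
                     "isb" : : "r"(v) : "memory");
}

static inline void bp_invalidate_all(void)
{
    /* BPIALL; the register value written is ignored. */
    __asm__ volatile("mcr p15, 0, %0, c7, c5, 6\n\t"
                     "dsb\n\t"
                     "isb" : : "r"(0) : "memory");
}
#else
/* Host build: hypothetical stand-ins for the CP15 operations. */
static uint32_t sim_sctlr;
static unsigned sim_flush_count;

static uint32_t sctlr_read(void)       { return sim_sctlr; }
static void     sctlr_write(uint32_t v) { sim_sctlr = v; }
static void     bp_invalidate_all(void) { sim_flush_count++; }
#endif

/* Enable program flow prediction: flush first, then set the Z bit. */
static void enable_branch_prediction(void)
{
    uint32_t sctlr = sctlr_read();

    if (sctlr & SCTLR_Z) {
        return;                      /* already enabled, nothing to do */
    }
    bp_invalidate_all();             /* flush before switching prediction on */
    sctlr_write(sctlr | SCTLR_Z);    /* read-modify-write keeps other bits */
    assert(sctlr_read() & SCTLR_Z);  /* sanity check of the resulting state */
}
```

The read-modify-write is deliberate: the Control register holds other enable bits (for example, those for the MMU and caches), so only the Z bit should change.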