Zynq-7000 AP SoC Technical Reference Manual www.xilinx.com 67
UG585 (v1.11) September 27, 2016
Chapter 3: Application Processing Unit
Cortex-A9 also employs an 8-entry return stack cache that holds 32-bit subroutine return
addresses. This feature greatly reduces the penalty of executing subroutine calls and can track
return addresses for routines nested up to eight levels deep.
Instruction and Data Alignment
The ARM architecture specifies ARM instructions as 32 bits wide and requires them to be word
aligned. Thumb instructions are 16 bits wide and must be halfword aligned. Thumb-2 instructions,
which are either 16 or 32 bits wide, must also be halfword aligned. Data accesses can be unaligned:
the load/store unit within the CPU breaks each unaligned access into aligned accesses, merges the
returned data, and writes the result to the CPU register file as originally requested.
Note: The application processing unit (APU), and the PS as a whole, support only little-endian
byte order for both instructions and data.
Trace and Debug
The Cortex-A9 processor implements the ARMv7 debug architecture as described in the ARM
Architecture Reference Manual. In addition, the processor supports a set of Cortex-A9
processor-specific events and system-coherency events. For more information, see Chapter 11,
Performance Monitoring Unit, in the ARM Cortex-A9 Technical Reference Manual.
The debug interface of the processor consists of:
• A baseline CP14 interface that implements the ARMv7 debug architecture and the set of debug
events described in the ARM Architecture Reference Manual
• An extended CP14 interface that implements a set of debug events specific to this processor
(explained in the ARM Architecture Reference Manual)
• An external debug interface connected to an external debugger through a debug access port
(DAP)
The Cortex-A9 includes a program trace module (PTM) that provides ARM CoreSight-compatible
program-flow trace for either of the Cortex-A9 processors, giving full visibility into the actual
instruction flow of the processor. The PTM captures all code branches and program-flow changes,
and its cycle counting enables profiling analysis. Used with the CoreSight design kit, the PTM lets
software developers non-intrusively trace the execution history of multiple processors and store
it, with time-stamp correlation, either in an on-chip buffer or off chip through a standard trace
interface, improving visibility during development and debug.
The Cortex-A9 processor also implements performance counters and event monitors that can be
configured to gather statistics on the operation of the processor and the memory system.
3.2.3 Level 1 Caches
Each of the two Cortex-A9 processors has separate 32 KB level-1 instruction and data caches. Both L1
caches have common features that include: