Zynq-7000 AP SoC Technical Reference Manual www.xilinx.com
UG585 (v1.11) September 27, 2016
Chapter 3
Application Processing Unit
3.1 Introduction
3.1.1 Basic Functionality
The application processing unit (APU), located within the PS, contains one processor on single-core
devices or two processors on dual-core devices. These are ARM® Cortex™-A9 processors with NEON™
coprocessors, connected in a multiprocessor (MP) configuration and sharing a 512 KB L2 cache. Each
processor is a high-performance, low-power core with two separate 32 KB L1 caches, one for instructions
and one for data. The Cortex-A9 processor implements the ARMv7-A architecture with full virtual memory
support and can execute 32-bit ARM instructions, 16-bit and 32-bit Thumb instructions, and 8-bit
Java™ bytecodes in the Jazelle state. The NEON media and signal processing architecture adds
instructions that target audio, video, image, and speech processing, as well as 3D graphics. These
single instruction, multiple data (SIMD) instructions are available in both the ARM and Thumb states.
A block diagram of the APU is shown in Figure 3-1.
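To illustrate what a single SIMD instruction does, the following C sketch models a 128-bit NEON Q register as sixteen 8-bit lanes and applies one unsigned saturating add to every lane, as when brightening a row of image pixels. It is a portable scalar model for illustration, not NEON code; the type `q_u8x16` and the functions `qadd_u8x16` and `brighten_row` are hypothetical names. On the hardware, the equivalent operation is the `VQADD.U8` instruction, exposed in C as the `vqaddq_u8` intrinsic from `arm_neon.h`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A 128-bit NEON Q register viewed as sixteen unsigned 8-bit lanes.
   Hypothetical portable model, not the arm_neon.h vector type. */
typedef struct {
    uint8_t lane[16];
} q_u8x16;

/* Lane-wise unsigned saturating add: the same operation is applied to
   all 16 lanes, and sums above 255 clamp to 255 instead of wrapping.
   This mirrors the semantics of VQADD.U8 / vqaddq_u8. */
static q_u8x16 qadd_u8x16(q_u8x16 a, q_u8x16 b)
{
    q_u8x16 r;
    for (size_t i = 0; i < 16; ++i) {
        unsigned sum = (unsigned)a.lane[i] + (unsigned)b.lane[i];
        r.lane[i] = (uint8_t)(sum > 255u ? 255u : sum);
    }
    return r;
}

/* Brighten a row of 8-bit pixels by a constant, 16 pixels per
   simulated SIMD operation. In this simplified sketch n must be a
   multiple of 16. */
static void brighten_row(uint8_t *px, size_t n, uint8_t delta)
{
    assert(n % 16 == 0);
    q_u8x16 d;
    for (size_t i = 0; i < 16; ++i) d.lane[i] = delta;   /* broadcast */
    for (size_t off = 0; off < n; off += 16) {
        q_u8x16 v;
        for (size_t i = 0; i < 16; ++i) v.lane[i] = px[off + i];  /* load */
        v = qadd_u8x16(v, d);                                     /* one "instruction" */
        for (size_t i = 0; i < 16; ++i) px[off + i] = v.lane[i];  /* store */
    }
}
```

With a 16-pixel row, `brighten_row(row, 16, 10)` turns a pixel value of 250 into 255 rather than wrapping it to 4, which is why saturating arithmetic suits image and audio data.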
The Cortex-A9 processor(s) within the APU are organized in an MP configuration with a snoop
control unit (SCU) that maintains L1 cache coherency between the two processors, and between the
processors and the ACP interface from the PL. To increase performance, the processors share a unified
512 KB level-two (L2) cache for instructions and data. In parallel with the L2 cache, a 256 KB on-chip
memory (OCM) module provides low-latency memory.
An accelerator coherency port (ACP) facilitates communication between the programmable logic (PL)
and the APU. This 64-bit AXI interface allows the PL to implement an AXI master that can access the
L2 cache and the OCM while maintaining memory coherency with the CPU L1 caches.
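The coherency role of the SCU and the ACP can be pictured with a toy write-invalidate model of one cache line, treated here as a single 32-bit word, shared by two CPUs and the PL. This is a simplified MSI-style sketch, assumed for illustration; it is not the SCU's actual protocol or timing, and `coherent_line`, `cpu_read`, `cpu_write`, `acp_read`, and `acp_write` are hypothetical names. It shows the point of a coherent port: a PL read returns data that is still dirty in a CPU's L1 cache, and a PL write invalidates stale L1 copies.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified per-line state in one CPU's L1 data cache (MSI-style).
   Illustrative assumption only, not the SCU's documented protocol. */
typedef enum { LINE_INVALID, LINE_SHARED, LINE_MODIFIED } line_state;

typedef struct {
    line_state state[2];   /* state of the line in CPU0 and CPU1 L1 */
    uint32_t   l1_data[2]; /* data held in each CPU's L1 copy */
    uint32_t   l2_data;    /* data held in L2 (or OCM) */
} coherent_line;

/* Snoop: if either CPU holds a dirty copy, write it back to L2 and
   demote that copy to shared. */
static void snoop_clean(coherent_line *ln)
{
    for (int c = 0; c < 2; ++c) {
        if (ln->state[c] == LINE_MODIFIED) {
            ln->l2_data = ln->l1_data[c];
            ln->state[c] = LINE_SHARED;
        }
    }
}

static uint32_t cpu_read(coherent_line *ln, int cpu)
{
    assert(cpu == 0 || cpu == 1);
    if (ln->state[cpu] == LINE_INVALID) {
        snoop_clean(ln);                 /* pick up dirty data from the other L1 */
        ln->l1_data[cpu] = ln->l2_data;  /* line fill */
        ln->state[cpu] = LINE_SHARED;
    }
    return ln->l1_data[cpu];
}

/* The model's line is a single word, so a write overwrites it entirely
   and the other CPU's copy can simply be invalidated. */
static void cpu_write(coherent_line *ln, int cpu, uint32_t value)
{
    assert(cpu == 0 || cpu == 1);
    ln->state[1 - cpu] = LINE_INVALID;   /* snoop: invalidate the other copy */
    ln->l1_data[cpu] = value;
    ln->state[cpu] = LINE_MODIFIED;
}

/* ACP read from the PL: snoop the CPU L1s first so the PL sees dirty data. */
static uint32_t acp_read(coherent_line *ln)
{
    snoop_clean(ln);
    return ln->l2_data;
}

/* ACP write from the PL: update L2 and invalidate any stale L1 copies. */
static void acp_write(coherent_line *ln, uint32_t value)
{
    ln->state[0] = ln->state[1] = LINE_INVALID;
    ln->l2_data = value;
}
```

A non-coherent master reading `l2_data` directly after `cpu_write` would see the old value; the snoop in `acp_read` is what the coherent path adds.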
The unified 512 KB L2 cache is 8-way set-associative and allows you to lock cache content on a
per-line, per-way, or per-master basis. Depending on its address, each access through the L2 cache
controller is routed either to the DDR controller or to other slaves in the PS or PL. To reduce
latency to DDR memory, the L2 controller has a dedicated port to the DDR controller.
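The L2 geometry implies a concrete address split. Assuming a 32-byte cache line (an assumption; the line length is not stated in this section), a 512 KB, 8-way cache has 512 KB / (8 × 32 B) = 2048 sets, so a 32-bit address divides into a 5-bit byte offset, an 11-bit set index, and a 16-bit tag. The C sketch below computes that split and builds a way-lock bitmask. The helper names (`l2_split`, `l2_way_lock_mask`) and the one-bit-per-way mask layout are hypothetical and do not describe the L2 controller's register format.

```c
#include <assert.h>
#include <stdint.h>

/* Size and associativity come from the text; the 32-byte line length
   is an assumption for this sketch. */
#define L2_SIZE_BYTES  (512u * 1024u)
#define L2_WAYS        8u
#define L2_LINE_BYTES  32u   /* assumed */
#define L2_SETS        (L2_SIZE_BYTES / (L2_WAYS * L2_LINE_BYTES))  /* 2048 */

#define OFFSET_BITS 5u   /* log2(L2_LINE_BYTES) */
#define INDEX_BITS  11u  /* log2(L2_SETS) */

_Static_assert(L2_SETS == (1u << INDEX_BITS), "set count must match index width");

typedef struct {
    uint32_t tag;
    uint32_t set;
    uint32_t offset;
} l2_addr_fields;

/* Split a 32-bit physical address into tag, set index, and byte offset. */
static l2_addr_fields l2_split(uint32_t addr)
{
    l2_addr_fields f;
    f.offset = addr & (L2_LINE_BYTES - 1u);
    f.set    = (addr >> OFFSET_BITS) & (L2_SETS - 1u);
    f.tag    = addr >> (OFFSET_BITS + INDEX_BITS);
    return f;
}

/* Hypothetical way-lock mask: bit w set means way w is locked, i.e.
   excluded from new allocations. Locks n_ways consecutive ways. */
static uint32_t l2_way_lock_mask(uint32_t first_way, uint32_t n_ways)
{
    assert(first_way + n_ways <= L2_WAYS);
    return ((1u << n_ways) - 1u) << first_way;
}
```

For example, address 0x12345678 maps to set 0x2B3 with tag 0x1234 and byte offset 0x18, and locking ways 4 through 7 gives the mask 0xF0.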
Debug and trace capability is built into the processor core(s) and interconnects as part of the
CoreSight™ debug and trace system. You can control and interrogate the processor(s) and memory
through the debug access port (DAP). In addition, the 32-bit AMBA® trace bus (ATB) masters from the
processor(s) are funneled together with other ATB masters, such as the Instrumentation Trace Macrocell
(ITM) and the Fabric Trace Monitor (FTM), to generate a unified PS trace that is captured in the
on-chip embedded trace buffer (ETB) or exported through the trace port interface unit (TPIU).
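The funneling step can be sketched as merging several trace streams into one while tagging every transfer with the ID of its source, so that a trace decoder can separate the streams again. This C model uses a simple round-robin policy with hypothetical names (`atb_source`, `funnel_merge`) and made-up source IDs; the real CoreSight funnel's arbitration and the PS trace packet format are not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One trace source (e.g. a processor, ITM, or FTM) with pending bytes.
   Source IDs and the round-robin policy are illustrative assumptions. */
typedef struct {
    uint8_t        id;    /* source ID carried with each transfer */
    const uint8_t *data;  /* pending trace bytes */
    size_t         len;
    size_t         pos;   /* next byte to send */
} atb_source;

/* One merged transfer: the trace byte plus the ID of its source. */
typedef struct {
    uint8_t id;
    uint8_t byte;
} tagged_byte;

/* Merge all sources into one stream, taking one byte from each
   non-empty source in turn. The merged stream is what would then be
   stored in the ETB or sent out through the TPIU. Returns the number
   of transfers written to out. */
static size_t funnel_merge(atb_source *src, size_t n_src,
                           tagged_byte *out, size_t out_cap)
{
    assert(out != NULL || out_cap == 0);
    size_t n_out = 0;
    int progress = 1;
    while (progress && n_out < out_cap) {
        progress = 0;
        for (size_t s = 0; s < n_src && n_out < out_cap; ++s) {
            if (src[s].pos < src[s].len) {
                out[n_out].id = src[s].id;
                out[n_out].byte = src[s].data[src[s].pos++];
                ++n_out;
                progress = 1;
            }
        }
    }
    return n_out;
}
```

Because each merged transfer keeps its source ID, a decoder can later split the unified stream back into per-source traces, which is what makes sharing one ETB or TPIU among several trace sources workable.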