Zynq-7000 All Programmable SoC

Zynq-7000 AP SoC Technical Reference Manual www.xilinx.com 104

UG585 (v1.11) September 27, 2016

Chapter 3: Application Processing Unit

Note: The transaction can optionally allocate into the L2 cache if the write parameters are set accordingly.

ACP non-coherent write requests: An ACP write request is non-coherent when AWUSER[0] = 0 or AWCACHE[1] = 0 alongside AWVALID. In this case, the SCU does not enforce coherency and the write request is forwarded directly to one of the available SCU AXI master ports.
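The coherency condition above can be expressed as a small predicate. The following is a minimal behavioural sketch in Python, not hardware or driver code; the function name and the integer encoding of the AWUSER and AWCACHE signals are assumptions made for illustration. It only restates the rule given in the text: a write is non-coherent when AWUSER[0] = 0 or AWCACHE[1] = 0.

```python
def acp_write_is_coherent(awuser: int, awcache: int) -> bool:
    """Return True when the SCU would treat the ACP write as coherent.

    Hypothetical helper: awuser and awcache are the raw signal values
    sampled alongside AWVALID, passed in as plain integers.
    """
    awuser_bit0 = awuser & 0x1            # AWUSER[0]
    awcache_bit1 = (awcache >> 1) & 0x1   # AWCACHE[1]
    # Non-coherent when either bit is 0, so coherent only when both are 1.
    return awuser_bit0 == 1 and awcache_bit1 == 1


def acp_write_route(awuser: int, awcache: int) -> str:
    """Describe how the SCU handles the write, per the text above."""
    if acp_write_is_coherent(awuser, awcache):
        return "coherent: SCU enforces coherency"
    return "non-coherent: forwarded directly to an SCU AXI master port"
```

For example, `acp_write_is_coherent(0b1, 0b0010)` evaluates to `True`, while clearing either bit makes the request non-coherent.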

ACP Usage

Compared with a legacy cache flushing and loading scheme, the ACP provides a low-latency path between the PS and accelerators implemented in the PL. A typical PL-based accelerator flow involves the following steps:

1. The CPU prepares input data for the accelerator within its local cache space.

2. The CPU sends a message to the accelerator using one of the general purpose AXI master interfaces to the PL.

3. The accelerator fetches the data through the ACP, processes the data, and returns the result through the ACP.

4. The accelerator sets a flag by writing to a known location to indicate that the data processing is complete. The processor can poll the status of this flag, or the flag can be used to generate an interrupt.
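The four steps above can be sketched as a toy, single-threaded software model. This is an illustration only: the addresses, the function names, and the doubling operation that stands in for the accelerator's processing are all hypothetical, and a Python dictionary stands in for the memory that the CPU caches and the ACP would share on real hardware.

```python
# Hypothetical addresses, chosen only for this illustration.
INPUT_ADDR = 0x100
OUTPUT_ADDR = 0x200
DONE_FLAG_ADDR = 0x300


def cpu_prepare(memory: dict, data: list) -> None:
    """Step 1: the CPU writes the input data and clears the done flag."""
    memory[INPUT_ADDR] = list(data)
    memory[DONE_FLAG_ADDR] = 0


def accelerator_run(memory: dict, message: tuple) -> None:
    """Steps 3 and 4: fetch the input, process it, return the result, set the flag."""
    src, dst, flag = message
    data = memory[src]                    # fetch through the ACP (modelled as a dict read)
    memory[dst] = [x * 2 for x in data]   # placeholder processing, not a real algorithm
    memory[flag] = 1                      # signal completion at the known location


def cpu_poll_done(memory: dict, max_polls: int = 1000) -> bool:
    """The CPU polls the flag a bounded number of times instead of spinning forever."""
    for _ in range(max_polls):
        if memory[DONE_FLAG_ADDR] == 1:
            return True
    return False


def offload(data: list) -> list:
    memory = {}
    cpu_prepare(memory, data)
    # Step 2: the message tells the accelerator where to find everything.
    message = (INPUT_ADDR, OUTPUT_ADDR, DONE_FLAG_ADDR)
    accelerator_run(memory, message)
    if not cpu_poll_done(memory):
        raise TimeoutError("accelerator did not set the done flag")
    return memory[OUTPUT_ADDR]
```

On real hardware the accelerator runs concurrently with the CPU, so the polling loop (or an interrupt handler) is what synchronizes the two sides; the model runs them back to back only to keep the sketch short.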

Table 3-7 shows ACP read and write behavior based on the current cache status. Access latency is small when cache hits occur.

Compared to a tightly-coupled coprocessor, ACP access latencies are relatively long. Therefore, the ACP is not recommended for fine-grained, instruction-level acceleration. On the other hand, for coarse-grained acceleration such as video frame-level processing, the ACP does not have a clear advantage over traditional memory-mapped PL acceleration, because the transaction overhead is small relative to the transaction time and using the ACP might cause undesirable cache thrashing. The ACP is therefore best suited to medium-grained acceleration, such as block-level crypto acceleration and video macro-block level processing.
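The point about coarse-grained transfers can be illustrated with a simple amortization model. This model is not taken from this manual: it assumes a fixed per-transaction overhead plus a transfer time that grows linearly with the data size, and every parameter is a hypothetical value in arbitrary time units.

```python
def overhead_fraction(fixed_overhead: float, time_per_byte: float, size_bytes: int) -> float:
    """Fraction of the total transaction time spent on the fixed overhead.

    Assumed model: total_time = fixed_overhead + time_per_byte * size_bytes.
    """
    total_time = fixed_overhead + time_per_byte * size_bytes
    if total_time <= 0:
        raise ValueError("total transaction time must be positive")
    return fixed_overhead / total_time
```

Under this assumed model, the overhead share shrinks as the transfer grows, which is why removing that overhead matters less for frame-sized transfers than for block-sized ones.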

Table 3-7: ACP Read and Write Behavior

Action                      Description

ACP read – I (invalid)      SCU fetches data from external memory through one of two AXI master
                            interfaces. Data is forwarded to the ACP directly. It does not affect
                            the CPU L1 cache state.

ACP read – M (modified)     SCU fetches data from L1 cache with M status. It does not affect the
                            L1 cache state.

ACP read – S (shared)       SCU fetches data from any L1 cache with S status. It does not affect
                            the L1 cache state.

ACP read – E (exclusive)    SCU fetches data from the L1 cache with E status. It does not affect
                            the L1 cache state.

ACP write – I (invalid)     Data is written to external memory through one of two AXI master
                            interfaces. It does not affect the CPU L1 cache state.

ACP write – M (modified)    Data in L1 cache with M status is flushed out to external memory
                            first. After that, ACP data is written into external memory interface.
                            L1 cache previously with M status is changed to I status. If the SCU
                            overwrites the entire cache line, L1 cache flush is skipped.
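The rows of Table 3-7 shown above can be restated as a small lookup function. This is a behavioural sketch only: the function name, the step strings, and the `full_line` parameter are hypothetical, and only the six rows listed here are modelled. Any other combination raises an error rather than guessing at behavior this section does not describe.

```python
def acp_access(op: str, l1_state: str, full_line: bool = False):
    """Return (steps, new_l1_state) for an ACP access, per the rows above.

    op        -- "read" or "write"
    l1_state  -- L1 cache line state: "I", "M", "S", or "E"
    full_line -- True when the write overwrites the entire cache line
    """
    if op == "read":
        if l1_state == "I":
            # Miss: data comes from external memory and is forwarded to the ACP.
            return ["fetch from external memory", "forward to ACP"], "I"
        if l1_state in ("M", "S", "E"):
            # Hit: data comes from the L1 cache; the line state is unchanged.
            return ["fetch from L1 cache"], l1_state
    elif op == "write":
        if l1_state == "I":
            return ["write ACP data to external memory"], "I"
        if l1_state == "M":
            # The flush is skipped when the whole line is overwritten.
            steps = [] if full_line else ["flush L1 line to external memory"]
            steps.append("write ACP data to external memory")
            return steps, "I"
    raise ValueError(f"combination not covered by the rows above: {op!r}, {l1_state!r}")
```

For instance, a partial-line write to a line held in M state yields a flush followed by the write and leaves the line invalid, whereas a full-line write yields only the write.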