
Zynq-7000 AP SoC Technical Reference Manual www.xilinx.com 648

UG585 (v1.11) September 27, 2016

Chapter 22: Programmable Logic Design Guide

Power

An additional benefit of moving operations to the programmable logic is lower power. Depending on the operation, programmable logic can reduce the power per operation by 10-100x, so it can be worthwhile to implement algorithms in the PL solely to reduce system power.

One issue to be aware of is that if the algorithm requires access to external memory, the energy cost of accessing that memory can dominate the energy budget, making a reduction in operation power irrelevant.

Latency

Parallel logic in the PL has a low, predictable delay and cannot be interrupted. For this reason, real-time events originating in the PL might best be serviced by algorithms in the programmable logic. This approach can reduce response time from thousands of clock cycles to tens of clock cycles.

22.2.2 Designing PL Accelerators

Programmable logic accelerators are typically created in the RTL languages Verilog or VHDL. Experienced RTL engineers can use C code as a golden model to create an efficient hardware implementation of the algorithm in programmable logic.

For software programmers who are more comfortable with the C language, C-to-gates compilers allow HW accelerators to be built using C. Although C is a sequential language, automated compiler methods can map the sequential code to parallel hardware without user intervention.

For instance, a FOR loop such as “for(i=0; i<10; i++){ x[i]=a[i]+b[i]; }” can be unrolled to create 10 individual adders, all operating in parallel.

For video and DSP algorithms, tools such as MATLAB Simulink and Xilinx System Generator can be used to create logic directly from algorithmic flowgraphs. A primary advantage of using MATLAB Simulink is its rich library of functions, which can be used to model, simulate, and verify the hardware implementation.

Dataflow

Regardless of how an accelerator or offload engine is designed, once implemented it requires efficient dataflow to and from the accelerator. In many cases, scheduling the dataflow between the accelerator and DRAM can be a greater design challenge than implementing the algorithm itself.

The term dataflow refers to the movement of data between system memories and PL functional units over the AXI interconnect and local interconnect.