Zynq-7000 AP SoC Technical Reference Manual www.xilinx.com 648
UG585 (v1.11) September 27, 2016
Chapter 22: Programmable Logic Design Guide
Power
An additional benefit of moving operations to the programmable logic is a reduction in power.
Depending on the operations, programmable logic can reduce power per operation by 10x to 100x. Thus it
might be useful to implement algorithms in the PL solely to reduce system power.
One issue to be aware of is that if the algorithm requires access to external memory, the energy cost
of accessing that memory could dominate the energy budget, making a reduction in operation power
irrelevant.
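The effect can be illustrated with a first-order energy model. The following is a minimal sketch, not a characterization of any device: the structure, the function names, and every energy figure used with it are hypothetical placeholders chosen only to show the arithmetic.

```c
#include <stdint.h>

/* Hypothetical first-order energy model. All values fed into it are
   illustrative placeholders, not measured Zynq-7000 figures. */
typedef struct {
    uint64_t num_ops;          /* arithmetic operations executed        */
    uint64_t num_mem_accesses; /* external memory accesses performed    */
    uint64_t pj_per_op;        /* energy per operation, in picojoules   */
    uint64_t pj_per_access;    /* energy per external access, in pJ     */
} energy_model_t;

/* Total energy = compute energy + external memory energy. */
uint64_t total_energy_pj(const energy_model_t *m)
{
    return m->num_ops * m->pj_per_op
         + m->num_mem_accesses * m->pj_per_access;
}

/* Share of the total spent on external memory, as an integer
   percentage rounded down. Returns 0 for an empty workload. */
uint64_t memory_share_percent(const energy_model_t *m)
{
    uint64_t total = total_energy_pj(m);
    if (total == 0) {
        return 0;
    }
    return (m->num_mem_accesses * m->pj_per_access * 100u) / total;
}
```

With assumed values of 1000 operations, 1000 external accesses, and 1000 pJ per access, lowering the per-operation energy from 100 pJ to 10 pJ (a 10x reduction) lowers the total only from 1,100,000 pJ to 1,010,000 pJ, roughly 8 percent, because the memory term is unchanged.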
Latency
Parallel logic in the PL has a low, predictable delay and cannot be interrupted. For this reason,
real-time events originating in the PL might best be serviced by logic in the PL rather than by
software on the processor. This approach can reduce response time from thousands of clocks to
tens of clocks.
22.2.2 Designing PL Accelerators
Programmable logic accelerators are typically created in the RTL languages Verilog or VHDL.
Experienced RTL engineers can use C code as a golden model to create an efficient hardware
implementation of the algorithm in programmable logic.
For software programmers who are more comfortable with the C language, C-to-gates compilers
exist that allow a user to build hardware accelerators in C. Although C is a sequential language,
automated compiler methods can map the sequential code to parallel hardware without user
intervention.
For instance, a FOR loop such as “for(i=0; i<10; i++){ x[i]=a[i]+b[i]; }” can be
unrolled to create 10 individual adders all operating in parallel.
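The transformation can be written out by hand in plain C to show what the compiler produces conceptually. This is a sketch only: the function names vadd_rolled and vadd_unrolled and the int element type are assumptions made for illustration, and an actual C-to-gates tool performs the unrolling internally rather than requiring it in the source.

```c
#define N 10

/* Sequential form, as a software programmer would write it.
   Executed on a processor, the additions happen one after another. */
void vadd_rolled(const int a[N], const int b[N], int x[N])
{
    for (int i = 0; i < N; i++) {
        x[i] = a[i] + b[i];
    }
}

/* Fully unrolled form. No statement depends on the result of another,
   so a hardware compiler is free to map each line to its own adder
   and evaluate all ten in parallel. */
void vadd_unrolled(const int a[N], const int b[N], int x[N])
{
    x[0] = a[0] + b[0];
    x[1] = a[1] + b[1];
    x[2] = a[2] + b[2];
    x[3] = a[3] + b[3];
    x[4] = a[4] + b[4];
    x[5] = a[5] + b[5];
    x[6] = a[6] + b[6];
    x[7] = a[7] + b[7];
    x[8] = a[8] + b[8];
    x[9] = a[9] + b[9];
}
```

Both functions compute the same result. The unrolled form simply makes the absence of loop-carried dependencies explicit, which is the property that allows the parallel mapping.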
For video and DSP algorithms, tools such as MATLAB Simulink and Xilinx System Generator can be
used to create logic directly from algorithmic flowgraphs. A primary advantage of using Simulink
is its rich library of functions, which can be used to model, simulate, and verify the hardware
implementation.
Dataflow
Regardless of how an accelerator or offload engine is designed, once implemented it requires
efficient dataflow to and from the accelerator. In many cases, scheduling the dataflow between the
accelerator and DRAM can be more of a design challenge than implementing the actual algorithm.
The word dataflow refers to the movement of data between system memories and PL functional
units over the AXI interconnect and local interconnect.
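As a purely software model of this movement, the sketch below streams a large array through a small local buffer in fixed-size blocks. Everything in it is an assumption made for illustration: BLOCK_WORDS, accel_process, and run_dataflow are hypothetical names, memcpy stands in for a transfer over the interconnect, and the doubling operation stands in for the accelerator datapath.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_WORDS 64  /* hypothetical size of the PL-side local buffer */

/* Stand-in for the accelerator datapath: doubles each word in place. */
static void accel_process(uint32_t *buf, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        buf[i] *= 2u;
    }
}

/* Streams `len` words of `dram` through the local buffer block by block.
   Each iteration models one inbound transfer, one processing step, and
   one outbound transfer. Returns the number of blocks moved. */
size_t run_dataflow(uint32_t *dram, size_t len)
{
    uint32_t local[BLOCK_WORDS];
    size_t blocks = 0;

    for (size_t off = 0; off < len; off += BLOCK_WORDS) {
        size_t remaining = len - off;
        size_t n = (remaining < BLOCK_WORDS) ? remaining : BLOCK_WORDS;

        memcpy(local, dram + off, n * sizeof local[0]);  /* memory -> PL */
        accel_process(local, n);                         /* compute      */
        memcpy(dram + off, local, n * sizeof local[0]);  /* PL -> memory */
        blocks++;
    }
    return blocks;
}
```

In this model the transfers and the computation are strictly serialized, so the datapath sits idle during every copy. Deciding how to overlap those steps is the kind of scheduling problem the text describes.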