Virtex-7 FPGA XT Connectivity Targeted Reference Design for the VC709 Board

User Guide

Vivado Design Suite 2014.3

UG962 (v3.0) December 20, 2014
Notice of Disclaimer

The information disclosed to you hereunder (the “Materials”) is provided solely for the selection and use of Xilinx products.
Revision History

The following table shows the revision history for this document.

Date        Version  Revision
07/26/2013  1.0      Initial Xilinx Release.
06/30/2014  2.0      Updated 64-bit datapath operation from 1,866 MT/s to 1,600 MT/s throughout. Updated Figure 1-1, Figure 1-2, Figure 1-3, and Figure 3-4. Updated Table 1-1. Changed description of Hardware Test Setup and modified VC709 Board Setup in Chapter 2. Replaced Figure 2-13, Figure 2-17, and Figure 2-21.
Table of Contents

Revision History

Chapter 1: Introduction
    Operation Modes
        PCIe DMA Performance Mode

Chapter 3: Functional Description
    Hardware Architecture
        Base System Components
            PCI Express
            Performance Monitor for PCIe
    Application Driver Interface
    Interrupt or Polling Operations
    Control Path Components
        Networking Tools

Appendix A: Register Descriptions
    Downstream Completion Byte Count (0x9018)
    Initial Completion Data Credits for Downstream Port (0x901C)
    Initial Completion Header Credits for Downstream Port (0x9020)
    PCIe Credits Status - Initial Non-Posted Data Credits for Downstream Port (0x9024)
    PCIe Credits Status - Initial Non-Posted Header Credits for Downstream Port (0x9028)
Chapter 1: Introduction

This chapter introduces the Virtex®-7 FPGA XT Connectivity Targeted Reference Design (XT Connectivity TRD), summarizes its modes of operation, and lists the TRD features.
The software application acts as a data source and generates data packets for the performance demonstration. It connects to a TCP/IP stack for the networking application demonstration.

Operation Modes

The XT Connectivity TRD offers PCIe DMA Performance, Raw Ethernet Performance, and Application modes of operation. All modes are available within a single design bitstream.
• Checker mode: The software user application generates packets that are sent to hardware, where data integrity is verified.

These modes of operation are configurable by way of the GUI through register programming. See Appendix A, Register Descriptions for more information.

Raw Ethernet Performance Mode

The Raw Ethernet performance mode exercises the 10G Ethernet path, demonstrating the high-performance capabilities of the XT Connectivity TRD.
Application Mode

The application mode demonstrates an end-to-end application of a quad 10G network interface card (NIC). The software driver connects to the networking stack, allowing standard networking applications to be used. However, because the hardware does not include an offload engine, performance remains low: the software has to handle all of the TCP/IP overhead. In application mode, packets originate from the TCP/IP stack when standard networking applications are invoked.
Resource Utilization

Table 1-1 lists the resources used by the XT Connectivity TRD at the time of production, after placement has run. These numbers might vary based on the version of the XT Connectivity TRD and the tools used to regenerate the design.
Chapter 2: Getting Started

This chapter describes the requirements and provides the procedures for setting up the hardware and software to test and simulate the XT Connectivity TRD.

Note: For an updated list of known issues, refer to the Virtex®-7 FPGA VC709 Connectivity Kit known issues and release notes master answer record (AR#51901). See References in Appendix E.
Note: The VC709 board must be installed in a full-sized chassis because of the presence of the four SFP+ modules on the back side of the board. If the VC709 board is installed in a rack-sized chassis, two of the four SFP+ modules will not be accessible.

• PC with monitor or laptop computer (not supplied with the VC709 Connectivity Kit) with the Vivado Design Edition installed

Hardware Test Setup

This section details the hardware setup and use of the application and control GUI.
Note: To provide loopback capability, module connector P2 is connected to P3, and P4 to P5 (module P2 is the bottom one, nearer to the PCIe finger). For clarity, Figure 2-2 shows the board outside of the PC chassis.

Figure 2-2: SFP+ Connector Locations on the VC709 Board (Left) and Cables Showing Loopback Configuration (Right)

Figure 2-3 shows the setup with fiber optic cables installed in the PC chassis.
5. Connect the 12V ATX power supply adapter cable (6-pin side) to the board at connector J18, and the other side (4-pin) to the ATX power supply. The cable is shown in Figure 2-4.

Figure 2-4: 12V ATX Power Supply Adapter Cable

6. Ensure the connections are secure.
Figure 2-5 shows the location of LED indicators on the board.

Figure 2-5: GPIO LEDs on the VC709 Board (LED[0] indicates PCIe link up)

Note: See Appendix D, Troubleshooting if the LEDs are not exhibiting proper behavior.
Figure 2-6 shows the screen images that are displayed during a successful Fedora 16 LiveDVD boot sequence.

Figure 2-6: Fedora 16 LiveDVD Boot Sequence (First Screen, Last Boot Screen, Booted)

2. Copy v7_xt_conn_trd.tar.gz from the XT Connectivity TRD file rdf0285-vc709-connectivity-trd-2014-1.zip to the home directory on the Fedora 16 machine.

3. Click the Home folder on the left side of the screen to open the folder.
4. Double-click the v7_xt_conn_trd folder. Figure 2-8 shows the contents of the v7_xt_conn_trd folder.

Figure 2-8: Contents of v7_xt_conn_trd Folder

5. Double-click the quickstart.sh script (Figure 2-8). This script sets the proper permissions and starts the driver installation GUI, as shown in Figure 2-9.
6. Click Run in Terminal (Figure 2-9). The XT Connectivity TRD setup window is displayed, as shown in Figure 2-10.

Figure 2-10: XT Connectivity TRD Setup

As described in the following sections, this window is used to install the drivers for testing the different modes of operation. Hovering the mouse pointer over the choices brings up a short description.
GEN/CHK Performance Mode

With the TRD Setup window displayed:

1. In the Driver Mode Selection area, select GEN/CHK (Figure 2-10).

2. Click Install. After installation of the GEN/CHK performance mode driver is complete, the XT Connectivity TRD Control and Monitoring Interface is displayed (Figure 2-11). This interface includes control parameters such as test mode (loopback, generator, or checker) and packet length.
4. Click the Performance Plots tab. The Performance Plots tab (Figure 2-12) shows the system-to-card and card-to-system performance numbers for a specific packet size. You can vary the packet size and view the performance variations accordingly.

Figure 2-12: GEN/CHK Performance Mode Plots

5. Stop the GEN/CHK test by clicking Stop for Data Path 0.

6. Click the Block Diagram tab on the right side of the GUI to bring up the block diagram of the XT Connectivity TRD with the datapath highlighted for the selected mode (Figure 2-13).
Figure 2-13: Performance Mode (GEN/CHK) Block Diagram
7. Close the Virtex-7 XT Connectivity TRD Control and Monitoring Interface by clicking X in the upper right corner. Closing the interface stops any running test, uninstalls the driver, and returns to the TRD Setup window.

Raw Ethernet Performance Mode

With the TRD Setup window displayed:

1. In the Driver Mode Selection area, select Raw Ethernet (Figure 2-14).

Figure 2-14: Raw Ethernet Driver Installation
2. Click Install. The Virtex-7 XT Connectivity TRD Control and Monitoring Interface starts with Performance Mode (Raw Ethernet) displayed by default (Figure 2-15). You can configure the packet size in this mode. The System Monitor tab monitors system power consumption and die temperature.

Figure 2-15: Raw Ethernet Mode

3. Click Start All to start tests on all channels at once, or click Start for each datapath to start each channel separately.
4. Click the Plots tab to see system-to-card and card-to-system performance (Figure 2-16).

Figure 2-16: Raw Ethernet Driver Performance Plots

5. Stop the Raw Ethernet test by clicking Stop All, or stop an individual datapath by clicking the Stop associated with that datapath.
6. Click the Block Diagram tab on the right side of the GUI to bring up the block diagram of the XT Connectivity TRD with the datapath highlighted for the selected mode (Figure 2-17).

Figure 2-17: Performance Mode (Raw Ethernet) Block Diagram

7. Close the Virtex-7 XT Connectivity TRD Control and Monitoring Interface by clicking X in the upper right corner.
Application Mode

With the TRD Setup window displayed:

1. In the Driver Mode Selection area, select Application (Figure 2-18).

Note: Do not select the Peer-to-Peer option unless a peer machine with a 10G NIC, or an identical VC709 setup, is available.

Figure 2-18: Application Mode Driver Installation
2. Click Install. After the application mode driver is installed, the Virtex-7 XT Connectivity TRD Control and Monitoring Interface starts (see Figure 2-19). However, in application mode you cannot start or stop a test because the traffic is generated by the networking stack.

Figure 2-19: Application Driver Interface

3. Open a command prompt on the host PC and ping the four network interfaces by entering:

% ping 10.60.0.1
% ping 10.60.1.
The results should be similar to the output shown in Figure 2-20.

Figure 2-20: System Output from Ping of Network Interfaces
4. Click the Block Diagram tab on the right side of the GUI to bring up the block diagram of the XT Connectivity TRD with the datapath highlighted for the selected mode. See Figure 2-21.

Figure 2-21: Access the Block Diagram

5. Close the Virtex-7 XT Connectivity TRD Control and Monitoring Interface by clicking X in the upper right corner.
NIC Statistics

The NIC statistics can be obtained using the command:

ethtool -S ethX

The error statistics are obtained by reading the registers provided by the Ethernet Statistics IP. PHY registers can be read using the command:

ethtool -d ethX

Certain statistics can also be obtained from the command:

ifconfig ethX

Rebuilding the XT Connectivity TRD

The design is rebuilt using the Vivado design tools.
Linux Implementation Flow

1. In Linux, navigate to the XT Connectivity TRD vivado folder. Do one of the following:

• To run the Vivado flow in batch mode, at the prompt enter:

vivado -mode batch -source scripts/v7_xt_conn_trd_batch_impl.tcl

Synthesis and implementation results can be found in the runs/xt_conn_trd.runs directory.

• To run the flow in GUI mode, at the prompt enter:

vivado -source scripts/v7_xt_conn_trd.
2. Connect the computer and power supply to the VC709 board as shown in Figure 2-23.

3. Turn on the power to the VC709 board (SW12).

Figure 2-23: Cable Installation for VC709 Board Programming (USB cable, standard-A plug to micro-B plug, from the computer to the U26 USB JTAG receptacle; power supply, 100VAC–240VAC input, 12 VDC 5.0A output, to J18)

4. Copy the v7_xt_conn_trd directory to the computer using the Xilinx programming tools.
Programming the FPGA

To program the FPGA with a bit file you generated, perform the following steps:

1. Perform step 1 through step 4 from VC709 Board Setup, page 16.

2. Verify that the VC709 board is powered by the external power supply provided in the VC709 Connectivity Kit, as shown in Figure 2-24.

3. Connect the PC with the Vivado Design Suite to the VC709 board using a standard-A plug to micro-B plug USB cable, as shown in Figure 2-24.
Simulation

This section details the out-of-box simulation environment provided with the design. This simulation environment gives you a feel for the general functionality of the design and shows basic traffic movement end-to-end.

Overview

The out-of-box simulation environment consists of the design under test (DUT) connected to the Virtex®-7 XT FPGA Root Port Model for PCIe.
Simulation Using QuestaSim

To run the simulation in QuestaSim:

1. Copy and unzip rdf0285-vc709-connectivity-trd-2014-1.zip to a working directory.

2. Go to the appropriate operating system instructions:

• For Windows, go to Windows QuestaSim Simulation Flow.
• For Linux, go to Linux QuestaSim Simulation Flow.

Windows QuestaSim Simulation Flow

1. In Windows, open the Vivado Tcl Shell and navigate to the XT Connectivity TRD vivado folder.
Windows Vivado Simulator Flow

1. In Windows, open the Vivado Tcl Shell and navigate to the XT Connectivity TRD vivado folder. Do one of the following:

• To run the Vivado flow in batch mode, type at the prompt:

source scripts/v7_xt_conn_trd_batch_sim_xsim.tcl

This command runs the simulation flow in Tcl mode.

• To run the flow in GUI mode, open the Vivado GUI, and from the Tcl Console, type:

cd /vivado
source scripts/v7_xt_conn_trd.
User-Controlled Macros

The simulation environment allows you to define macros that control the DUT configuration (Table 2-2). These values can be changed in the user_defines.v file.

Table 2-2: User-Controlled Macros

Macro Name    Default Value  Description
CH0           Defined        Enables Channel-0 initialization and traffic flow.
CH1           Defined        Enables Channel-1 initialization and traffic flow.
DETAILED_LOG  Undefined      Enables a detailed log of each transaction.
Test Selection

Table 2-5 describes the various tests provided by the out-of-box simulation environment.

Table 2-5: Test Descriptions

Test Name        Description
basic_test       Basic Test: This test runs two packets for each DMA channel. One buffer descriptor defines one full packet in this test.
packet_spanning  Packet Spanning Multiple Descriptors: This test spans a packet across two buffer descriptors. It runs two packets for each DMA channel.
Chapter 3: Functional Description

This chapter describes the hardware and software architectures.

Hardware Architecture

The hardware design architecture is described in the following sections:

• Base System Components: PCIe-DMA and the DDR3 virtual FIFO components
• Application Components: User application design
• Utility Components: Power monitor block and the Si5324 jitter attenuator block for the 156.25 MHz clock
Performance Monitor for PCIe

This performance monitor snoops on the 256-bit requester and completer PCIe interfaces operating at 250 MHz and provides the following measurements, updated once every second:

• Count of active beats upstream, which includes the TLP headers for various transactions
• Count of active beats downstream, which includes the TLP headers for various transactions
• Count of payload bytes for upstream memory write transactions that includes buff
Hardware Architecture Figure 3-1 shows the buffer descriptor layout for S2C and C2S directions.
Table 3-1: Buffer Descriptor Fields (Cont’d)

Descriptor Field     Functional Description
ByteCount[19:0]      Byte Count: In the S2C direction, this indicates to the DMA the byte count queued up for transmission. In the C2S direction, the DMA updates this field to indicate the byte count updated in system memory.
RsvdByteCount[19:0]  Reserved Byte Count: In the S2C direction, this is equivalent to the byte count queued up for transmission.
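The ByteCount, SOP, and EOP semantics above can be illustrated with a small software model. This is an illustrative Python sketch, not the actual driver data structures; the function names and the 4 KB buffer size used in the example are assumptions.

```python
# Illustrative model of how a packet larger than one buffer spans
# multiple descriptors: every fragment carries its ByteCount, the
# first fragment sets SOP, and the last fragment sets EOP.

def span_packet(packet_len, buf_size):
    """Split a packet across fixed-size buffers, returning one
    descriptor dict per fragment."""
    descs = []
    remaining = packet_len
    while remaining > 0:
        count = min(remaining, buf_size)
        descs.append({
            "ByteCount": count,
            "SOP": len(descs) == 0,        # first fragment
            "EOP": remaining <= buf_size,  # last fragment
        })
        remaining -= count
    return descs

def reassembled_length(descs):
    """A consumer walks descriptors from SOP to EOP and sums ByteCount."""
    assert descs[0]["SOP"] and descs[-1]["EOP"]
    return sum(d["ByteCount"] for d in descs)
```

A checker walking a C2S ring would similarly accumulate ByteCount from the SOP descriptor through the EOP descriptor to recover the packet length.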
Hardware Architecture The packet interface signals (for example, user control and the end of packet) are built from the control fields in the descriptor. The information present in the user control field is made available during the start of packet. The reference design does not use the user control field.
Virtual FIFOs

The XT Connectivity TRD uses the DDR3 space as multiple FIFOs for storage.
• Width and clock conversion:
  • 256-bit at 250 MHz from the DMA S2C interface and 64-bit at 156.25 MHz from the XGEMAC-RX interface to 512-bit at 200 MHz on the AXI-VFIFO interface for writes
  • 512-bit at 200 MHz from the AXI-VFIFO interface to 256-bit at 250 MHz on the DMA interface and 64-bit at 156.25 MHz on the XGEMAC-TX interface for reads
• Buffering for storage, to avoid frequent back-pressure to the PCIe-DMA

For further information, see LogiCORE IP AXI4-Stream Interconnect v1.
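The width and clock conversions above can be sanity-checked with simple arithmetic (bandwidth = bus width x clock). This is illustrative only; the variable names are not from the design sources.

```python
# Quick arithmetic check of the interface bandwidths quoted above.

GBPS = 1e9

dma_if    = 256 * 250e6       # DMA streaming interface: 256-bit @ 250 MHz
vfifo_if  = 512 * 200e6       # AXI-VFIFO side: 512-bit @ 200 MHz
xgemac_if = 64 * 156.25e6     # one XGEMAC interface: 64-bit @ 156.25 MHz

print(dma_if / GBPS)     # 64.0 Gb/s
print(vfifo_if / GBPS)   # 102.4 Gb/s
print(xgemac_if / GBPS)  # 10.0 Gb/s
```

The wider, faster interconnect side (102.4 Gb/s) exceeds any single source, which is what allows the converters to absorb writes from the DMA and XGEMAC interfaces without sustained back-pressure.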
verified by the checker. The hardware generator and checker modules are enabled if the Enable Generator and Enable Checker bits are set from software.

Packet Format and Data Integrity

Data integrity is checked using a CRC32 polynomial, which makes the generator and checker modules generic. The data format is incremental sequence numbers, but the integrity check is based on CRC32.
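A software model of this generator/checker scheme is sketched below. This is illustrative Python only: zlib's CRC-32 stands in for the hardware CRC32 polynomial, and the little-endian 32-bit packing is an assumption of the example, not the documented hardware packet format.

```python
import struct
import zlib

# Generator: payload is incrementing 32-bit sequence numbers with a
# CRC-32 of the payload appended. Checker: recompute and compare.

def generate_packet(start_seq, num_words):
    payload = b"".join(struct.pack("<I", start_seq + i)
                       for i in range(num_words))
    crc = zlib.crc32(payload) & 0xFFFFFFFF
    return payload + struct.pack("<I", crc)

def check_packet(packet):
    payload, crc = packet[:-4], struct.unpack("<I", packet[-4:])[0]
    return (zlib.crc32(payload) & 0xFFFFFFFF) == crc
```

Because the check is a CRC over the whole payload rather than a comparison against the expected sequence, the same checker works for any generator that appends a CRC, which is what makes the modules generic.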
Network Path Components

When using the Application mode of operation, the XT Connectivity TRD operates as a NIC (network interface card), a device used to connect computers to a LAN (local area network). The software driver interfaces to the networking stack (the TCP/IP stack), and Ethernet frames are transferred between system memory and the Ethernet MAC in hardware over the PCIe interface.
The receive interface logic:

• Receives incoming frames from the XGEMAC and performs address filtering (if enabled to do so)
• Decides whether to drop a packet or pass it on to the system for further processing, based on the packet status provided by the XGEMAC-RX interface

The XGEMAC-RX interface does not allow back-pressure: once packet reception has started, it completes the entire packet. The receive interface logic stores the incoming frame in a local receive FIFO.
Figure 3-6 shows the block diagram of the power monitoring logic.

Figure 3-6: Power Monitor Logic Block (register interface, XADC, block RAM, PicoBlaze processor, UCD9248 read logic)

Overview

PicoBlaze is optimized for Xilinx FPGAs and requires minimal interfacing logic, resulting in a tiny footprint. PicoBlaze continuously reads the raw temperature, voltage, and current values from the internal XADC and the external UCD9248 devices.
Figure 3-7 shows the organization of the hardware and software for the PicoBlaze-based clock control circuitry.

Figure 3-7: Clock Control Hardware and Software Organization
• Hardware definition: clock_control.vhd, kcpsm6.vhd (KCPSM6)
• Software definition: clock_control_program.psm, assembled (KCPSM6 assembler) to clock_control_program.vhd
• i2c_routines.psm: primary definition of I2C signalling
• vc709_i2c_routines.psm: I2C transactions to read/write the PCA9548 switch, Si570, and Si5324; programs registers of the Si5324 to generate 156.25 MHz
User Register Interface

The DMA provides an AXI4 target interface for user space registers, as shown in Figure 3-8.
Software Design Description

The software architecture of the XT Connectivity TRD framework comprises one or more Linux kernel-space driver modules with one user-space application that controls the design operation.

Note: For details on available user APIs, refer to the documentation supplied with the driver sources.

The software comprises building blocks designed with scalability in mind.
Figure 3-9 illustrates the Ethernet data flow.
written to the DDR3 memory are read and then sent to the 10G MAC and 10G PHY, where the data is looped back. Data received at the 10G MAC is then stored in DDR3 memory again and transferred back to the DMA, creating a loopback. On the receive side, the DMA pushes the packets to the software driver through the PCIe Endpoint. The driver receives the packets in its data buffers, verifies the data, and discards the buffers.
Datapath Components

Application Traffic Generator

The application traffic generator generates raw data when enabled from the user interface. The application opens the application driver interface through exposed driver entry points and transfers data using the read and write entry points provided by that interface. The application traffic generator also performs the data integrity test, if enabled.
Control Path Components

Graphical User Interface

The Control and Monitor GUI is used to monitor device status, run performance tests, and display statistics. It communicates the user-configured test parameters to the user traffic generator application, which in turn generates traffic with the specified parameters.
Application Mode

The Ethernet application mode datapath components and control path components shown in Figure 3-12 are described in this section.
TCP/IP Stack

The TCP/IP stack has defined hooks to which the Ethernet driver attaches, allowing all standard networking applications to communicate with the driver. The TCP/IP stack calls the appropriate driver entry points to transfer data to the driver.

Driver Entry Points

The driver has several entry points. Some are used for data connectivity and others for Ethernet configuration. Standard network tools use the driver entry points for Ethernet configuration.
DMA Descriptor Management

Driver Implementation

The user application driver sends the received socket buffer packet to the DMA driver, which maps it to PCI space and sends it to the DMA. On the receive side, buffers are pre-allocated (from the networking stack) to store incoming packets. Received packets are added to the network stack queue to be sent to the application for further processing.
Initialization Phase

The driver prepares descriptor rings, each containing 1,999 descriptors, for each DMA channel. In the current design, the driver thus prepares three rings.

Transmit (S2C) Descriptor Management

The dark blocks in Figure 3-13 indicate descriptors that are under hardware control, while the light blocks indicate descriptors that are under software control.
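The hardware/software ownership split above can be modeled as a ring with two pointers. This is an illustrative Python sketch of the pointer arithmetic only, not the actual driver code; the class and method names are invented for the example.

```python
# Illustrative model of a descriptor ring: software submits work by
# advancing a tail pointer; hardware completion advances a head
# pointer. Both wrap modulo the ring size (the text uses rings of
# 1,999 descriptors). Descriptors between head and tail are under
# hardware control; the rest are under software control.

class DescriptorRing:
    def __init__(self, size=1999):
        self.size = size
        self.head = 0       # next descriptor hardware will complete
        self.tail = 0       # next descriptor software will fill
        self.in_flight = 0  # descriptors currently owned by hardware

    def submit(self):
        if self.in_flight == self.size:
            raise BufferError("ring full: back-pressure the caller")
        self.tail = (self.tail + 1) % self.size
        self.in_flight += 1

    def complete(self):
        # Housekeeping after hardware finishes one descriptor.
        assert self.in_flight > 0
        self.head = (self.head + 1) % self.size
        self.in_flight -= 1
```

A smaller ring fills sooner and back-pressures the submitter more often, which is consistent with the later note that smaller rings can adversely affect throughput.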
Receive (C2S) Descriptor Management

In Figure 3-14, the dark blocks indicate descriptors that are under hardware control, and the light blocks indicate descriptors that are under software control.
Chapter 4: Performance Estimation

This chapter presents a theoretical estimation of performance. Best effort has been made to achieve performance close to these estimates, but that might not always be realistic.

Theoretical Estimate

PCIe–DMA

This section provides an estimate of the performance of the PCIe link using the Northwest Logic Packet DMA. PCIe is a serialized, high-bandwidth, scalable point-to-point protocol that provides highly reliable data transfer operations.
The C2S DMA engine (which deals with data reception, writing data to system memory) first fetches a buffer descriptor. Using the buffer address in the descriptor, it issues memory writes to the system. Once the actual payload is transferred to the system, it sends a memory write to update the buffer descriptor. Table 4-1 shows the overhead incurred during data transfer in the C2S direction.
Table 4-2: PCIe Performance Estimation with DMA in the S2C Direction (Cont’d)

Transaction Overhead                              ACK Overhead        Comment
MRD - S2C Buffer Fetch = 20/128                   8/128               Buffer fetch, S2C engine (TRN-TX). MRRS = 128B
CPLD - S2C Buffer Completion = 20/64 = 40/128     8/64 = 16/128       Buffer reception, S2C engine (TRN-RX). Since RCB = 64B, two completions are received for every 128-byte read request
MWR - S2C Desc Update = (20+4)/4096 = 0.75/128    8/4096 = 0.25/128   Descriptor update, S2C engine (TRN-TX).
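As a back-of-the-envelope check of the link itself (assuming the Gen3 x8 configuration used by this design; the arithmetic below is illustrative and is not a figure taken from the tables):

```python
# Raw PCIe Gen3 x8 link bandwidth before TLP overheads:
# 8 GT/s per lane with 128b/130b line encoding.

lanes = 8
gen3_rate = 8e9            # 8 GT/s per lane
encoding = 128 / 130       # 128b/130b encoding efficiency

raw_gbps = lanes * gen3_rate * encoding / 1e9
print(round(raw_gbps, 1))  # ~63.0 Gb/s per direction

# Per-TLP efficiency for memory writes at a 128-byte payload, using
# the 20-byte header/framing figure from the tables above:
mwr_efficiency = 128 / (128 + 20)
print(round(mwr_efficiency, 3))  # ~0.865
```

The additional overheads in Tables 4-1 and 4-2 (descriptor fetches, descriptor updates, ACK DLLPs, and completion splitting at the 64-byte RCB) further reduce the achievable payload throughput below these two factors.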
Assuming 5% overhead for refresh or other actions, the total achievable efficiency is ~81%, which is ~83 Gb/s of throughput on one instance of the virtual FIFO controller.

Ten Gigabit Ethernet

The design uses four instances of the 10G Ethernet MAC and 10GBASE-R PHY. Each MAC operates on 64 bits at 156.25 MHz, providing a throughput of 10 Gb/s. The connection between the 10G MAC and 10G PHY is through the XGMII interface (64-bit SDR).
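The quoted figures can be reproduced with simple arithmetic (illustrative only):

```python
# Virtual FIFO: 512-bit AXI-VFIFO interface at 200 MHz, derated by the
# ~81% total efficiency figure quoted in the text.

vfifo_raw = 512 * 200e6 / 1e9   # 102.4 Gb/s raw
vfifo_eff = 0.81                # ~81% after DDR3 refresh and overheads
print(round(vfifo_raw * vfifo_eff, 1))  # ~82.9 Gb/s, i.e. ~83 Gb/s

# One 10G MAC: 64-bit datapath at 156.25 MHz.
xgemac = 64 * 156.25e6 / 1e9
print(xgemac)  # 10.0 Gb/s
```

With four MACs at 10 Gb/s each plus the PCIe-DMA traffic sharing one virtual FIFO, the ~83 Gb/s figure indicates the DDR3 path has headroom for the aggregate load.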
Chapter 5: Designing with the Platform

Overview

The XT Connectivity TRD is a framework for system designers to extend or modify. This chapter outlines how to modify the XT Connectivity TRD. The modifications are grouped under three categories:

Software-only modifications: These do not require the design to be re-implemented, for example, modification of software components only (drivers, demonstration parameters, and others).
Descriptor Ring Size

The number of descriptors set up in the descriptor ring can be defined as a compile-time option. To change the size of the buffer descriptor ring used for DMA operations, modify DMA_BD_CNT in linux_driver/xdma/xdma_base.c. Smaller rings can affect throughput adversely, which can be observed by running the performance tests.
Software-only Modifications • • linux_driver_app\driver\xrawdata0\sguser.c For Raw Ethernet mode, edit macro MAX_BUFF_INFO in the files: • linux_driver_app\driver\xrawdata1\sguser.c • linux_driver_app\driver\xrawdata2\sguser.c • linux_driver_app\driver\xrawdata3\sguser.c • Edit macro MAX_BUFF_INFO in the file: linux_driver_app\driver\xrawdata1\sguser.c The depth increase will help in queuing more packets of receiver side and transmit housekeeping.
Appendix A: Register Descriptions

This appendix describes the custom registers implemented in the user space. All registers are 32 bits wide. Register bit positions are read 31 to 0, from left to right. All bits undefined in this section are reserved and return zero on read. All registers return to their default values on reset. Address holes return a value of zero when read. All registers are mapped to BAR0, and the relevant offsets are provided.
Registers in the DMA for interrupt handling are grouped under a category called common registers, which are at an offset of 0x4000 from BAR0 (see Figure A-1).
Table A-3 through Table A-6 describe the channel-specific registers.

Engine Control (0x0004)

Table A-3: DMA Engine Control Register

Bit  Field                Mode  Default  Description
0    Interrupt Enable     RW    0        Enables interrupt generation.
1    Interrupt Active     RW1C  0        Interrupt active is set whenever an interrupt event occurs. Write 1 to clear.
2    Descriptor Complete  RW1C  0        Interrupt active was asserted due to completion of a descriptor.
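A minimal sketch of working with this register from software, assuming only the bit positions documented in Table A-3 (the function names are invented for the example; this is not the TRD driver code):

```python
# Decode the documented bits of the DMA Engine Control register
# (offset 0x0004). Only the bits listed in Table A-3 are decoded.

def decode_engine_control(value):
    return {
        "interrupt_enable":    bool(value & (1 << 0)),
        "interrupt_active":    bool(value & (1 << 1)),  # RW1C
        "descriptor_complete": bool(value & (1 << 2)),  # RW1C
    }

def w1c_clear_mask(interrupt_active=False, descriptor_complete=False):
    """RW1C semantics: write back a value with 1s only in the bits
    being cleared, so other status bits are left undisturbed."""
    return (interrupt_active << 1) | (descriptor_complete << 2)
```

For example, after servicing an interrupt, software would write `w1c_clear_mask(interrupt_active=True, descriptor_complete=True)` to the register rather than writing back the full read value.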
Software Descriptor Pointer (0x000C)

Table A-5: DMA Software Descriptor Pointer Register

Bit     Field            Mode  Default   Description
[31:5]  Reg_SW_Desc_Ptr  RW    0         Software descriptor pointer: the location of the first descriptor in the chain that is still owned by the software.
[4:0]   Reserved         RO    5'b00000  Required for 32-byte alignment.

Completed Byte Count (0x001C)

Table A-6: DMA Completed Byte Count Register

Bit     Field  Mode  Default  Description
[31:2]  DMA
Table A-7: DMA Common Control and Status Register (Cont’d)

Bit    Field                 Mode  Default  Description
23:16  S2C Interrupt Status  RO    0        Bit[i] indicates the interrupt status of S2C DMA engine[i]. If the S2C engine is not present, this bit reads as zero.
31:24  C2S Interrupt Status  RO    0        Bit[i] indicates the interrupt status of C2S DMA engine[i]. If the C2S engine is not present, this bit reads as zero.
Design Status (0x9008)

Table A-10: Design Status Register

Bit    Mode  Default  Description
0      RO    0        DDR3 memory controller initialization/calibration done for both DDR3 devices (operational status from hardware provided to the XT Connectivity TRD).
9:2    RO    FF       ddr3_fifo_empty: indicates that the DDR3 FIFO per port is empty.
29:26  RO    00       10GBASE-R PHY link status: bit 29 - PHY3, bit 28 - PHY2, bit 27 - PHY1, bit 26 - PHY0. 0 indicates the link is down; 1 indicates the link is up.
Upstream Memory Write Byte Count (0x9014)

Table A-13: PCIe Performance Monitor - Upstream Memory Write Byte Count Register

Bit   Mode  Default  Description
1:0   RO    00       Sample count. Increments every second.
31:2  RO    0        Upstream memory write byte count: this field contains the payload byte count for upstream PCIe memory write transactions and has a resolution of four bytes.
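Converting this register to a throughput figure follows directly from the field layout: bits [1:0] are the sample count, bits [31:2] hold the byte count at 4-byte resolution, and the count covers one second. This is a hedged Python sketch; the function names are invented for the example.

```python
# Decode the Upstream Memory Write Byte Count register (0x9014) per
# Table A-13 and convert it to Gb/s (the count is updated once per
# second, so count-per-sample is count-per-second).

def sample_count(reg_value):
    return reg_value & 0x3               # bits [1:0]

def upstream_write_gbps(reg_value):
    byte_count = (reg_value >> 2) * 4    # bits [31:2], 4-byte resolution
    return byte_count * 8 / 1e9          # bytes/s -> Gb/s
```

For example, a field value of 250,000,000 four-byte units corresponds to 1 GB/s, that is, 8 Gb/s of upstream write payload.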
PCIe Credits Status - Initial Non-Posted Header Credits for Downstream Port (0x9028)

Table A-18: PCIe Performance Monitor - Initial NPH Credits Register

Bit  Mode  Default  Description
7:0  RO    00       INIT_FC_NPH: captures the initial flow control credits for non-posted headers for the host system.

PCIe Credits Status - Initial Posted Data Credits for Downstream Port (0x902C)

Table A-19: PCIe Performance Monitor - Initial PD Credits Register

Bit  Mode  Default  Description
VCCAUX Power Consumption (0x9044)

Table A-23: bits 31:0, RO, default 00. Power for VCCAUX.

Table A-24 through Table A-31 have the same layout (bits 31:0, RO, default 00) and report power for the VCC2v5, VCC1v5, MGT_AVCC, and MGT_AVTT rails, among others, ending with VCC_AUXIO Power Consumption (0x9060).
Table A-32: MGT_VCCAUX Power Consumption (0x9068)

  Bit   Mode  Default  Description
  31:0  RO    00       Power for MGT_VCCAUX

Table A-33: VCC1v8 Power Consumption (0x906C)

  Bit   Mode  Default  Description
  31:0  RO    00       Power for VCC1v8

Table A-34: Die Temperature (0x9070)

  Bit   Mode  Default  Description
  31:0  RO    00       Die temperature of the FPGA

Performance Mode: Generator/Checker/Loopback Registers

Channel-0

Table A-35 through Table A-39 describe the registers for Channel-0.
Table A-39: PCIe Performance Module #0 Count Wrap Register (0x9110)

  Bit   Mode  Default  Description
  31:0  RW    511      Wrap count: Value at which the sequence number should wrap around

XGEMAC-Related User Registers

Table A-40 through Table A-51 describe the registers that are not part of the IP but were implemented strictly for the XT Connectivity TRD.
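To illustrate the wrap-count semantics, the sketch below advances a generator sequence number and wraps it to zero once the programmed wrap value is reached. The exact boundary behavior (whether the wrap value itself is the last value taken) is my assumption, not stated by the register description:

```python
def next_sequence(seq: int, wrap: int = 511) -> int:
    """Advance a generator sequence number.

    Assumption: the sequence counts up to the programmed wrap value
    (default 511) and then wraps around to 0.
    """
    return 0 if seq >= wrap else seq + 1
```

With the default wrap value, the sequence runs 0, 1, ..., 511, 0, 1, and so on.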
Table A-46: XGEMAC2 Address Filtering Control Register

  Bit  Mode  Default  Description
  0    RW    0        Promiscuous mode enable for XGEMAC2
  31   RO    0        Receive FIFO overflow status for XGEMAC2

Table A-47: XGEMAC2 MAC Address Lower Register (0x941C)

  Bit   Mode  Default        Description
  31:0  RW    32'hAABBCCDD   MAC address lower

Table A-48: XGEMAC2 MAC Address Upper Register

  Bit   Mode  Default     Description
  15:0  RW    16'hEEFF    MAC address upper

XGEMAC3 Address Filtering Control Register (0x9424)

  Bit  Mode  Default  Description
  0    RW    0        Promiscuous mode enable for XGEMAC3
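To show how the lower (32-bit) and upper (16-bit) address registers combine into one 48-bit station address, the sketch below assembles and formats the MAC. The byte ordering shown (upper register holds the two most significant bytes) is an assumption for illustration, not taken from the TRD sources:

```python
def mac_from_regs(lower: int, upper: int) -> str:
    """Combine the MAC Address Lower (32-bit) and Upper (16-bit)
    register values into a 48-bit address string.

    Assumption: upper[15:0] supplies the two most significant bytes,
    rendered most-significant byte first.
    """
    mac48 = (upper << 32) | lower
    return ":".join(f"{(mac48 >> shift) & 0xFF:02X}" for shift in range(40, -8, -8))
```

With the default register values above (32'hAABBCCDD and 16'hEEFF), this yields EE:FF:AA:BB:CC:DD under the assumed ordering.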
Appendix B

Directory Structure and File Descriptions

The hardware and software directory structures for the XT Connectivity TRD are shown in Figure B-1.

Figure B-1: TRD Directory Structure

The top-level v7_xt_conn_trd directory contains:

• hardware: the sources folder (constraints, hdl, ip_catalog, ip_cores, testbench) and the vivado folder (scripts, runs)
• software: the Linux software deliverables, including linux_driver_app (see note 1)
• ready_to_test
• quickstart.sh
• readme.txt

Note: 1. The linux_driver_app directory is extracted from v7_xt_conn_trd.tar.gz.
Scripts Folder
The scripts folder contains unified scripts for simulation and implementation.

Runs Folder
The runs folder holds the results of simulation and implementation runs.

Sources Folder
The sources folder contains all source deliverables.

Constraints Folder
The constraints folder contains the constraints files.

HDL Folder
The hdl folder contains the HDL source code.
• include: Contains the include files used in the driver.
• Makefile: Compiles the driver.
• gui: Contains the executable file for running the Control and Monitor GUI.
• doc: Contains Doxygen-generated HTML files describing software driver details.
• scripts: Contains scripts to compile and execute the drivers.

readme.txt File
A text file that provides information about the software directory structure and known issues.

quickstart.sh File
Appendix C

Software Application Compilation and Network Performance

This appendix describes the software application compilation procedure and the private network setup.

Note: The traffic generator requires a C++ compiler, which is not shipped with the live OS and must be installed separately before compilation. Likewise, Java compilation tools are not shipped as part of the LiveDVD, so compiling the GUI also requires additional installation.
The four interfaces are ethX, eth(X+1), eth(X+2), and eth(X+3) with IP addresses 10.60.0.1, 10.60.1.1, 10.60.2.1, and 10.60.3.1, respectively.

2. Open a command prompt and enter:

$ netserver -p 5005

This sets up the netserver to listen on port 5005.

3. Open another command prompt and enter:

$ netperf -H 10.60.0.1 -p 5005

This runs netperf (a TCP_STREAM test for 10 seconds) against the server at port 5005. Repeat the test for the 10.60.1.1, 10.60.2.1, and 10.60.3.1 interfaces by changing the -H argument.
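Step 3 can be scripted across all four interfaces. The sketch below is illustrative only (the loop wrapper and function name are mine; the addresses and port come from the steps above) and skips gracefully when netperf is not installed:

```python
import shutil
import subprocess

PORT = 5005
# The four TRD interface addresses from the setup above
TARGET_IPS = [f"10.60.{i}.1" for i in range(4)]

def run_netperf_all(duration: int = 10) -> None:
    """Run a TCP_STREAM test (default 10 s) against each interface in turn."""
    if shutil.which("netperf") is None:
        print("netperf is not installed; skipping")
        return
    for ip in TARGET_IPS:
        print(f"--- netperf against {ip} ---")
        # Equivalent to: netperf -H <ip> -p 5005 -l <duration>
        subprocess.run(
            ["netperf", "-H", ip, "-p", str(PORT), "-l", str(duration)],
            check=False,
        )
```

Calling run_netperf_all() reproduces step 3 once for each of the four addresses.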
Private Network Setup and Test

The XT Connectivity TRD setup GUI is displayed as shown in Figure C-2.

Figure C-2: Private LAN Driver Setup

4. Select Application mode and the Peer to Peer option.

5. Click Install. The Application mode drivers are installed.

6. Install the Application mode driver on both PCs using the GUI as described in Chapter 2, Getting Started.

7.
Appendix D

Troubleshooting

For the latest information on known issues, refer to the VC709 Connectivity Kit master answer record, AR# 51901.

Assumptions:
• The setup procedures described in Getting Started, page 15 have been followed.
• The PCIe link is up, and the Endpoint device is discovered by the host and can be seen with the lspci command.
• The GPIO LEDs indicate proper operation of the functions listed in Table 2-1, page 18.
Table D-1: Troubleshooting Tips (Cont'd)

  Number  Issue                                              Possible Solution
  4       Performance numbers are very low, and the system   If the Fedora 16 operating system is installed on a
          hangs when uninstalling drivers. The host          hard drive, edit the /etc/grub2.cfg file and add
          computer uses an Intel motherboard and the         IOMMU=pt64 to the kernel boot-up options.
          Fedora 16 operating system.
Appendix E Additional Resources Xilinx Resources For support resources such as Answers, Documentation, Downloads, and Forums, see the Xilinx® support website at: www.xilinx.com/support For continual updates, add the Answer Record to your myAlerts: www.xilinx.com/support/myalerts For a glossary of technical terms used in Xilinx documentation, see: www.xilinx.com/company/terms.
Appendix E: Additional Resources 10. Understanding Performance of PCI Express Systems (WP350) 11. Synthesis and Simulation Design Guide (UG626) The following websites provide supplemental material useful with this guide: 12. Northwest Logic: http://nwlogic.com/ (DMA back-end core) 13. Avago Technologies: http://www.avagotech.com (AFBR-703SDZ 10Gb Ethernet 850 nm 10GBASE-SR SFP+ Transceiver) 14. Fedora: http://fedoraproject.org/ (Fedora Linux-based operating system) 15.
Appendix F Warranty Statement THIS LIMITED WARRANTY applies solely to standard hardware development boards and standard hardware programming cables manufactured by or on behalf of Xilinx (“Development Systems”).
potential effects arising from the presence of hazardous substances in WEEE. Return the marked products to Xilinx for proper disposal. Further information and instructions for free-of-charge return are available at: www.xilinx.com/ehs/weee.