Whitepaper
Real Time Streaming Analytics with Megh Computing on Dell EMC PowerEdge Servers
Revision: 1.0
Issue Date: 10/2/2019

Abstract
This paper evaluates the performance and efficiency of a video analytics pipeline with Deep Learning inference on an Intel Programmable Acceleration Card (PAC) FPGA in a Dell EMC PowerEdge R740/R740xd server. The objective is not only to report on real-world inferencing performance but also to examine the end-to-end use case of the Intel PAC FPGA solution.
Revisions
Date               Description
02 October 2019    Initial release

Acknowledgements
This paper was produced by the following people:
Name                     Role
Bhavesh Patel            Dell EMC
Prabhat K Gupta          Megh Computing
Suchit Subhaschandra     Megh Computing
Overview of Real time Analytics
The streaming analytics market is expected to attain a market size of USD 15.9 billion by 2022, growing at a CAGR of 33.1 percent.1 The major driver of this growth is the massive surge of structured and unstructured data. This data flows from sources ranging from IoT sensors and events, social media, and web apps that track usage and behavior to transactional and operational data from a broad spectrum of vertical segments.
Figure 1. (Left) Arrayed building blocks are connected via interconnect wires; (Right) Fully featured FPGAs include a variety of advanced building blocks.

Figure 1 illustrates the variety of building blocks available in an FPGA. The core fabric implements digital logic with look-up tables (LUTs), flip-flops (FFs), wires, and I/O pads.
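As an illustration of how a look-up table implements digital logic, the following minimal Python sketch models a LUT as a stored truth table that is indexed by its input bits. The function names build_lut and lut_eval are hypothetical and chosen only for this example. They do not correspond to any Intel or Megh tool, and real FPGA LUTs are configured by the synthesis toolchain rather than by code like this.

```python
from itertools import product
from typing import Callable, List, Sequence


def build_lut(func: Callable[..., int], num_inputs: int) -> List[int]:
    """Precompute the truth table of `func` for every input combination.

    Entries are ordered so that the first input is the most significant bit
    of the table index (hypothetical convention for this sketch).
    """
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=num_inputs)]


def lut_eval(table: Sequence[int], inputs: Sequence[int]) -> int:
    """Evaluate the LUT: pack the input bits into an index and read the table."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return table[index]


# The same "hardware" (a table lookup) implements different logic functions
# depending only on the stored contents.
xor_lut = build_lut(lambda a, b: a ^ b, 2)
majority_lut = build_lut(lambda a, b, c: (a + b + c) >= 2, 3)

print(xor_lut)                            # stored configuration for XOR
print(lut_eval(majority_lut, (1, 0, 1)))  # majority vote of three bits
```

The point of the sketch is that changing the table contents changes the logic function without changing the lookup mechanism, which is what makes the fabric reprogrammable.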
Figure 2. Examples of mission-critical applications that require deterministic, fast response (panel labels: Autonomous Vehicle, Manufacturing).

Intel Programmable Acceleration Card
The Intel Programmable Acceleration Card (PAC) features an Intel Arria 10 FPGA, industry-leading programmable logic built on 20 nm process technology, integrating a rich feature set of embedded peripherals, embedded high-speed transceivers, hard memory controllers and IP protocol controllers.
The Acceleration Stack for Intel Xeon CPU with FPGAs provides optimized and simplified hardware interfaces and software application programming interfaces (APIs), saving developers time so they can focus on the unique value-add of their solution.

Figure 5. Acceleration Stack for Intel FPGA.

Evaluation Methodology
In this section, we give a brief overview of the sample image-classification application, the hardware inferencing infrastructure used, and the deep learning models that were evaluated.

Figure 6.
Application Case Study
We consider fraud prevention in the retail supply chain as the application case study. Retail inventory loss (or “shrinkage”) is a serious problem, totaling about $100 billion annually worldwide, almost 1.8% of sales. Not surprisingly, retailers are seeking solutions. Some are looking at a broad range of possibilities, including facial recognition, motion tracking, reading emotions and gestures, and AR, but most want to begin with solutions that are low cost and yield immediate returns.
• SPE (Stream Processing Engine) for transforming the data, which includes a multi-channel H.264 decoder and image resizing
• DLE (Deep Learning Engine) for image classification using CNN topologies like ResNet50

These are managed by the Arka Runtime, which exposes Accelerator Functions-as-a-Service via high-level APIs for integration with application frameworks such as Spark, TensorFlow, and kdb+, with no changes. Megh has developed the Deep Learning Engine (DLE) from the ground up for streaming inference.
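The division of work between a transform stage and an inference stage can be sketched in plain Python. This is only a conceptual illustration under stated assumptions: every name below (resize_nearest, transform_stage, inference_stage, brightness_classifier) is hypothetical, none of them is an Arka Runtime API, H.264 decoding is omitted, and a trivial brightness threshold stands in for a CNN such as ResNet50.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# A frame is modeled as rows of grayscale pixel values (illustrative only).
Frame = List[List[int]]


def resize_nearest(frame: Frame, out_h: int, out_w: int) -> Frame:
    """Nearest-neighbor resize, standing in for the image resizing step."""
    in_h, in_w = len(frame), len(frame[0])
    return [
        [frame[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def transform_stage(frames: Iterable[Frame], size: Tuple[int, int]) -> Iterator[Frame]:
    """Hypothetical transform stage: resize each already-decoded frame."""
    for frame in frames:
        yield resize_nearest(frame, size[0], size[1])


def inference_stage(frames: Iterable[Frame], classify: Callable[[Frame], str]) -> Iterator[str]:
    """Hypothetical inference stage: apply a classifier to each frame."""
    for frame in frames:
        yield classify(frame)


def brightness_classifier(frame: Frame) -> str:
    """Placeholder classifier (NOT a CNN): label a frame by mean brightness."""
    pixels = [p for row in frame for p in row]
    return "bright" if sum(pixels) / len(pixels) >= 128 else "dark"


# Frames stream through both stages one at a time; nothing is batched up front.
source = [[[0] * 4 for _ in range(4)], [[255] * 4 for _ in range(4)]]
labels = list(inference_stage(transform_stage(source, (2, 2)), brightness_classifier))
print(labels)
```

Chaining generators keeps each frame moving through the stages as soon as it is available, which mirrors the streaming character of the pipeline described above without implying anything about how the FPGA engines are actually implemented.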
Figure 7. Dell EMC PowerEdge R740/R740xd.

The Dell EMC PowerEdge R740/R740xd is Dell EMC’s latest two-socket, 2U rack server designed to run complex workloads using highly scalable memory, I/O capacity, and network options. The R740/R740xd features the Intel Xeon Scalable processor family, up to 24 DIMMs, PCI Express (PCIe) 3.0 enabled expansion slots, and a choice of network interface technologies to cover NIC and rNDC.
              CPU                                   FPGA
Cost*         $30,000 * 2 = $60,000                 $17,000 (2 x $3,500 + 2 x $5,000)
Throughput    150 * 2 = 300 fps (ResNet50 8 bit)    240 fps required (max ~400 fps)
Latency       > 250 ms                              < 100 ms

*Cost based on the following configurations:
Dell PowerEdge R740 Rack Server with dual Xeon Platinum 8280M processors, 64G Memory, 480GB SSD: $31,441
Dell PowerEdge R740 Server with dual Xeon Silver 4214 processors, 16G Memory, 480GB SSD: $3,439
Intel Programmable Acceleration Card with Arria® 10 GX web price: $5,000

Performance
Using the data abo
2. Cost: >3x improvement in per channel cost for the same performance of 240 fps

Chart: TCO (per channel) for CPU and FPGA; vertical axis from $0 to $8,000.

Conclusions
The Real Time Analytics solution running on Dell EMC PowerEdge servers with Intel PACs delivers >8x throughput at >3x lower cost compared to CPU-only solutions, supporting various Video Analytics use cases for fraud prevention, inventory management and surveillance.
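The cost figures quoted in the comparison can be checked with a short calculation. The sketch below uses only the rounded prices from the cost table. The per-channel frame rate of 30 fps is an assumption made for illustration and is not stated in this paper, so the per-channel numbers it prints should be read as indicative only.

```python
# Rounded prices as used in the cost table.
CPU_SERVER_PRICE = 30_000   # R740 with dual Xeon Platinum 8280M
FPGA_HOST_PRICE = 3_500     # R740 with dual Xeon Silver 4214
PAC_PRICE = 5_000           # Intel PAC with Arria 10 GX
REQUIRED_FPS = 240          # target throughput from the table

# ASSUMPTION (not from the paper): each video channel runs at 30 fps.
ASSUMED_FPS_PER_CHANNEL = 30


def solution_cost(servers: int, server_price: int, cards: int = 0, card_price: int = 0) -> int:
    """Total hardware cost of a configuration."""
    return servers * server_price + cards * card_price


cpu_cost = solution_cost(2, CPU_SERVER_PRICE)
fpga_cost = solution_cost(2, FPGA_HOST_PRICE, cards=2, card_price=PAC_PRICE)
cost_ratio = cpu_cost / fpga_cost

channels = REQUIRED_FPS // ASSUMED_FPS_PER_CHANNEL
cpu_per_channel = cpu_cost / channels
fpga_per_channel = fpga_cost / channels

print(f"CPU: ${cpu_cost:,}  FPGA: ${fpga_cost:,}  ratio: {cost_ratio:.2f}x")
print(f"Assumed channels: {channels}")
print(f"Per channel -> CPU: ${cpu_per_channel:,.0f}  FPGA: ${fpga_per_channel:,.0f}")
```

With the table's rounded prices the ratio comes to roughly 3.5x, consistent with the stated >3x improvement. The ratio itself does not depend on the assumed channel count, since both totals are divided by the same number of channels.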