Abstract
With their fine-grained parallelism, field-programmable gate arrays (FPGAs) show great potential for accelerating streaming algorithms. However, the lack of a design framework, restrictions on FPGAs, and ineffective tools impede the use of FPGAs in practice. In this study, we provide a design paradigm to support streaming algorithm acceleration on FPGAs. We first propose an abstract model to describe streaming algorithms with homogeneous sub-functions (HSF) and stable data dependency (SDD), which we call the HSF-SDD model. Using this model, we then develop an FPGA framework, PE-Ring, that (1) fully exploits algorithm parallelism to achieve high performance, (2) leverages block RAM to serve large-scale parameters, and (3) enables flexible parameter adjustments. Based on the proposed model and framework, we finally implement a specific converter to generate the register-transfer level representation of the PE-Ring. Experimental results show that our method outperforms ordinary FPGA design tools by one to two orders of magnitude. Experiments also demonstrate the scalability of the PE-Ring.
Exploiting Stable Data Dependency in Stream Processing Acceleration on FPGAs