research-article

Exploiting Stable Data Dependency in Stream Processing Acceleration on FPGAs

Published: 13 July 2017

Abstract

With the unique feature of fine-grained parallelism, field-programmable gate arrays (FPGAs) show great potential for streaming algorithm acceleration. However, the lack of a design framework, restrictions on FPGAs, and ineffective tools impede the utilization of FPGAs in practice. In this study, we provide a design paradigm to support streaming algorithm acceleration on FPGAs. We first propose an abstract model to describe streaming algorithms with homogeneous sub-functions (HSF) and stable data dependency (SDD), which we call the HSF-SDD model. Using this model, we then develop an FPGA framework, PE-Ring, that has the advantages of (1) fully exploiting algorithm parallelism to achieve high performance, (2) leveraging block RAM to serve large-scale parameters, and (3) enabling flexible parameter adjustments. Based on the proposed model and framework, we finally implement a specific converter to generate the register-transfer level representation of the PE-Ring. Experimental results show that our method outperforms ordinary FPGA design tools by one to two orders of magnitude. Experiments also demonstrate the scalability of the PE-Ring.
