research-article

RIVER: Reconfigurable Flow and Fabric for Real-Time Signal Processing on FPGAs

Published: 03 September 2014

Abstract

For high-performance embedded hard-real-time systems, ASICs and FPGAs hold advantages over general-purpose processors and graphics accelerators (GPUs). However, developing signal processing architectures from scratch requires significant resources. Our design methodology is based on sets of configurable building blocks that provide storage, dataflow, computation, and control. From these building blocks, we generate hundreds of thousands of dynamic streaming engine processors, which we call DSEs. We store our DSEs in a repository that can be queried for (online) design space exploration. From this repository, DSEs can be downloaded and instantiated within milliseconds on FPGAs. If a loss of flexibility can be tolerated, ASIC implementations are feasible as well. In this article, we focus on FPGA implementations. Our DSEs vary in the number of cores and computational lanes, bitwidth, power consumption, and frequency. To the best of our knowledge, we are the first to propose online design space exploration based on repositories of precompiled cores that are assembled from common building blocks. For demonstration purposes, we map algorithms for image processing and financial mathematics to DSEs and compare the performance to existing highly optimized signal and graphics accelerators.
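
The repository query described in the abstract can be pictured with a small sketch. This is not the authors' tool or data format. It is a minimal illustration, assuming each precompiled DSE is recorded with the parameters the abstract lists (cores, lanes, bitwidth, power, frequency). The names `DseEntry`, `query`, `REPO`, the bitstream file names, and all numeric values are hypothetical, as is the ranking by total lanes times clock frequency.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class DseEntry:
    """One precompiled DSE variant. All fields are illustrative assumptions."""
    cores: int
    lanes_per_core: int
    bitwidth: int
    power_mw: float
    freq_mhz: float
    bitstream: str  # hypothetical file name of the precompiled design


def query(repo: Iterable[DseEntry], *, min_total_lanes: int,
          min_bitwidth: int, max_power_mw: float) -> Optional[DseEntry]:
    """Return the feasible entry with the highest estimated throughput, or None."""
    feasible = [
        e for e in repo
        if e.cores * e.lanes_per_core >= min_total_lanes
        and e.bitwidth >= min_bitwidth
        and e.power_mw <= max_power_mw
    ]
    # Assumed ranking: total lanes times clock frequency as a rough throughput proxy.
    return max(
        feasible,
        key=lambda e: e.cores * e.lanes_per_core * e.freq_mhz,
        default=None,
    )


# Made-up repository contents, only to exercise the query.
REPO = [
    DseEntry(1, 4, 16, 800.0, 200.0, "dse_c1_l4_w16.bit"),
    DseEntry(4, 4, 16, 2500.0, 150.0, "dse_c4_l4_w16.bit"),
    DseEntry(8, 4, 32, 6000.0, 125.0, "dse_c8_l4_w32.bit"),
]

if __name__ == "__main__":
    best = query(REPO, min_total_lanes=8, min_bitwidth=16, max_power_mw=3000.0)
    print(best.bitstream if best else "no feasible DSE")
```

Because every entry is already compiled, exploration in this picture reduces to filtering and ranking stored records, which is consistent with the abstract's point that a selected DSE can be downloaded and instantiated quickly rather than synthesized on demand.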

            Reviews

            Mohammed Ziaur Rahman

            Field-programmable gate arrays (FPGAs) have become the most effective, powerful, and generic logic design and verification tools during the last decade. Specialized hardware implementations for building high-performance parallel processing capabilities are a key application for which FPGAs are gaining momentum. The authors contribute a parallel dataflow-oriented architecture called RIVER. A cloud-based online FPGA design flow is depicted that utilizes “a repository of precompiled dynamic streaming engine processors (DSEs)”; building blocks based on the architecture are also presented. The online design space can be searched, and designs instantiated, within a fraction of a second. The authors show that “applications from image processing and financial mathematics can be mapped efficiently to multicore DSEs.” Furthermore, a comparison of an eight-core DSE example to “existing highly optimized signal and graphics accelerators” is presented, which shows that the “multicore DSE example designs are competitive in terms of power and performance when compared to commercially available signal processors and graphics accelerators ([graphics processing units,] GPUs).” Based on the examples shown in the paper, it is evident that the proposed technique can be effectively utilized for implementing an arbitrary combination of digital image processing or signaling filters. The mathematical solution for the market share decision process highlights another aspect: a complex problem may still need customized effort, such as the design of a square root calculation block. However, the concept of building an online repository and using it to minimize design effort and time for FPGAs is quite interesting, and understandably the repository can grow over time to support a wide range of solutions.
            Online Computing Reviews Service

            Sunil Shukla

            The authors of this paper describe an architecture called RIVER and a design flow for field-programmable gate arrays (FPGAs). They have built a precompiled library consisting of many combinations and permutations of basic “building blocks that provide storage, dataflow, computation, and control,” and have used two kernels, one from signal processing and another from financial modeling, to evaluate their tool. The authors attract attention with a claim in the introduction that they “are the first to propose online design space exploration based on repositories of precompiled cores assembled of common building blocks.” However, they do not do a good job of explaining what exactly they mean by “online.” I like the basic building blocks of the architecture. They seem to support streaming signal processing kernels very well. The design flow is poorly explained. My understanding is that there are two parts: one where thousands of designs are synthesized using a cloud-based synthesis framework and copied into a library, and one where the user maps his or her application onto the FPGA using one of the precompiled bit files. The authors need to do a better job of explaining the latter part. The flow seems restrictive. First, it doesn't allow for the insertion of custom intellectual property (IP) cores, which may be required for certain applications. Second, it's a domain-specific architecture that only supports applications that can be described entirely using the basic building blocks. This is not necessarily a negative thing because signal processing applications, which the paper suggests are supported, form a big chunk of the FPGA market segment. The paper would definitely benefit from a more elaborate evaluation of several applications, possibly with different characteristics. Also, there are several grammatical mistakes.
            Online Computing Reviews Service

            • Published in

              ACM Transactions on Reconfigurable Technology and Systems  Volume 7, Issue 3
              Special Issue on 11th International Conference on Field-Programmable Technology (FPT'12) and Special Issue on the 7th International Workshop on Reconfigurable Communication-Centric Systems-on-Chip (ReCoSoC'12)
              August 2014
              199 pages
              ISSN: 1936-7406
              EISSN: 1936-7414
              DOI: 10.1145/2664590

              Copyright © 2014 ACM

              Publisher

              Association for Computing Machinery

              New York, NY, United States

              Publication History

              • Published: 3 September 2014
              • Accepted: 1 September 2013
              • Revised: 1 August 2013
              • Received: 1 January 2013

              Qualifiers

              • research-article
              • Research
              • Refereed
