Abstract
For high-performance embedded hard-real-time systems, ASICs and FPGAs hold advantages over general-purpose processors and graphics accelerators (GPUs). However, developing signal processing architectures from scratch requires significant resources. Our design methodology is based on sets of configurable building blocks that provide storage, dataflow, computation, and control. From these building blocks, we generate hundreds of thousands of dynamic streaming engine processors, which we call DSEs. We store the DSEs in a repository that can be queried for (online) design space exploration. From this repository, DSEs can be downloaded and instantiated on FPGAs within milliseconds. If a loss of flexibility can be tolerated, ASIC implementations are feasible as well. In this article, we focus on FPGA implementations. Our DSEs vary in core count, computational lanes, bitwidth, power consumption, and frequency. To the best of our knowledge, we are the first to propose online design space exploration based on repositories of precompiled cores that are assembled from common building blocks. For demonstration purposes, we map algorithms for image processing and financial mathematics to DSEs and compare their performance to that of existing highly optimized signal and graphics accelerators.
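The repository query described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions: the record fields, units, entry names, and numeric values below are hypothetical and are not taken from the article, and the filter-then-sort query is a generic stand-in for whatever exploration strategy the authors actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DseEntry:
    """One precompiled DSE in the repository (hypothetical fields and units)."""
    name: str
    cores: int
    lanes: int
    bitwidth: int
    power_mw: float
    freq_mhz: float


def query(repo, min_lanes, min_bitwidth, min_freq_mhz, max_power_mw):
    """Return the entries that satisfy all constraints, lowest power first."""
    hits = [
        e for e in repo
        if e.lanes >= min_lanes
        and e.bitwidth >= min_bitwidth
        and e.freq_mhz >= min_freq_mhz
        and e.power_mw <= max_power_mw
    ]
    return sorted(hits, key=lambda e: e.power_mw)


# Made-up repository contents, for illustration only.
REPO = [
    DseEntry("dse_a", cores=1, lanes=4, bitwidth=16, power_mw=350.0, freq_mhz=200.0),
    DseEntry("dse_b", cores=2, lanes=8, bitwidth=16, power_mw=700.0, freq_mhz=180.0),
    DseEntry("dse_c", cores=2, lanes=8, bitwidth=32, power_mw=1200.0, freq_mhz=150.0),
    DseEntry("dse_d", cores=4, lanes=16, bitwidth=32, power_mw=2400.0, freq_mhz=125.0),
]

candidates = query(REPO, min_lanes=8, min_bitwidth=16,
                   min_freq_mhz=140.0, max_power_mw=1500.0)
best = candidates[0] if candidates else None
```

Because every entry stands for an already compiled design, a query of this kind only filters stored metadata, which is what would make exploration feasible online.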
RIVER: Reconfigurable Flow and Fabric for Real-Time Signal Processing on FPGAs