Abstract
This article describes TBES, an end-to-end software environment for synthesizing multitask applications on FPGAs. The implementation follows a template-based approach to creating heterogeneous multiprocessor architectures, in which heterogeneity comes from combining general-purpose processors with custom accelerators. Experimental results demonstrate substantial speedups for several classes of applications.
Furthermore, TBES reduces development costs and saves development time for the software architect, the domain expert, and the optimization expert. It provides a framework that brings together various existing tools and optimization algorithms. The advantages are manifold: modularity and flexibility, easy customization for selecting the best-fit algorithm, durability and evolution over time, and preservation of legacy assets, including domain experts' know-how.
In addition to the use of architecture templates for the overall system, a second contribution is the use of high-level synthesis (HLS) to promote exploration of hardware IPs. The domain expert, who best knows which tasks are good candidates for hardware implementation, selects parts of the initial application that may be synthesized as dedicated accelerators. As a consequence, the general HLS problem becomes a constrained and more tractable one, and automation removes the need for tedious and error-prone manual steps during design space exploration.
Automation begins only after the designer has broken the application down into concurrent tasks. The designer can then drive the synthesis process with a set of parameters provided by TBES to balance optimization effort against quality of results.
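To make the "constrained and more tractable" point concrete, here is a minimal sketch that enumerates hardware/software choices only for the tasks a domain expert has marked as accelerator candidates. It is an illustration under stated assumptions, not TBES's actual exploration algorithm: the helper `explore`, the task names, latencies, LUT counts, and area budget are all hypothetical, and the cost model assumes the concurrent tasks form a pipeline whose period is set by the slowest task.

```python
from itertools import product
from typing import Dict, List, Optional, Tuple

# A hypothetical HLS variant: (latency in ms per frame, area in LUTs).
Variant = Tuple[float, int]


def explore(
    sw_time: Dict[str, float],
    hw_variants: Dict[str, List[Variant]],
    area_budget: int,
) -> Tuple[float, Dict[str, Optional[int]]]:
    """Pick software (None) or one HLS variant index for each candidate task.

    Only tasks listed in hw_variants (those a domain expert marked as
    accelerator candidates) are explored; every other task stays in software.
    Assumed cost model: tasks run concurrently as a pipeline, so the period
    is the time of the slowest task. Returns (best_period, choice).
    """
    candidates = sorted(hw_variants)
    # Per candidate: stay in software (None) or use variant 0, 1, ...
    options = [[None] + list(range(len(hw_variants[t]))) for t in candidates]

    best_period = float("inf")
    best_choice: Dict[str, Optional[int]] = {}
    for combo in product(*options):
        times = dict(sw_time)  # start from the all-software timing
        area = 0
        for task, idx in zip(candidates, combo):
            if idx is not None:
                latency, luts = hw_variants[task][idx]
                times[task] = latency
                area += luts
        if area > area_budget:
            continue  # this mapping does not fit on the FPGA
        period = max(times.values())
        if period < best_period:
            best_period, best_choice = period, dict(zip(candidates, combo))
    return best_period, best_choice


# Illustrative numbers only (not measurements from TBES).
sw = {"read": 2.0, "idct": 12.0, "huffman": 6.0, "write": 1.0}
hw = {"idct": [(3.0, 9000), (1.5, 20000)], "huffman": [(2.5, 7000)]}
period, choice = explore(sw, hw, area_budget=25000)
print(period, choice)  # → 3.0 {'huffman': 0, 'idct': 0}
```

Because only the marked tasks are enumerated, the search space is the product of (variants + 1) over those tasks rather than over the whole application. In this toy example, the fastest IDCT variant combined with the Huffman accelerator exceeds the 25,000-LUT budget, so the exhaustive search settles on the cheaper IDCT variant; a real flow would likely replace brute force with a heuristic once the candidate set grows.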
The approach is demonstrated step by step, up to implementation and execution on FPGAs, with an MJPEG benchmark and a complex Viola-Jones face detection application. We show that TBES achieves speedups of up to 10 times while reducing development time and widening design space exploration.
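As a rough sanity check on what a speedup of up to 10 times implies (an illustration added here under the simple Amdahl's-law model, not the paper's own analysis), let f be the fraction of the original execution time spent in tasks that are accelerated, and s the local speedup of those tasks. The overall speedup S is then bounded as follows:

```latex
S = \frac{1}{(1 - f) + f/s} \;\le\; \frac{1}{1 - f}
\quad\text{(for any } s > 0\text{)},
\qquad
S = 10 \;\Rightarrow\; \frac{1}{1 - f} \ge 10 \;\Rightarrow\; f \ge 0.9 .
```

Under this simplified model, a tenfold end-to-end gain requires at least 90% of the original execution time to lie in tasks that are moved to accelerators or run in parallel, which is one reason the domain expert's choice of candidate tasks matters.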
TBES: Template-Based Exploration and Synthesis of Heterogeneous Multiprocessor Architectures on FPGA