Abstract
Hardware accelerators are key to the efficiency and performance of system-on-chip (SoC) architectures. With high-level synthesis (HLS), designers can easily obtain several performance-cost trade-off implementations for each component of a complex hardware accelerator. However, navigating this design space in search of the Pareto-optimal implementations at the system level is a hard optimization task. We present COSMOS, an automatic methodology for the design-space exploration (DSE) of complex accelerators, that coordinates both HLS and memory optimization tools in a compositional way. First, thanks to the co-design of datapath and memory, COSMOS produces a large set of Pareto-optimal implementations for each component of the accelerator. Then, COSMOS leverages compositional design techniques to quickly converge to the desired trade-off point between cost and performance at the system level. When applied to the system-level design (SLD) of an accelerator for wide-area motion imagery (WAMI), COSMOS explores the design space as completely as an exhaustive search, but it reduces the number of invocations to the HLS tool by up to 14.6×.
- M. Amdahl. 1967. Validity of the Single Processor Approach to Achieving Large Scale Computing Capabilities. In Proc. of the ACM Spring Joint Computer Conference (AFIPS). Google Scholar
Digital Library
- N. Baradaran and P. C. Diniz. 2008. A Compiler Approach to Managing Storage and Memory Bandwidth in Configurable Architectures. ACM Transaction on Design Automation of Electronic Systems (2008). Google Scholar
Digital Library
- K. Barker, T. Benson, D. Campbell, D. Ediger, R. Gioiosa, A. Hoisie, D. Kerbyson, J. Manzano, A. Marquez, L. Song, N. Tallent, and A. Tumeo. 2013. PERFECT (Power Efficiency Revolution For Embedded Computing Technologies) Benchmark Suite Manual. Pacific Northwest National Laboratory and Georgia Tech Research Institute. http://hpc.pnl.gov/PERFECT/.Google Scholar
- S. Borkar and A. Chien. 2011. The Future of Microprocessors. Communication of the ACM (2011). Google Scholar
Digital Library
- S. Boyd and L. Vandenberghe. 2004. Convex Optimization. Cambridge University Press. Google Scholar
Digital Library
- J. Campos, G. Chiola, J. M. Colom, and M. Silva. 1992. Properties and Performance Bounds for Timed Marked Graphs. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications (1992).Google Scholar
- L. P. Carloni. 2015. From Latency-Insensitive Design to Communication-Based System-Level Design. Proc. of the IEEE (2015).Google Scholar
Cross Ref
- L. P. Carloni. 2016. The Case for Embedded Scalable Platforms. In Proc. of the ACM/IEEE Design Automation Conference (DAC). (Invited). Google Scholar
Digital Library
- Y. Chen, T. Luo, S. Liu, S. Zhang, L. He, J. Wang, L. Li, T. Chen, Z. Xu, N. Sun, and O. Temam. 2014. DaDianNao: A Machine-Learning Supercomputer. In Proc. of the Annual ACM/IEEE International Symposium on Microarchitecture (MICRO). Google Scholar
Digital Library
- Y. H. Chen, T. Krishna, J. S. Emer, and V. Sze. 2017. Eyeriss: An Energy-Efficient Reconfigurable Accelerator for Deep Convolutional Neural Networks. IEEE Journal of Solid-State Circuits (2017).Google Scholar
- J. Cong, M. A. Ghodrat, M. Gill, B. Grigorian, K. Gururaj, and G. Reinman. 2014. Accelerator-Rich Architectures: Opportunities and Progresses. In Proc. of the ACM/IEEE Design Automation Conference (DAC). Google Scholar
Digital Library
- J. Cong, P. Li, B. Xiao, and P. Zhang. 2016. An Optimal Microarchitecture for Stencil Computation Acceleration Based on Nonuniform Partitioning of Data Reuse Buffers. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2016). Google Scholar
Digital Library
- J. Cong, P. Wei, C. H. Yu, and P. Zhou. 2017. Bandwidth Optimization Through On-Chip Memory Restructuring for HLS. In Proc. of the Annual Design Automation Conference (DAC). Google Scholar
Digital Library
- J. Cong, P. Zhang, and Y. Zou. 2011. Combined Loop Transformation and Hierarchy Allocation for Data Reuse Optimization. In Proc. of the ACM/IEEE International Conference on Computer-Aided Design (ICCAD). Google Scholar
Digital Library
- J. Cong, P. Zhang, and Y. Zou. 2012. Optimizing Memory Hierarchy Allocation with Loop Transformations for High-Level Synthesis. In Proc. of the ACM/IEEE Design Automation Conference (DAC). Google Scholar
Digital Library
- E. G. Cota, P. Mantovani, G. Di Guglielmo, and L. P. Carloni. 2015. An Analysis of Accelerator Coupling in Heterogeneous Architectures. In Proc. of the ACM/IEEE Design Automation Conference (DAC). Google Scholar
Digital Library
- F. Ferrandi, P. L. Lanzi, D. Loiacono, C. Pilato, and D. Sciuto. 2008. A Multi-objective Genetic Algorithm for Design Space Exploration in High-Level Synthesis. In Proc. of the IEEE Computer Society Annual Symposium on VLSI. Google Scholar
Digital Library
- A. Gerstlauer, C. Haubelt, A. D. Pimentel, T. P. Stefanov, D. D. Gajski, and J. Teich. 2009. Electronic System-level Synthesis Methodologies. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2009). Google Scholar
Digital Library
- F. Ghenassia. 2006. Transaction-Level Modeling with SystemC. Springer-Verlag. Google Scholar
Digital Library
- T. J. Ham, L. Wu, N. Sundaram, N. Satish, and M. Martonosi. 2016. Graphicionado: A High-Performance and Energy-Efficient Accelerator for Graph Analytics. In Proc. of the Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).Google Scholar
- C. Haubelt and J. Teich. 2003. Accelerating Design Space Exploration Using Pareto-Front Arithmetics {SoC design}. In Proc. of the ACM/IEEE Asia and South Pacific Design Automation Conference (ASP-DAC). Google Scholar
Digital Library
- M. Horowitz. 2014. Computing’s energy problem (and what we can do about it). In Proc. of the IEEE International Solid-State Circuits Conference (ISSCC).Google Scholar
Cross Ref
- L. W. Kim. 2017. DeepX: Deep Learning Accelerator for Restricted Boltzmann Machine Artificial Neural Networks. IEEE Transactions on Neural Networks and Learning Systems (2017).Google Scholar
- S. Kurra, N. K. Singh, and P. R. Panda. 2007. The Impact of Loop Unrolling on Controller Delay in High Level Synthesis. In Proc. of the ACM/IEEE Conference on Design, Automation and Test in Europe (DATE). Google Scholar
Digital Library
- B. Li, Z. Fang, and R. Iyer. 2011. Template-based Memory Access Engine for Accelerators in SoCs. In Proc. of the ACM/IEEE Asia and South Pacific Design Automation Conference (ASP-DAC). Google Scholar
Digital Library
- H. Y. Liu and L. P. Carloni. 2013. On Learning-Based Methods for Design-Space Exploration with High-Level Synthesis. In Proc. of the ACM/IEEE Design Automation Conference (DAC). Google Scholar
Digital Library
- H. Y. Liu, I. Diakonikolas, M. Petracca, and L. P. Carloni. 2011. Supervised Design Space Exploration by Compositional Approximation of Pareto Sets. In Proc. of the ACM/IEEE Design Automation Conference (DAC). Google Scholar
Digital Library
- H. Y. Liu, M. Petracca, and L. P. Carloni. 2012. Compositional System-Level Design Exploration with Planning of High-Level Synthesis. In Proc. of the AMC/IEEE Conference on Design, Automation, and Test in Europe (DATE). Google Scholar
Digital Library
- X. Liu, Y. Chen, T. Nguyen, S. Gurumani, K. Rupnow, and D. Chen. 2016. High Level Synthesis of Complex Applications: An H.264 Video Decoder. In Proc. of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA). Google Scholar
Digital Library
- M. J. Lyons, M. Hempstead, G. Y. Wei, and D. Brooks. 2012. The Accelerator Store: A Shared Memory Framework for Accelerator-based Systems. ACM Transactions on Architecture and Code Optimization (2012). Google Scholar
Digital Library
- A. Mahapatra and B. Carrion Schafer. 2014. Machine-learning based Simulated Annealer Method for High Level Synthesis Design Space Exploration. In Proc. of the Electronic System Level Synthesis Conference (ESLsyn).Google Scholar
- W. Meeus, K. Van Beeck, T. Goedemé, J. Meel, and D. Stroobandt. 2012. An Overview of Today’s High-Level Synthesis Tools. Design Automation for Embedded Systems (2012). Google Scholar
Digital Library
- V. K. Mishra and A. Sengupta. 2014. PSDSE: Particle Swarm Driven Design Space Exploration of Architecture and Unrolling Factors for Nested Loops in High Level Synthesis. In Proc. of the IEEE International Symposium on Electronic System Design (ISED).Google Scholar
- T. Murata. 1989. Petri Nets: Properties, Analysis and Applications. Proc. of the IEEE (1989).Google Scholar
Cross Ref
- L. Piccolboni, P. Mantovani, G. Di Guglielmo, and L. P. Carloni. 2017. Broadening the Exploration of the Accelerator Design Space in Embedded Scalable Platforms. In Proc. of the IEEE High Performance Extreme Computing Conference (HPEC).Google Scholar
- C. Pilato, P. Mantovani, G. Di Guglielmo, and L. P. Carloni. 2014. System-level Memory Optimization for High-level Synthesis of Component-based SoCs. In Proc. of the ACM/IEEE International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS). Google Scholar
Digital Library
- C. Pilato, P. Mantovani, G. Di Guglielmo, and L. P. Carloni. 2017. System-Level Optimization of Accelerator Local Memory for Heterogeneous Systems-on-Chip. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2017). Google Scholar
Digital Library
- R. Porter, A. M. Fraser, and D. Hush. 2010. Wide-Area Motion Imagery. IEEE Signal Processing Magazine (2010).Google Scholar
- A. Qamar, F. B. Muslim, F. Gregoretti, L. Lavagno, and M. T. Lazarescu. 2017. High-Level Synthesis for Semi-Global Matching: Is the Juice Worth the Squeeze? IEEE Access (2017).Google Scholar
- C. V. Ramamoorthy and G. S. Ho. 1980. Performance Evaluation of Asynchronous Concurrent Systems Using Petri Nets. IEEE Transaction on Software Engineering (1980). Google Scholar
Digital Library
- B. Reagen, P. Whatmough, R. Adolf, S. Rama, H. Lee, S. K. Lee, J. M. HernÃandez-Lobato, G. Y. Wei, and D. Brooks. 2016. Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators. In Proc. of the ACM/IEEE Annual International Symposium on Computer Architecture (ISCA). Google Scholar
Digital Library
- A. Sangiovanni-Vincentelli. 2007. Quo Vadis, SLD? Reasoning About the Trends and Challenges of System Level Design. Proc. of the IEEE (2007).Google Scholar
Cross Ref
- B. Carrion Schafer. 2016. Probabilistic Multiknob High-Level Synthesis Design Space Exploration Acceleration. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2016). Google Scholar
Digital Library
- B. Carrion Schafer, T. Takenaka, and K. Wakabayashi. 2009. Adaptive Simulated Annealer for High Level Synthesis Design Space Exploration. In Proc. of the IEEE International Symposium on VLSI Design, Automation and Test (VLSI-DAT).Google Scholar
- B. Carrion Schafer and K. Wakabayashi. 2012. Machine Learning Predictive Modelling High-Level Synthesis Design Space Exploration. IET Computers Digital Techniques (2012).Google Scholar
- A. Seznec. 2015. Bank-interleaved Cache or Memory Indexing Does Not Require Euclidean Division. In Proc. of the Annual Workshop on Duplicating, Deconstructing, and Debunking (WDDD).Google Scholar
- Y. S. Shao, B. Reagen, G. Y. Wei, and D. Brooks. 2014. Aladdin: A Pre-RTL, Power-performance Accelerator Simulator Enabling Large Design Space Exploration of Customized Architectures. In Proc. of the ACM/IEEE Annual International Symposium on Computer Architecture (ISCA). Google Scholar
Digital Library
- C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, and J. Cong. 2015. Optimizing FPGA-based Accelerator Design for Deep Convolutional Neural Networks. In Proc. of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA). Google Scholar
Digital Library
Index Terms
COSMOS: Coordination of High-Level Synthesis and Memory Optimization for Hardware Accelerators
Recommendations
Spatial: a language and compiler for application accelerators
PLDI 2018: Proceedings of the 39th ACM SIGPLAN Conference on Programming Language Design and ImplementationIndustry is increasingly turning to reconfigurable architectures like FPGAs and CGRAs for improved performance and energy efficiency. Unfortunately, adoption of these architectures has been limited by their programming models. HDLs lack abstractions for ...
From software to accelerators with LegUp high-level synthesis
CASES '13: Proceedings of the 2013 International Conference on Compilers, Architectures and Synthesis for Embedded SystemsEmbedded system designers can achieve energy and performance benefits by using dedicated hardware accelerators. However, implementing custom hardware accelerators for an application can be difficult and time intensive. LegUp is an open-source high-level ...
A metaprogrammed C++ framework for hardware/software component integration and communication
With the ever growing complexity of System-on-Chip design, a considerable effort has been made to introduce higher levels of abstraction and to integrate high-level synthesis solutions to the design flow. In such design flows, a uniform communication ...






Comments