Abstract
Many-accelerator Systems-on-Chip (SoCs) have recently emerged as a promising platform paradigm that combines parallelization with heterogeneity to meet the increasing demands for high performance and energy efficiency. To exploit the full potential of many-accelerator systems, automated design verification and analysis frameworks are required that target both computational and interconnection optimization. Accurate simulation of interconnection schemes should use real stimuli produced by fully functional nodes, which requires prototyping the processing elements and memories of the many-accelerator system. In this article, we argue that the Hierarchical Network-on-Chip (HNoC) scheme is a very promising solution for many-accelerator systems in terms of scalability and data-congestion minimization. We present a parameterizable SystemC prototyping framework for HNoCs, targeted to domain-specific many-accelerator systems. The framework supports the prototyping of processing elements, memory modules, and the underlying interconnection infrastructure, and it provides an API for their easy integration into the HNoC. It also enables holistic system simulation using real node data. Using a many-accelerator system for an MRI pipeline as a case study, we analyze the proposed framework to demonstrate the impact of the system parameters on the system. Through extensive experimental analysis, we show the superiority of HNoC schemes in comparison to typical interconnection architectures. Finally, we show that adopting the proposed many-accelerator design flow achieves significant performance improvements, from 1.2× up to 26×, compared to an x86 software implementation of the MRI pipeline.
A Framework for Interconnection-Aware Domain-Specific Many-Accelerator Synthesis