A Framework for Interconnection-Aware Domain-Specific Many-Accelerator Synthesis

Published: 13 October 2016
Abstract

Many-accelerator Systems-on-Chip (SoCs) have recently emerged as a promising platform paradigm that combines parallelization with heterogeneity to meet the increasing demands for high performance and energy efficiency. To exploit the full potential of many-accelerator systems, automated design verification and analysis frameworks are required, targeting both computational and interconnection optimization. Accurate simulation of interconnection schemes should use real stimuli produced by fully functional nodes, which requires prototyping the processing elements and memories of the many-accelerator system. In this article, we argue that the Hierarchical Network-on-Chip (HNoC) scheme is a very promising solution for many-accelerator systems in terms of scalability and data-congestion minimization. We present a parameterizable SystemC prototyping framework for HNoCs, targeted to domain-specific many-accelerator systems. The framework supports the prototyping of processing elements, memory modules, and the underlying interconnection infrastructure, and it provides an API for their easy integration into the HNoC. It also enables holistic system simulation using real node data. Using a many-accelerator system of an MRI pipeline as a case study, we present an analysis of the proposed framework to demonstrate the impact of the system parameters on the system. Through extensive experimental analysis, we show the superiority of HNoC schemes in comparison to typical interconnection architectures. Finally, we show that adopting the proposed many-accelerator design flow achieves significant performance improvements, from 1.2× up to 26×, compared to an x86 software implementation of the MRI pipeline.
