
VirtualSoC: A Research Tool for Modern MPSoCs

Published: 13 October 2016

Abstract

Architectural heterogeneity has proven to be an effective design paradigm to cope with an ever-increasing demand for computational power within tight energy budgets, in virtually every computing domain. Programmable manycore accelerators are currently widely used not only in high-performance computing systems, but also in embedded devices, in which they operate as coprocessors under the control of a general-purpose CPU (the host processor). Clearly, such powerful hardware architectures are paired with sophisticated and complex software ecosystems, composed of operating systems, programming models plus associated runtime engines, and increasingly complex user applications with related libraries. System modeling has always played a key role in early architectural exploration or software development when the real hardware is not available. The necessity of efficiently coping with the huge HW/SW design space provided by the described heterogeneous Systems on Chip (SoCs) calls for advanced full-system simulation methodologies and tools, capable of assessing various metrics for the functional and nonfunctional properties of the target system. In this article, we describe VirtualSoC, a simulation tool targeting the full-system simulation of massively parallel heterogeneous SoCs. We also describe how VirtualSoC has been successfully adopted in several research projects.

