research-article

Efficiently parallelizing instruction set simulation of embedded multi-core processors using region-based just-in-time dynamic binary translation

Published: 12 June 2012

Abstract

Embedded systems, as typified by modern mobile phones, are already seeing a drive toward using multi-core processors. The number of cores will likely increase rapidly in the future. Engineers and researchers need to be able to simulate systems as they are expected to be in a few generations' time, running simulations of many-core devices on today's multi-core machines. These requirements place heavy demands on the scalability of simulation engines, the fastest of which have typically evolved from just-in-time (JIT) dynamic binary translators (DBT).

Existing work aimed at parallelizing DBT simulators has focused exclusively on trace-based DBT, wherein linear execution traces or perhaps trees thereof are the units of translation. Region-based DBT simulators have not received the same attention and require different techniques than their trace-based cousins.

In this paper we develop an innovative approach to scaling multi-core, embedded simulation through region-based DBT. We initially modify the JIT code generator of such a simulator to emit code that does not depend on a particular thread with its thread-specific context and is, therefore, thread-agnostic. We then demonstrate that this thread-agnostic code generation is comparable to thread-specific code with respect to performance, but also enables the sharing of JIT-compiled regions between different threads. This sharing optimisation, in turn, leads to significant performance improvements for multi-threaded applications. In fact, our results confirm that an average of 76% of all JIT-compiled regions can be shared between 128 threads in representative, parallel workloads. We demonstrate that this translates into an overall performance improvement of 1.44x on average and up to 2.40x across 12 multi-threaded benchmarks taken from the SPLASH-2 benchmark suite, targeting our high-performance multi-core DBT simulator for embedded ARC processors running on a 4-core Intel host machine.
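The idea of thread-agnostic regions can be illustrated with a small sketch. This is not the paper's implementation: `CpuContext`, `SharedRegionCache`, `translate_region`, and `simulate` are hypothetical names, and the "translated" region is an ordinary Python function standing in for JIT-compiled code. The point it shows is that a region which receives the core's context as a parameter, rather than capturing one thread's context, can be translated once and then reused by every simulated core.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class CpuContext:
    """Hypothetical per-core state; one instance per simulated core."""
    core_id: int
    regs: list = field(default_factory=lambda: [0] * 4)


class SharedRegionCache:
    """Maps a region's entry address to one translation shared by all cores."""

    def __init__(self, translate):
        self._translate = translate
        self._regions = {}
        self._lock = threading.Lock()
        self.translations = 0  # number of times the translator actually ran

    def lookup(self, entry_addr):
        # The lock is held during translation for simplicity, so a region
        # is never translated twice even if several cores miss at once.
        with self._lock:
            region = self._regions.get(entry_addr)
            if region is None:
                region = self._translate(entry_addr)
                self._regions[entry_addr] = region
                self.translations += 1
            return region


def translate_region(entry_addr):
    """Stand-in for the JIT (assumption): the returned code takes the
    context as an argument, so it holds no thread-specific state."""
    def region(ctx):
        ctx.regs[0] += entry_addr  # toy effect on the caller's own context
        return entry_addr + 4      # toy "next program counter"
    return region


def run_core(cache, ctx, entry_addrs):
    for addr in entry_addrs:
        cache.lookup(addr)(ctx)


def simulate(num_cores, entry_addrs):
    """Run the same regions on num_cores simulated cores, one thread each."""
    cache = SharedRegionCache(translate_region)
    contexts = [CpuContext(core_id=i) for i in range(num_cores)]
    threads = [
        threading.Thread(target=run_core, args=(cache, ctx, entry_addrs))
        for ctx in contexts
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cache.translations, contexts
```

Calling `simulate(4, [0x100, 0x200])` runs two regions on four cores but invokes the translator only twice, while each core's registers are updated independently. With thread-specific code, each core would need its own translation of every region it executes.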


Published in

ACM SIGPLAN Notices, Volume 47, Issue 5 (LCTES '12)
May 2012, 152 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2345141

LCTES '12: Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
June 2012, 153 pages
ISBN: 9781450312127
DOI: 10.1145/2248418

Copyright © 2012 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
