Abstract
Embedded systems, as typified by modern mobile phones, are already seeing a drive toward using multi-core processors. The number of cores will likely increase rapidly in the future. Engineers and researchers need to be able to simulate systems as they are expected to be in a few generations' time, running simulations of many-core devices on today's multi-core machines. These requirements place heavy demands on the scalability of simulation engines, the fastest of which have typically evolved from just-in-time (JIT) dynamic binary translators (DBT).
Existing work aimed at parallelizing DBT simulators has focused exclusively on trace-based DBT, wherein linear execution traces, or perhaps trees thereof, are the units of translation. Region-based DBT simulators have not received the same attention and require different techniques from their trace-based cousins.
In this paper we develop an innovative approach to scaling multi-core, embedded simulation through region-based DBT. We first modify the JIT code generator of such a simulator to emit code that does not depend on a particular thread and its thread-specific context and is, therefore, thread-agnostic. We then demonstrate that this thread-agnostic code performs comparably to thread-specific code and, in addition, enables the sharing of JIT-compiled regions between different threads. This sharing optimisation, in turn, leads to significant performance improvements for multi-threaded applications. Our results confirm that an average of 76% of all JIT-compiled regions can be shared between 128 threads in representative, parallel workloads. We demonstrate that this translates into an overall performance improvement of 1.44x on average and up to 2.40x across 12 multi-threaded benchmarks taken from the SPLASH-2 benchmark suite, targeting our high-performance multi-core DBT simulator for embedded ARC processors running on a 4-core Intel host machine.
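The idea of thread-agnostic regions can be illustrated with a minimal Python sketch. It is not the simulator's implementation: `CpuContext`, `translate_region`, and `SharedRegionCache` are hypothetical names, the "translated" region is an ordinary Python function standing in for JIT-compiled native code, and the simulated cores are stepped sequentially on one host thread, so the cache needs no locking here. The point it shows is that a region which receives its context as an argument, rather than having one context baked in, can be translated once and reused by every simulated core.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CpuContext:
    """Hypothetical per-core state: a small register file and a program counter."""
    regs: List[int] = field(default_factory=lambda: [0] * 4)
    pc: int = 0


# A translated region takes the context as a parameter instead of
# capturing one particular core's context (thread-agnostic).
Region = Callable[[CpuContext], None]


def translate_region(entry_pc: int) -> Region:
    """Stand-in for the JIT: 'compiles' r0 = r0 + r1, then leaves the region."""
    def region(ctx: CpuContext) -> None:
        ctx.regs[0] += ctx.regs[1]
        ctx.pc = entry_pc + 8  # assumed region length of 8 bytes
    return region


class SharedRegionCache:
    """Translation cache keyed by entry address, shared by all simulated cores."""

    def __init__(self) -> None:
        self._regions: Dict[int, Region] = {}
        self.translations = 0  # counts how often the 'JIT' actually ran

    def lookup(self, entry_pc: int) -> Region:
        region = self._regions.get(entry_pc)
        if region is None:
            region = translate_region(entry_pc)
            self._regions[entry_pc] = region
            self.translations += 1
        return region


cache = SharedRegionCache()
# Four simulated cores start at the same address with different register values.
cores = [CpuContext(regs=[i, 10, 0, 0], pc=0x100) for i in range(4)]
for ctx in cores:
    cache.lookup(ctx.pc)(ctx)  # the same region object runs against each context
```

In this sketch the region at `0x100` is translated once and executed four times, each time against a different context. A simulator that runs cores on separate host threads would additionally have to synchronise access to the cache, which the sketch leaves out.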
Efficiently parallelizing instruction set simulation of embedded multi-core processors using region-based just-in-time dynamic binary translation
LCTES '12: Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems