Abstract

Dynamic Binary Translation (DBT) is the key technology behind cross-platform virtualization and allows software compiled for one Instruction Set Architecture (ISA) to be executed on a processor supporting a different ISA. Under the hood, DBT is typically implemented using Just-In-Time (JIT) compilation of frequently executed program regions, also called traces. The main challenge is translating frequently executed program regions as fast as possible into highly efficient native code. As time for JIT compilation adds to the overall execution time, the JIT compiler is often decoupled and operates in a separate thread independent from the main simulation loop to reduce the overhead of JIT compilation. In this paper we present two innovative contributions. The first contribution is a generalized trace compilation approach that considers all frequently executed paths in a program for JIT compilation, as opposed to previous approaches where trace compilation is restricted to paths through loops. The second contribution reduces JIT compilation cost by compiling several hot traces in a concurrent task farm. Altogether we combine generalized light-weight tracing, large translation units, parallel JIT compilation and dynamic work scheduling to ensure timely and efficient processing of hot traces. We have evaluated our industry-strength, LLVM-based parallel DBT implementing the ARCompact ISA against three benchmark suites (EEMBC, BioPerf and SPEC CPU2006) and demonstrate speedups of up to 2.08 on a standard quad-core Intel Xeon machine. Across short- and long-running benchmarks our scheme is robust and never results in a slowdown. In fact, using four processors total execution time can be reduced by on average 11.5% over state-of-the-art decoupled, parallel (or asynchronous) JIT compilation.
- J. Aycock. A brief history of just-in-time. ACM Comput. Surv., 35: 97--113, June 2003. Google Scholar
Digital Library
- D. Bader, Y. Li, T. Li, and V. Sachdeva. Bioperf: a benchmark suite to evaluate high-performance computer architecture on bioinformatics applications. In Proceedings of the IEEE International Workload Characterization Symposium, IISWC'05, pages 163--173, 2005.Google Scholar
Cross Ref
- V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: a transparent dynamic optimization system. In Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation, PLDI '00, pages 1--12, New York, NY, USA, 2000. ACM. Google Scholar
Digital Library
- M. Bebenita, M. Chang, G. Wagner, A. Gal, C. Wimmer, and M. Franz. Trace-based compilation in execution environments without interpreters. In Proceedings of the 8th International Conference on the Principles and Practice of Programming in Java, PPPJ '10, pages 59--68, New York, NY, USA, 2010. ACM. Google Scholar
Digital Library
- F. Bellard. QEMU, a fast and portable dynamic translator. In Proceedings of the USENIX Annual Technical Conference, ATEC '05, page 41, Berkeley, CA, USA, 2005. USENIX Association. Google Scholar
Digital Library
- I. Böhm, B. Franke, and N. Topham. Cycle-accurate performance modelling in an ultra-fast just-in-time dynamic binary translation instruction set simulator. In Proceedings of the International Conference on Embedded Computer Systems, IC-SAMOS'10, pages 1--10, 2010.Google Scholar
Cross Ref
- F. Brandner, A. Fellnhofer, A. Krall, and D. Riegler. Fast and accurate simulation using the llvm compiler framework. In Proceedings of the 1st Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, RAPIDO'09, pages 1--6, 2009.Google Scholar
- S. Campanoni, G. Agosta, and S. C. Reghizzi. A parallel dynamic compiler for CIL bytecode. SIGPLAN Not., 43: 11--20, April 2008. Google Scholar
Digital Library
- A. Chernoff, M. Herdeg, R. Hookway, C. Reeve, N. Rubin, T. Tye, S. Bharadwaj Yadavalli, and J. Yates. FX!32: a profile-directed binary translator. Micro, IEEE, 18 (2): 56--64, 1998. Google Scholar
Digital Library
- B. Cmelik and D. Keppel. Shade: a fast instruction-set simulator for execution profiling. In Proceedings of the 1994 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '94, pages 128--137, New York, NY, USA, 1994. ACM. Google Scholar
Digital Library
- A. Cohen and E. Rohou. Processor virtualization and split compilation for heterogeneous multicore embedded systems. In Proceedings of the 47th Design Automation Conference, DAC '10, pages 102--107, New York, NY, USA, 2010. ACM. Google Scholar
Digital Library
- S. Farfeleder, A. Krall, and N. Horspool. Ultra fast cycle-accurate compiled emulation of inorder pipelined architectures. J. Syst. Archit., 53: 501--510, August 2007. Google Scholar
Digital Library
- A. Gal, C. W. Probst, and M. Franz. HotpathVM: an effective JIT compiler for resource-constrained devices. In Proceedings of the 2nd International Conference on Virtual Execution Environments, VEE '06, pages 144--153, New York, NY, USA, 2006. ACM. Google Scholar
Digital Library
- A. Gal, M. Bebenita, M. Chang, and M. Franz. Making the compilation "pipeline" explicit: Dynamic compilation using trace tree serialization. Technical Report 07-12, University of California, Irvine, 2007.Google Scholar
- A. Gal, B. Eich, M. Shaver, D. Anderson, D. Mandelin, M. R. Haghighat, B. Kaplan, G. Hoare, B. Zbarsky, J. Orendorff, J. Ruderman, E. W. Smith, R. Reitmaier, M. Bebenita, M. Chang, and M. Franz. Trace-based just-in-time type specialization for dynamic languages. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '09, pages 465--478, New York, NY, USA, 2009. ACM. Google Scholar
Digital Library
- J. Ha, M. Haghighat, S. Cong, and K. McKinley. A concurrent trace-based just-in-time compiler for single-threaded JavaScript. In Proceedings of the Workshop on Parallel Execution of Sequential Programs on Multicore Architectures, PESPMA'09, 2009.Google Scholar
- M. Hauswirth, P. F. Sweeney, A. Diwan, and M. Hind. Vertical profiling: understanding the behavior of object-priented applications. In Proceedings of the 19th annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA '04, pages 251--269, New York, NY, USA, 2004. ACM. Google Scholar
Digital Library
- J. L. Henning. SPEC CPU2006 benchmark descriptions. SIGARCH Comput. Archit. News, 34: 1--17, September 2006. Google Scholar
Digital Library
- D. Jones and N. Topham. High speed CPU simulation using LTU dynamic binary translation. In High Performance Embedded Architectures and Compilers, volume 5409 of Lecture Notes in Computer Science, pages 50--64. Springer Berlin Heidelberg, 2009. Google Scholar
- C. J. Krintz, D. Grove, V. Sarkar, and B. Calder. Reducing the overhead of dynamic compilation. Software: Practice and Experience, 31 (8): 717--738, July 2001.Google Scholar
Cross Ref
- R. Lantz. Fast functional simulation with parallel Embra. In Proceedings of the 4th Annual Workshop on Modeling, Benchmarking and Simulation, MOBS'08, 2008.Google Scholar
- C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the International Symposium on Code Generation and Optimization, CGO '04, page 75, Washington, DC, USA, 2004. IEEE Computer Society. Google Scholar
Digital Library
- C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. PIN: building customized program analysis tools with dynamic instrumentation. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '05, pages 190--200, New York, NY, USA, 2005. ACM. Google Scholar
Digital Library
- C. May. MIMIC: a fast System/370 simulator. In Papers of the Symposium on Interpreters and Interpretive Techniques, SIGPLAN '87, pages 1--13, New York, NY, USA, 1987. ACM. Google Scholar
Digital Library
- J. Miller, H. Kasture, G. Kurian, C. Gruenwald, N. Beckmann, C. Celio, J. Eastep, and A. Agarwal. Graphite: A distributed parallel simulator for multicores. In Proceedings of the IEEE 16th International Symposium on High Performance Computer Architecture, HPCA'10, pages 1--12, 2010.Google Scholar
Cross Ref
- Mozilla Foundation. Tamarin project, 17 November 2010. URL http://www.mozilla.org/projects/tamarin/.Google Scholar
- Mozilla Foundation. Tracemonkey, 17 November 2010. URL https://wiki.mozilla.org/JavaScript:TraceMonkey.Google Scholar
- A. Nohl, G. Braun, O. Schliebusch, R. Leupers, H. Meyr, and A. Hoffmann. A universal technique for fast and flexible instruction-set architecture simulation. In Proceedings of the 39th annual Design Automation Conference, DAC '02, pages 22--27, New York, NY, USA, 2002. ACM. Google Scholar
Digital Library
- Oracle Corporation. HotSpot VM, 17 November 2010. URL http://java.sun.com/performance/reference/whitepapers/.Google Scholar
- W. Qin, J. D'Errico, and X. Zhu. A multiprocessing approach to accelerate retargetable and portable dynamic-compiled instruction-set simulation. In Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, CODES Google Scholar
Digital Library
- M. Reshadi, P. Mishra, and N. Dutt. Instruction set compiled simulation: a technique for fast and flexible instruction set simulation. In Proceedings of the 40th annual Design Automation Conference, DAC '03, pages 758--763, New York, NY, USA, 2003. ACM. Google Scholar
Digital Library
- M. Reshadi, P. Mishra, and N. Dutt. Instruction set compiled simulation: a technique for fast and flexible instruction set simulation. In Proceedings of the 40th annual Design Automation Conference, DAC '03, pages 758--763, New York, NY, USA, 2003. ACM. Google Scholar
Digital Library
- E. Stahl and M. Anand. A comparison of PowerVM and x86-based virtualization performance. Technical Report WP101574, IBM Techdocs White Papers, 2010.Google Scholar
- T. Suganuma, T. Yasue, and T. Nakatani. A region-based compilation technique for dynamic compilers. ACM Trans. Program. Lang. Syst., 28: 134--174, January 2006. Google Scholar
Digital Library
- V. Tan. Asynchronous just-in-time compilation. United States Patent Application, WO/2007/055883, 2007.Google Scholar
- The LLVM Compiler Infrastructure. Users of LLVM JIT compilation engine, 17 November 2010. URL http://llvm.org/Users.html.Google Scholar
- The WebKit Open Source Project. SquirrelFish, 17 November 2010. URL http://trac.webkit.org/wiki/SquirrelFish.Google Scholar
- N. Topham and D. Jones. High speed CPU simulation using JIT binary translation. In Proceedings of the Annual Workshop on Modelling, Benchmarking and Simulation, MOBS '07, 2007.Google Scholar
- P. Unnikrishnan, M. Kandemir, and F. Li. Reducing dynamic compilation overhead by overlapping compilation and execution. In Proceedings of the 2006 Asia and South Pacific Design Automation Conference, ASP-DAC '06, pages 929--934, Piscataway, NJ, USA, 2006. IEEE Press. Google Scholar
Digital Library
- K. Wang, Y. Zhang, H. Wang, and X. Shen. Parallelization of IBM Mambo system simulator in functional modes. SIGOPS Oper. Syst. Rev., 42: 71--76, January 2008. Google Scholar
Digital Library
- J. Whaley. Partial method compilation using dynamic profile information. In Proceedings of the 16th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA '01, pages 166--179, New York, NY, USA, 2001. ACM. Google Scholar
Digital Library
- B. Yee, D. Sehr, G. Dardyk, J. Chen, R. Muth, T. Ormandy, S. Okasaka, N. Narula, and N. Fullagar. Native Client: A sandbox for portable, untrusted x86 native code. In 30th IEEE Symposium on Security and Privacy, pages 79--93, May 2009. Google Scholar
Digital Library
- M. Zaleski, A. D. Brown, and K. Stoodley. YETI: a gradually extensible trace interpreter. In Proceedings of the 3rd International Conference on Virtual Execution Environments, VEE '07, pages 83--93, New York, NY, USA, 2007. ACM. Google Scholar
Digital Library
- C. Zheng and C. Thompson. PA-RISC to IA-64: transparent execution, no recompilation. Computer, 33 (3): 47--52, Mar. 2000. Google Scholar
Digital Library
- J. Zhu and D. D. Gajski. A retargetable, ultra-fast instruction set simulator. In Proceedings of the Conference on Design, Automation and Test in Europe, DATE '99, New York, NY, USA, 1999. ACM. Google Scholar
Digital Library
Index Terms
Generalized just-in-time trace compilation using a parallel task farm in a dynamic binary translator
Recommendations
Generalized just-in-time trace compilation using a parallel task farm in a dynamic binary translator
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and ImplementationDynamic Binary Translation (DBT) is the key technology behind cross-platform virtualization and allows software compiled for one Instruction Set Architecture (ISA) to be executed on a processor supporting a different ISA. Under the hood, DBT is ...
Efficiently parallelizing instruction set simulation of embedded multi-core processors using region-based just-in-time dynamic binary translation
LCTES '12: Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded SystemsEmbedded systems, as typified by modern mobile phones, are already seeing a drive toward using multi-core processors. The number of cores will likely increase rapidly in the future. Engineers and researchers need to be able to simulate systems, as they ...
Trace-based compilation in execution environments without interpreters
PPPJ '10: Proceedings of the 8th International Conference on the Principles and Practice of Programming in JavaTrace-based compilation is a technique used in managed language runtimes to detect and compile frequently executed program paths. The goal is to reduce compilation time and improve code quality by only considering "hot" parts of methods for compilation. ...







Comments