skip to main content
research-article

Generalized just-in-time trace compilation using a parallel task farm in a dynamic binary translator

Published:04 June 2011Publication History
Skip Abstract Section

Abstract

Dynamic Binary Translation (DBT) is the key technology behind cross-platform virtualization and allows software compiled for one Instruction Set Architecture (ISA) to be executed on a processor supporting a different ISA. Under the hood, DBT is typically implemented using Just-In-Time (JIT) compilation of frequently executed program regions, also called traces. The main challenge is translating frequently executed program regions as fast as possible into highly efficient native code. As time for JIT compilation adds to the overall execution time, the JIT compiler is often decoupled and operates in a separate thread independent from the main simulation loop to reduce the overhead of JIT compilation. In this paper we present two innovative contributions. The first contribution is a generalized trace compilation approach that considers all frequently executed paths in a program for JIT compilation, as opposed to previous approaches where trace compilation is restricted to paths through loops. The second contribution reduces JIT compilation cost by compiling several hot traces in a concurrent task farm. Altogether we combine generalized light-weight tracing, large translation units, parallel JIT compilation and dynamic work scheduling to ensure timely and efficient processing of hot traces. We have evaluated our industry-strength, LLVM-based parallel DBT implementing the ARCompact ISA against three benchmark suites (EEMBC, BioPerf and SPEC CPU2006) and demonstrate speedups of up to 2.08 on a standard quad-core Intel Xeon machine. Across short- and long-running benchmarks our scheme is robust and never results in a slowdown. In fact, using four processors total execution time can be reduced by on average 11.5% over state-of-the-art decoupled, parallel (or asynchronous) JIT compilation.

References

  1. J. Aycock. A brief history of just-in-time. ACM Comput. Surv., 35: 97--113, June 2003. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. D. Bader, Y. Li, T. Li, and V. Sachdeva. Bioperf: a benchmark suite to evaluate high-performance computer architecture on bioinformatics applications. In Proceedings of the IEEE International Workload Characterization Symposium, IISWC'05, pages 163--173, 2005.Google ScholarGoogle ScholarCross RefCross Ref
  3. V. Bala, E. Duesterwald, and S. Banerjia. Dynamo: a transparent dynamic optimization system. In Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation, PLDI '00, pages 1--12, New York, NY, USA, 2000. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. M. Bebenita, M. Chang, G. Wagner, A. Gal, C. Wimmer, and M. Franz. Trace-based compilation in execution environments without interpreters. In Proceedings of the 8th International Conference on the Principles and Practice of Programming in Java, PPPJ '10, pages 59--68, New York, NY, USA, 2010. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. F. Bellard. QEMU, a fast and portable dynamic translator. In Proceedings of the USENIX Annual Technical Conference, ATEC '05, page 41, Berkeley, CA, USA, 2005. USENIX Association. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. I. Böhm, B. Franke, and N. Topham. Cycle-accurate performance modelling in an ultra-fast just-in-time dynamic binary translation instruction set simulator. In Proceedings of the International Conference on Embedded Computer Systems, IC-SAMOS'10, pages 1--10, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  7. F. Brandner, A. Fellnhofer, A. Krall, and D. Riegler. Fast and accurate simulation using the llvm compiler framework. In Proceedings of the 1st Workshop on Rapid Simulation and Performance Evaluation: Methods and Tools, RAPIDO'09, pages 1--6, 2009.Google ScholarGoogle Scholar
  8. S. Campanoni, G. Agosta, and S. C. Reghizzi. A parallel dynamic compiler for CIL bytecode. SIGPLAN Not., 43: 11--20, April 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. A. Chernoff, M. Herdeg, R. Hookway, C. Reeve, N. Rubin, T. Tye, S. Bharadwaj Yadavalli, and J. Yates. FX!32: a profile-directed binary translator. Micro, IEEE, 18 (2): 56--64, 1998. Google ScholarGoogle ScholarDigital LibraryDigital Library
  10. B. Cmelik and D. Keppel. Shade: a fast instruction-set simulator for execution profiling. In Proceedings of the 1994 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems, SIGMETRICS '94, pages 128--137, New York, NY, USA, 1994. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. A. Cohen and E. Rohou. Processor virtualization and split compilation for heterogeneous multicore embedded systems. In Proceedings of the 47th Design Automation Conference, DAC '10, pages 102--107, New York, NY, USA, 2010. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. S. Farfeleder, A. Krall, and N. Horspool. Ultra fast cycle-accurate compiled emulation of inorder pipelined architectures. J. Syst. Archit., 53: 501--510, August 2007. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. A. Gal, C. W. Probst, and M. Franz. HotpathVM: an effective JIT compiler for resource-constrained devices. In Proceedings of the 2nd International Conference on Virtual Execution Environments, VEE '06, pages 144--153, New York, NY, USA, 2006. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. A. Gal, M. Bebenita, M. Chang, and M. Franz. Making the compilation "pipeline" explicit: Dynamic compilation using trace tree serialization. Technical Report 07-12, University of California, Irvine, 2007.Google ScholarGoogle Scholar
  15. A. Gal, B. Eich, M. Shaver, D. Anderson, D. Mandelin, M. R. Haghighat, B. Kaplan, G. Hoare, B. Zbarsky, J. Orendorff, J. Ruderman, E. W. Smith, R. Reitmaier, M. Bebenita, M. Chang, and M. Franz. Trace-based just-in-time type specialization for dynamic languages. In Proceedings of the 2009 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '09, pages 465--478, New York, NY, USA, 2009. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. J. Ha, M. Haghighat, S. Cong, and K. McKinley. A concurrent trace-based just-in-time compiler for single-threaded JavaScript. In Proceedings of the Workshop on Parallel Execution of Sequential Programs on Multicore Architectures, PESPMA'09, 2009.Google ScholarGoogle Scholar
  17. M. Hauswirth, P. F. Sweeney, A. Diwan, and M. Hind. Vertical profiling: understanding the behavior of object-priented applications. In Proceedings of the 19th annual ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA '04, pages 251--269, New York, NY, USA, 2004. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. J. L. Henning. SPEC CPU2006 benchmark descriptions. SIGARCH Comput. Archit. News, 34: 1--17, September 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. D. Jones and N. Topham. High speed CPU simulation using LTU dynamic binary translation. In High Performance Embedded Architectures and Compilers, volume 5409 of Lecture Notes in Computer Science, pages 50--64. Springer Berlin Heidelberg, 2009. Google ScholarGoogle Scholar
  20. C. J. Krintz, D. Grove, V. Sarkar, and B. Calder. Reducing the overhead of dynamic compilation. Software: Practice and Experience, 31 (8): 717--738, July 2001.Google ScholarGoogle ScholarCross RefCross Ref
  21. R. Lantz. Fast functional simulation with parallel Embra. In Proceedings of the 4th Annual Workshop on Modeling, Benchmarking and Simulation, MOBS'08, 2008.Google ScholarGoogle Scholar
  22. C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the International Symposium on Code Generation and Optimization, CGO '04, page 75, Washington, DC, USA, 2004. IEEE Computer Society. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. PIN: building customized program analysis tools with dynamic instrumentation. In Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '05, pages 190--200, New York, NY, USA, 2005. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. C. May. MIMIC: a fast System/370 simulator. In Papers of the Symposium on Interpreters and Interpretive Techniques, SIGPLAN '87, pages 1--13, New York, NY, USA, 1987. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. J. Miller, H. Kasture, G. Kurian, C. Gruenwald, N. Beckmann, C. Celio, J. Eastep, and A. Agarwal. Graphite: A distributed parallel simulator for multicores. In Proceedings of the IEEE 16th International Symposium on High Performance Computer Architecture, HPCA'10, pages 1--12, 2010.Google ScholarGoogle ScholarCross RefCross Ref
  26. Mozilla Foundation. Tamarin project, 17 November 2010. URL http://www.mozilla.org/projects/tamarin/.Google ScholarGoogle Scholar
  27. Mozilla Foundation. Tracemonkey, 17 November 2010. URL https://wiki.mozilla.org/JavaScript:TraceMonkey.Google ScholarGoogle Scholar
  28. A. Nohl, G. Braun, O. Schliebusch, R. Leupers, H. Meyr, and A. Hoffmann. A universal technique for fast and flexible instruction-set architecture simulation. In Proceedings of the 39th annual Design Automation Conference, DAC '02, pages 22--27, New York, NY, USA, 2002. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Oracle Corporation. HotSpot VM, 17 November 2010. URL http://java.sun.com/performance/reference/whitepapers/.Google ScholarGoogle Scholar
  30. W. Qin, J. D'Errico, and X. Zhu. A multiprocessing approach to accelerate retargetable and portable dynamic-compiled instruction-set simulation. In Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis, CODES Google ScholarGoogle ScholarDigital LibraryDigital Library
  31. M. Reshadi, P. Mishra, and N. Dutt. Instruction set compiled simulation: a technique for fast and flexible instruction set simulation. In Proceedings of the 40th annual Design Automation Conference, DAC '03, pages 758--763, New York, NY, USA, 2003. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. M. Reshadi, P. Mishra, and N. Dutt. Instruction set compiled simulation: a technique for fast and flexible instruction set simulation. In Proceedings of the 40th annual Design Automation Conference, DAC '03, pages 758--763, New York, NY, USA, 2003. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. E. Stahl and M. Anand. A comparison of PowerVM and x86-based virtualization performance. Technical Report WP101574, IBM Techdocs White Papers, 2010.Google ScholarGoogle Scholar
  34. T. Suganuma, T. Yasue, and T. Nakatani. A region-based compilation technique for dynamic compilers. ACM Trans. Program. Lang. Syst., 28: 134--174, January 2006. Google ScholarGoogle ScholarDigital LibraryDigital Library
  35. V. Tan. Asynchronous just-in-time compilation. United States Patent Application, WO/2007/055883, 2007.Google ScholarGoogle Scholar
  36. The LLVM Compiler Infrastructure. Users of LLVM JIT compilation engine, 17 November 2010. URL http://llvm.org/Users.html.Google ScholarGoogle Scholar
  37. The WebKit Open Source Project. SquirrelFish, 17 November 2010. URL http://trac.webkit.org/wiki/SquirrelFish.Google ScholarGoogle Scholar
  38. N. Topham and D. Jones. High speed CPU simulation using JIT binary translation. In Proceedings of the Annual Workshop on Modelling, Benchmarking and Simulation, MOBS '07, 2007.Google ScholarGoogle Scholar
  39. P. Unnikrishnan, M. Kandemir, and F. Li. Reducing dynamic compilation overhead by overlapping compilation and execution. In Proceedings of the 2006 Asia and South Pacific Design Automation Conference, ASP-DAC '06, pages 929--934, Piscataway, NJ, USA, 2006. IEEE Press. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. K. Wang, Y. Zhang, H. Wang, and X. Shen. Parallelization of IBM Mambo system simulator in functional modes. SIGOPS Oper. Syst. Rev., 42: 71--76, January 2008. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. J. Whaley. Partial method compilation using dynamic profile information. In Proceedings of the 16th ACM SIGPLAN Conference on Object-oriented Programming, Systems, Languages, and Applications, OOPSLA '01, pages 166--179, New York, NY, USA, 2001. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. B. Yee, D. Sehr, G. Dardyk, J. Chen, R. Muth, T. Ormandy, S. Okasaka, N. Narula, and N. Fullagar. Native Client: A sandbox for portable, untrusted x86 native code. In 30th IEEE Symposium on Security and Privacy, pages 79--93, May 2009. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. M. Zaleski, A. D. Brown, and K. Stoodley. YETI: a gradually extensible trace interpreter. In Proceedings of the 3rd International Conference on Virtual Execution Environments, VEE '07, pages 83--93, New York, NY, USA, 2007. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. C. Zheng and C. Thompson. PA-RISC to IA-64: transparent execution, no recompilation. Computer, 33 (3): 47--52, Mar. 2000. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. J. Zhu and D. D. Gajski. A retargetable, ultra-fast instruction set simulator. In Proceedings of the Conference on Design, Automation and Test in Europe, DATE '99, New York, NY, USA, 1999. ACM. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. Generalized just-in-time trace compilation using a parallel task farm in a dynamic binary translator

    Recommendations

    Comments

    Login options

    Check if you have access through your login credentials or your institution to get full access on this article.

    Sign in

    Full Access

    PDF Format

    View or Download as a PDF file.

    PDF

    eReader

    View online with eReader.

    eReader
    About Cookies On This Site

    We use cookies to ensure that we give you the best experience on our website.

    Learn more

    Got it!