Efficient code generation in a region-based dynamic binary translator

Published: 12 June 2014

Abstract

Region-based JIT compilation operates on translation units comprising multiple basic blocks and the (possibly cyclic or conditional) control flow between them. It promises to reconcile aggressive code optimisation and low compilation latency in performance-critical dynamic binary translators. Whilst various region selection schemes and isolated code optimisation techniques have been investigated, it remains unclear how best to exploit such regions for efficient code generation. Complex interactions with indirect branch tables and translation caches can have adverse effects on performance if not considered carefully. In this paper we present a complete code generation strategy for a region-based dynamic binary translator, which exploits branch type and control flow profiling information to improve code quality for the common case. We demonstrate that, using our code generation strategy, a competitive region-based dynamic compiler can be built on top of the LLVM JIT compilation framework. For the ARMv5T target ISA and SPEC CPU 2006 benchmarks we achieve execution rates of, on average, 867 MIPS and up to 1323 MIPS on a standard x86 host machine, outperforming state-of-the-art QEMU-ARM by delivering a speedup of 264%.
