
Low overhead dynamic binary translation on ARM

Published: 14 June 2017

Abstract

The ARMv8 architecture introduced AArch64, a 64-bit execution mode with a new instruction set, while retaining binary compatibility with previous versions of the ARM architecture through AArch32, a 32-bit execution mode. Most hardware implementations of ARMv8 support both AArch32 and AArch64, at a cost in hardware complexity.

We present MAMBO-X64, a dynamic binary translator for Linux that executes 32-bit ARM binaries using only the AArch64 instruction set. We evaluated the performance of MAMBO-X64 on three existing ARMv8 processors that support both the AArch32 and AArch64 instruction sets, comparing the running time of 32-bit benchmarks under MAMBO-X64 with the running time of the same benchmarks executed natively. On SPEC CPU2006, MAMBO-X64 achieves a geometric mean overhead of less than 7.5% on in-order Cortex-A53 processors and a performance improvement of 1% on out-of-order X-Gene 1 processors.
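The headline figures are geometric means of per-benchmark slowdown ratios (translated time divided by native time). The sketch below shows that arithmetic under the assumption that overhead is reported as the geometric-mean ratio minus one, so a negative result corresponds to a speedup such as the one reported on X-Gene 1. The function name and the timings are hypothetical illustrations, not data or code from the paper.

```python
import math


def geomean_overhead(translated_times, native_times):
    """Return the geometric-mean slowdown minus 1 (e.g. 0.075 means 7.5%).

    Each position pairs one benchmark's running time under the translator
    with its native running time. A negative result means the translated
    runs were faster on (geometric) average.
    """
    if len(translated_times) != len(native_times) or not native_times:
        raise ValueError("need equal-length, non-empty timing lists")
    ratios = [t / n for t, n in zip(translated_times, native_times)]
    # Average in log space: exp(mean(log r)) is the geometric mean of r.
    mean_log = sum(math.log(r) for r in ratios) / len(ratios)
    return math.exp(mean_log) - 1.0


if __name__ == "__main__":
    # Hypothetical timings in seconds, NOT measurements from the paper.
    native = [100.0, 250.0, 80.0]
    translated = [110.0, 250.0, 84.0]
    print(f"overhead: {geomean_overhead(translated, native):.2%}")
```

The geometric mean is the conventional choice for SPEC-style ratios because it does not let one benchmark with a large absolute runtime dominate the summary.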

MAMBO-X64 achieves this low overhead through novel optimizations that dynamically map AArch32 floating-point registers to AArch64 registers, handle overflowing address calculations efficiently, generate traces that harness hardware return address prediction, and handle operating system signals accurately.
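To illustrate why address calculations can "overflow" during translation: AArch32 computes a load/store address such as base + register offset in 32 bits, so the sum wraps modulo 2^32, whereas the same addition performed on zero-extended values in 64-bit registers does not wrap and can land above 4 GiB. The sketch below models only the problem and a detection check; the abstract does not describe how MAMBO-X64 resolves it, and all function names here are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def aarch32_effective_address(base, offset):
    """AArch32 semantics: base + offset is truncated to 32 bits (wraps mod 2**32)."""
    return (base + offset) & MASK32


def naive_aarch64_address(base, offset):
    """Hypothetical naive translation: zero-extend both 32-bit operands
    and add them as 64-bit values, with no truncation."""
    return (base & MASK32) + (offset & MASK32)


def needs_fixup(base, offset):
    """True when the naive 64-bit sum falls outside the 32-bit address space,
    i.e. when it disagrees with the AArch32 result."""
    return naive_aarch64_address(base, offset) > MASK32


if __name__ == "__main__":
    base, offset = 0x00010000, 0xFFFFFFF0  # offset encodes -16 in 32 bits
    print(hex(aarch32_effective_address(base, offset)))
    print(hex(naive_aarch64_address(base, offset)))
    print(needs_fixup(base, offset))
```

A negative register offset, stored as a large unsigned 32-bit value, is the typical trigger: the guest expects base - 16, but the naive 64-bit sum points just past the 4 GiB boundary.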



      Published in

      ACM SIGPLAN Notices, Volume 52, Issue 6 (PLDI '17), June 2017, 708 pages
      ISSN: 0362-1340
      EISSN: 1558-1160
      DOI: 10.1145/3140587

      PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2017, 708 pages
      ISBN: 9781450349888
      DOI: 10.1145/3062341

      Copyright © 2017 ACM

      Publisher

      Association for Computing Machinery

      New York, NY, United States
