Abstract
The ARMv8 architecture introduced AArch64, a 64-bit execution mode with a new instruction set, while retaining binary compatibility with previous versions of the ARM architecture through AArch32, a 32-bit execution mode. Most hardware implementations of ARMv8 processors support both AArch32 and AArch64, which comes at a cost in hardware complexity.
We present MAMBO-X64, a dynamic binary translator for Linux that executes 32-bit ARM binaries using only the AArch64 instruction set. We evaluated MAMBO-X64 on three existing ARMv8 processors that support both the AArch32 and AArch64 instruction sets, comparing the running time of 32-bit benchmarks under MAMBO-X64 with that of the same benchmarks running natively. On SPEC CPU2006, we achieve a geometric mean overhead of less than 7.5% on in-order Cortex-A53 processors and a performance improvement of 1% on out-of-order X-Gene 1 processors.
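As an illustration of the metric only, a geometric mean overhead over a benchmark suite can be computed from per-benchmark running times as in the sketch below. The function name and the timing values are hypothetical and are not taken from the paper's evaluation.

```python
import math

def geomean_overhead(native_times, translated_times):
    """Geometric mean of per-benchmark slowdown ratios, minus 1.

    A positive result is an overhead; a negative result is a speedup.
    """
    ratios = [t / n for n, t in zip(native_times, translated_times)]
    # Average in log space, then exponentiate, to obtain the geometric mean.
    log_mean = sum(math.log(r) for r in ratios) / len(ratios)
    return math.exp(log_mean) - 1.0

# Hypothetical running times in seconds (not measured data).
native = [100.0, 200.0, 50.0]
translated = [110.0, 190.0, 55.0]
overhead = geomean_overhead(native, translated)
print(f"geometric mean overhead: {overhead:.2%}")
```

The geometric mean is the usual choice for ratios like these because a 2x slowdown on one benchmark and a 2x speedup on another cancel out exactly, which an arithmetic mean of ratios would not show.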
MAMBO-X64 achieves this low overhead through novel optimizations that dynamically map AArch32 floating-point registers to AArch64 registers, efficiently handle overflowing address calculations, generate traces that harness hardware return address prediction, and accurately handle operating system signals.
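To make the address-overflow problem concrete, the sketch below models why a 32-bit address calculation cannot be carried out naively in 64-bit registers: AArch32 address arithmetic wraps modulo 2^32, whereas the same addition on zero-extended 64-bit values can carry into bit 32. This only illustrates the problem. The abstract does not describe how MAMBO-X64 resolves it, and the function names here are hypothetical.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def aarch32_effective_address(base, offset):
    """Address as a 32-bit guest computes it: the sum wraps modulo 2**32."""
    return (base + offset) & MASK32

def naive_64bit_address(base, offset):
    """Same addition on zero-extended 64-bit registers: the carry is kept."""
    return (base + offset) & MASK64

# A register offset of -16 is stored as 0xFFFFFFF0 in a 32-bit register.
base = 0x00001000
offset = (-16) & MASK32

guest = aarch32_effective_address(base, offset)
naive = naive_64bit_address(base, offset)
print(hex(guest))  # address the 32-bit program expects
print(hex(naive))  # address a naive 64-bit translation would use
```

In this model the two results differ only in bit 32 and above, so a translator has to ensure the upper bits are discarded whenever a guest address calculation may overflow.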
Low overhead dynamic binary translation on ARM
PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation