Abstract
System-level Dynamic Binary Translation (DBT) provides the capability to boot an Operating System (OS) and execute programs compiled for an Instruction Set Architecture (ISA) different from that of the host machine. Because they are performance-critical, system-level DBT frameworks are typically hand-coded and heavily optimized for both their guest and host architectures. While this results in good performance of the DBT system, the engineering cost of supporting a new architecture or extending an existing one is high. In this article, we develop a novel, retargetable DBT hypervisor, which includes guest-specific modules generated from high-level guest machine specifications. Our system not only simplifies retargeting of the DBT, but also delivers performance levels in excess of existing manually created DBT solutions. We achieve this by combining offline and online optimizations and by exploiting the freedom of a Just-In-Time (JIT) compiler operating in a bare-metal environment provided by a Virtual Machine (VM) hypervisor. We evaluate our DBT using both targeted micro-benchmarks and standard application benchmarks, and we demonstrate its ability to outperform the de facto standard QEMU DBT system. Our system delivers an average speedup of 2.21× over QEMU across SPEC CPU2006 integer benchmarks running in a full-system Linux OS environment, compiled for the 64-bit ARMv8-A ISA and hosted on an x86-64 platform. For floating-point applications the speedup is even higher, reaching 6.49× on average. We demonstrate that our system-level DBT significantly reduces the effort required to support a new ISA while delivering outstanding performance.