research-article

A Retargetable System-level DBT Hypervisor

Published: 30 May 2020

Abstract

System-level Dynamic Binary Translation (DBT) provides the capability to boot an Operating System (OS) and execute programs compiled for an Instruction Set Architecture (ISA) different from that of the host machine. Due to their performance-critical nature, system-level DBT frameworks are typically hand-coded and heavily optimized, both for their guest and host architectures. While this results in good performance of the DBT system, engineering costs for supporting a new architecture or extending an existing architecture are high. In this article, we develop a novel, retargetable DBT hypervisor, which includes guest-specific modules generated from high-level guest machine specifications. Our system not only simplifies retargeting of the DBT but also delivers performance exceeding that of existing manually created DBT solutions. We achieve this by combining offline and online optimizations and exploiting the freedom of a Just-in-time (JIT) compiler operating in a bare-metal environment provided by a Virtual Machine (VM) hypervisor. We evaluate our DBT using both targeted micro-benchmarks and standard application benchmarks, and we demonstrate its ability to outperform the de facto standard QEMU DBT system. Our system delivers an average speedup of 2.21× over QEMU across SPEC CPU2006 integer benchmarks running in a full-system Linux OS environment, compiled for the 64-bit ARMv8-A ISA and hosted on an x86-64 platform. For floating-point applications, the speedup is even higher, reaching 6.49× on average. We demonstrate that our system-level DBT system significantly reduces the effort required to support a new ISA while delivering outstanding performance.

Published in

ACM Transactions on Computer Systems, Volume 36, Issue 4
Section: Best of ATC 2019 and Regular Paper
November 2018
115 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/3394910

Copyright © 2020 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 30 May 2020
• Online AM: 7 May 2020
• Accepted: 1 February 2020
• Received: 1 November 2019


Qualifiers

• research-article
• Research
• Refereed
