Abstract
Region-based JIT compilation operates on translation units that comprise multiple basic blocks and the control flow between them, which may be cyclic or conditional. It promises to reconcile aggressive code optimisation with low compilation latency in performance-critical dynamic binary translators. Whilst various region selection schemes and isolated code optimisation techniques have been investigated, it remains unclear how best to exploit such regions for efficient code generation. Complex interactions with indirect branch tables and translation caches can degrade performance if they are not considered carefully. In this paper we present a complete code generation strategy for a region-based dynamic binary translator, which exploits branch type and control flow profiling information to improve code quality for the common case. We demonstrate that, using our code generation strategy, a competitive region-based dynamic compiler can be built on top of the LLVM JIT compilation framework. For the ARMv5T target ISA and SPEC CPU 2006 benchmarks we achieve execution rates of 867 MIPS on average and up to 1323 MIPS on a standard x86 host machine, outperforming state-of-the-art QEMU-ARM by delivering a speedup of 264%.
Efficient code generation in a region-based dynamic binary translator
LCTES '14: Proceedings of the 2014 SIGPLAN/SIGBED conference on Languages, compilers and tools for embedded systems