Abstract
Most dynamic binary translators (DBTs) and optimizers (DBOs) target binary traces, i.e., frequently executed paths, as the code regions to be translated and optimized. Code region formation is the first and most important step in all DBTs and DBOs. The quality of the dynamically formed code regions determines the extent and the types of optimization opportunities that can be exposed to DBTs and DBOs, and thus determines the ultimate quality of the final optimized code. The Next-Executing-Tail (NET) trace formation method used in HP Dynamo is an early example of such techniques, and many existing trace formation schemes are variants of NET. They work very well for most binary traces, but they also suffer from a major problem: the formed traces may contain a large number of early exits through which execution can branch out of the trace. If such exits are taken frequently, the program spends more time in the slow binary interpreter or in unoptimized code regions than in the optimized traces in the code cache, and the benefit of trace optimization is lost. Traces/regions with frequently taken early exits are called delinquent traces/regions. Our empirical study shows that at least 8 of the 12 SPEC CPU2006 integer benchmarks have delinquent traces.
In this paper, we propose a lightweight region formation technique called Early-Exit Guided Region Formation (EEG) to improve the quality of the formed traces/regions. It iteratively identifies delinquent regions and merges them into larger code regions. We have implemented our EEG algorithm in two LLVM-based multi-threaded DBTs targeting the ARM and IA32 instruction set architectures (ISAs), respectively. Using the SPEC CPU2006 benchmark suite with reference inputs, our results show that, compared to a NET variant currently used in QEMU, a state-of-the-art retargetable DBT, EEG achieves a significant performance improvement of up to 72% (27% on average) for IA32 and up to 49% (23% on average) for ARM.
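The idea of iteratively identifying delinquent regions and merging them can be illustrated with a minimal sketch. The abstract does not specify the actual EEG algorithm, so everything below is an assumption made for illustration: the `Region` record, the `EARLY_EXIT_THRESHOLD` value, the per-exit counters, and the policy of merging a delinquent region with the target of its hottest early exit are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical cutoff: a region is treated as delinquent when more than this
# fraction of its entries leave through an early exit. Not taken from the paper.
EARLY_EXIT_THRESHOLD = 0.5


@dataclass
class Region:
    """Illustrative profile record for one translated trace/region."""
    blocks: list                      # guest basic-block addresses, head first
    entries: int = 0                  # how many times the region was entered
    early_exits: dict = field(default_factory=dict)  # exit target -> taken count

    def is_delinquent(self, threshold=EARLY_EXIT_THRESHOLD):
        if self.entries == 0:
            return False
        return sum(self.early_exits.values()) / self.entries > threshold


def merge_delinquent(regions, threshold=EARLY_EXIT_THRESHOLD):
    """Repeatedly merge each delinquent region with the region that starts at
    its most frequently taken early-exit target.

    `regions` maps a head address to a Region. A new dict is returned; the
    input dict and its Region objects are left unmodified.
    """
    regions = dict(regions)
    changed = True
    while changed:
        changed = False
        for head, region in list(regions.items()):
            if not region.is_delinquent(threshold):
                continue
            # Hottest early exit of this delinquent region.
            target = max(region.early_exits, key=region.early_exits.get)
            other = regions.get(target)
            if other is None or other is region:
                continue  # target is not the head of another known region
            merged = region.blocks + [b for b in other.blocks
                                      if b not in region.blocks]
            # Counters restart at zero; profiling would resume at run time.
            regions[head] = Region(blocks=merged)
            del regions[target]
            changed = True
            break  # the dict changed, so rescan from the start
    return regions


profile = {
    0x100: Region([0x100, 0x104], entries=100, early_exits={0x200: 80}),
    0x200: Region([0x200, 0x204], entries=80),
    0x300: Region([0x300], entries=50, early_exits={0x400: 5}),
}
result = merge_delinquent(profile)
```

In this toy profile, the region at `0x100` leaves through its early exit on 80 of 100 entries, so it is merged with the region at `0x200`. The region at `0x300` stays as it is because only 5 of its 50 entries take an early exit. Each merge removes one region, so the loop terminates.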
Improving dynamic binary optimization through early-exit guided code region formation
VEE '13: Proceedings of the 9th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments