Abstract
Dynamic binary translators (DBTs) provide powerful platforms for building dynamic program monitoring and adaptation tools. DBTs, however, have high memory demands because they cache translated code and auxiliary code in a software code cache and must also maintain data structures to support the code cache. These high memory demands make it difficult for memory-constrained embedded systems to take advantage of DBT-based tools. Previous research on DBT memory management focused only on the translated code and auxiliary code. However, we found that the data structures are comparable in size to the code cache. We show that the translated code size, auxiliary code size, and data structure size interact in a complex manner that depends on the path selection (trace selection and link formation) strategy. Therefore, holistic memory efficiency (comprising translated code, auxiliary code, and data structures) cannot be improved by focusing on the code cache alone. In this paper, we use path selection to improve holistic memory efficiency, which in turn affects performance in memory-constrained environments. Although path selection has been studied before, that research considered performance only in memory-unconstrained environments.
The challenge for holistic memory efficiency is that the path selection strategy results in complex interactions between the memory demand components. Also, individual aspects of path selection and the holistic memory efficiency may impact performance in complex ways. We explore these interactions to motivate path selection targeting holistic memory demand. We enumerate all the aspects involved in a path selection design and evaluate a comprehensive set of approaches for each aspect. Finally, we propose a path selection strategy that reduces memory demands by 20% and at the same time improves performance by 5-20% compared to an industrial-strength DBT.
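The accounting behind "holistic" memory demand can be illustrated with a minimal sketch. This is not the paper's implementation: the `Fragment` type, the `holistic_footprint` function, and every byte count below are hypothetical, chosen only to show how a strategy with a larger code cache can still have a smaller total footprint once data structures are counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """One cached unit (hypothetical model); all sizes are in bytes."""
    translated_code: int  # translated application instructions
    auxiliary_code: int   # e.g. exit stubs that hand control back to the translator
    data_structures: int  # bookkeeping kept outside the code cache


def holistic_footprint(fragments):
    """Sum each memory component and the combined totals."""
    totals = {
        "translated_code": sum(f.translated_code for f in fragments),
        "auxiliary_code": sum(f.auxiliary_code for f in fragments),
        "data_structures": sum(f.data_structures for f in fragments),
    }
    # The code cache holds translated plus auxiliary code.
    totals["code_cache"] = totals["translated_code"] + totals["auxiliary_code"]
    # The holistic demand also includes the supporting data structures.
    totals["total"] = totals["code_cache"] + totals["data_structures"]
    return totals


# Arbitrary toy values, not measurements: two imagined path selection outcomes.
many_small = [Fragment(40, 24, 64)] * 10   # many short fragments
few_large = [Fragment(120, 48, 96)] * 4    # fewer, longer fragments

for name, frags in (("many_small", many_small), ("few_large", few_large)):
    t = holistic_footprint(frags)
    print(name, "code cache:", t["code_cache"], "total:", t["total"])
```

With these made-up numbers, `few_large` has the larger code cache but the smaller total, which is the kind of interaction that a code-cache-only view would miss.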
Index Terms
DBT path selection for holistic memory efficiency and performance
VEE '10: Proceedings of the 6th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments