Link-time compaction and optimization of ARM executables

Published: 01 February 2007
Abstract

The overhead in code size, power consumption, and execution time caused by precompiled libraries and separate compilation is often unacceptable in the embedded world, where real-time constraints, battery lifetime, and production costs are of critical importance. In this paper, we present our link-time optimizer for the ARM architecture. We discuss how we deal with the peculiarities of the ARM architecture related to its visible program counter and how the introduced overhead can be largely eliminated. Our link-time optimizer is evaluated with four tool chains: two proprietary ones from ARM and two open ones based on GNU GCC. When used with the proprietary tool chains from ARM Ltd., our link-time optimizer achieved average code size reductions of 16.0 and 18.5%, while the programs became 12.8 and 12.3% faster and 10.7 and 10.1% more energy efficient. Finally, we show how the incorporation of link-time optimization in tool chains may influence library interface design.

    Reviews

    William M. Waite

    An embedded system is constrained by both memory size and battery life. Link-time analysis of an entire program allows one to blur the boundaries between an application and the libraries it uses, deleting unused library code and specializing general calling conventions. The experiment described here illustrates the problems and payoffs of this technique.

    Diablo, a portable, retargetable framework for link-time code rewriting, underlies the authors' implementation. A program's object files are first linked with the appropriate libraries, and then the resulting program is disassembled and represented as a control flow graph augmented with additional information. That graph is optimized and linearized to produce an assembly language version of the complete program, which is assembled into an optimized executable for the embedded system. The authors provide an overview of their software architecture, lay out the challenges posed by the ARM architecture, show how they address those issues, and provide performance measures to support their arguments. They place special emphasis on optimizations appropriate for the ARM architecture, and on their methods for modeling address computations.

    This paper is intended for readers familiar with compilation in general and optimization in particular. Problems and techniques are described clearly, and convincing arguments are given for why these are appropriate strategies. Extensive references are provided for those who wish to go deeper into the details of Diablo and the particular optimizations described here.

    Online Computing Reviews Service
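
The review's description of deleting unused library code from a whole-program control flow graph can be illustrated with a small sketch. This is not Diablo's API or the authors' algorithm: the block names, the `cfg` dictionary, and the helpers `reachable_blocks` and `compact` are hypothetical, and the sketch assumes every control transfer is visible as an explicit edge. A real link-time rewriter would also have to account for indirect branches, address-taken code, and data embedded in code sections.

```python
from collections import deque

def reachable_blocks(cfg, entry):
    """Return the set of block names reachable from `entry` (breadth-first)."""
    seen = {entry}
    work = deque([entry])
    while work:
        block = work.popleft()
        for succ in cfg.get(block, ()):
            if succ not in seen:
                seen.add(succ)
                work.append(succ)
    return seen

def compact(cfg, layout, entry):
    """Drop unreachable blocks while keeping the original layout order."""
    live = reachable_blocks(cfg, entry)
    return [block for block in layout if block in live]

# Hypothetical whole-program graph: application code plus two library routines.
cfg = {
    "main": ["lib_print", "exit"],
    "lib_print": ["exit"],
    "lib_unused": ["lib_print"],  # linked in, but never called from main
    "exit": [],
}
layout = ["main", "lib_print", "lib_unused", "exit"]
kept = compact(cfg, layout, "main")
```

In this toy graph, `lib_unused` has no incoming edge from anything reachable from `main`, so it is left out of the linearized layout even though it was linked into the program.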
