Abstract
This paper examines the causes and extent of the code size overhead that the ARM calling convention imposes on Thumb-2 binaries. Calling convention overhead can negatively impact power consumption, flash memory costs, and chip size in embedded or otherwise resource-constrained domains. This is particularly true on platforms using "compressed" instruction sets, such as the 16-bit ARM Thumb instruction set and its Thumb-2 extension, which are used in virtually all smartphones and in many smaller-scale embedded devices. We measure the extent of calling convention overhead in practical software, compare C and C++ programs, and find that binaries generated from C++ source files generally carry a higher percentage of calling convention overhead. Finally, we present a binary file optimizer that eliminates some of this overhead, particularly in C++ programs, by modifying the calling convention on a per-procedure basis.
Reducing calling convention overhead in object-oriented programming on embedded ARM Thumb-2 platforms
GPCE 2017: Proceedings of the 16th ACM SIGPLAN International Conference on Generative Programming: Concepts and Experiences