research-article

Architectural Support for Dynamic Linking

Published: 14 March 2015

Abstract

All software in use today relies on libraries, including standard libraries (e.g., C, C++) and application-specific libraries (e.g., libxml, libpng). Most libraries are loaded in memory and dynamically linked when programs are launched, resolving symbol addresses across the applications and libraries. Dynamic linking has many benefits: It allows code to be reused between applications, conserves memory (because only one copy of a library is kept in memory for all the applications that share it), and allows libraries to be patched and updated without modifying programs, among numerous other benefits. However, these benefits come at the cost of performance. For every call made to a function in a dynamically linked library, a trampoline is used to read the function address from a lookup table and branch to the function, incurring memory load and branch operations. Static linking avoids this performance penalty, but loses all the benefits of dynamic linking. Given its myriad benefits, dynamic linking is the predominant choice today, despite the performance cost. In this work, we propose a speculative hardware mechanism to optimize dynamic linking by avoiding executing the trampolines for library function calls, providing the benefits of dynamic linking with the performance of static linking. Speculatively skipping the memory load and branch operations of the library call trampolines improves performance by reducing the number of executed instructions and gains additional performance by reducing pressure on the instruction and data caches, TLBs, and branch predictors. Because the indirect targets of library call trampolines do not change during program execution, our speculative mechanism never misspeculates in practice. We evaluate our technique on real hardware with production software and observe up to 4% speedup using only 1.5KB of on-chip storage.
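The cost described in the abstract can be illustrated with a toy software model. The sketch below is not the hardware mechanism proposed in the paper. It only counts the lookup-table loads and branches a trampoline-based call performs, and compares that with a path that remembers each call site's target and calls it directly. The names `ToyProcess`, `call_via_trampoline`, `call_with_bypass`, the `bypass` table, and the call-site address `0x400123` are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


def lib_strlen(s: str) -> int:
    """Stand-in for a function that lives in a shared library."""
    return len(s)


@dataclass
class ToyProcess:
    # Lookup table filled in by the dynamic linker: symbol name -> function.
    got: Dict[str, Callable] = field(default_factory=dict)
    # Hypothetical memo of resolved targets, keyed by call-site address
    # (an assumption for illustration, not the paper's structure).
    bypass: Dict[int, Callable] = field(default_factory=dict)
    loads: int = 0      # counted lookup-table reads
    branches: int = 0   # counted control transfers

    def call_via_trampoline(self, symbol, *args):
        """Ordinary dynamically linked call through a trampoline."""
        self.branches += 1           # call into the trampoline
        target = self.got[symbol]    # load the function address from the table
        self.loads += 1
        self.branches += 1           # indirect branch to the library function
        return target(*args)

    def call_with_bypass(self, call_site, symbol, *args):
        """Skip the trampoline once this call site's target is known."""
        target = self.bypass.get(call_site)
        if target is None:
            # First execution: take the normal path, then remember the target.
            result = self.call_via_trampoline(symbol, *args)
            self.bypass[call_site] = self.got[symbol]  # bookkeeping, not counted
            return result
        self.branches += 1           # single direct call, no table load
        return target(*args)


baseline = ToyProcess(got={"strlen": lib_strlen})
for _ in range(1000):
    baseline.call_via_trampoline("strlen", "hello")

optimized = ToyProcess(got={"strlen": lib_strlen})
for _ in range(1000):
    optimized.call_with_bypass(0x400123, "strlen", "hello")

print(baseline.loads, baseline.branches)    # 1000 2000
print(optimized.loads, optimized.branches)  # 1 1001
```

The model assumes the table entry never changes after the first call, which matches the abstract's observation that trampoline targets do not change during execution. It therefore has no recovery path for a stale target, something a real speculative design would need.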


Published in

ACM SIGPLAN Notices, Volume 50, Issue 4 (ASPLOS '15)
April 2015
676 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2775054
Editor: Andy Gill

ASPLOS '15: Proceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems
March 2015
720 pages
ISBN: 9781450328357
DOI: 10.1145/2694344

Copyright © 2015 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
