Taming undefined behavior in LLVM

Published: 14 June 2017

Abstract

A central concern for an optimizing compiler is the design of its intermediate representation (IR) for code. The IR should make it easy to perform transformations, and should also afford efficient and precise static analysis. In this paper we study an aspect of IR design that has received little attention: the role of undefined behavior. The IR for every optimizing compiler we have looked at, including GCC, LLVM, Intel's, and Microsoft's, supports one or more forms of undefined behavior (UB), not only to reflect the semantics of UB-heavy programming languages such as C and C++, but also to model inherently unsafe low-level operations such as memory stores and to avoid over-constraining IR semantics to the point that desirable transformations become illegal. The current semantics of LLVM's IR fails to justify some cases of loop unswitching, global value numbering, and other important "textbook" optimizations, causing long-standing bugs. We present solutions to the problems we have identified in LLVM's IR and show that most optimizations currently in LLVM remain sound, and that some desirable new transformations become permissible. Our solutions do not degrade compile time or performance of generated code.
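
As a rough illustration of one of the "textbook" optimizations named above, the sketch below shows loop unswitching written out by hand at the source level in C. The function names (`sum_before`, `sum_after`, `check_agree`) are hypothetical and do not come from the paper or from LLVM. The point to notice is that the unswitched version tests `flag` even when the loop body would never run (`n == 0`), whereas the original version does not. For well-defined inputs the two versions agree; whether the transformation is justified when the condition is an undefined value depends on the IR semantics, which is the kind of question the paper studies.

```c
#include <assert.h>
#include <stddef.h>

/* Before: the loop-invariant test on `flag` is evaluated on every
 * iteration, and never evaluated at all when n == 0. */
long sum_before(const int *a, size_t n, int flag) {
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (flag)
            total += a[i];
        else
            total -= a[i];
    }
    return total;
}

/* After (hand-written unswitching): the test is hoisted out of the loop
 * and each branch gets its own specialized loop. Note that `flag` is now
 * tested once unconditionally, even when n == 0. */
long sum_after(const int *a, size_t n, int flag) {
    long total = 0;
    if (flag) {
        for (size_t i = 0; i < n; i++)
            total += a[i];
    } else {
        for (size_t i = 0; i < n; i++)
            total -= a[i];
    }
    return total;
}

/* Hypothetical helper: checks that both versions agree on one input. */
void check_agree(const int *a, size_t n, int flag) {
    assert(sum_before(a, n, flag) == sum_after(a, n, flag));
}
```

Calling `check_agree` on a few arrays with `flag` set to 0 or 1 exercises both specialized loops; this only demonstrates agreement on defined inputs and says nothing about the undefined cases.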


Published in

ACM SIGPLAN Notices, Volume 52, Issue 6 (PLDI '17), June 2017, 708 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3140587

PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, June 2017, 708 pages
ISBN: 9781450349888
DOI: 10.1145/3062341

Copyright © 2017 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: article
