Abstract
A central concern for an optimizing compiler is the design of its intermediate representation (IR) for code. The IR should make it easy to perform transformations, and should also afford efficient and precise static analysis. In this paper we study an aspect of IR design that has received little attention: the role of undefined behavior. The IR for every optimizing compiler we have looked at, including GCC, LLVM, Intel's, and Microsoft's, supports one or more forms of undefined behavior (UB), not only to reflect the semantics of UB-heavy programming languages such as C and C++, but also to model inherently unsafe low-level operations such as memory stores and to avoid over-constraining IR semantics to the point that desirable transformations become illegal. The current semantics of LLVM's IR fails to justify some cases of loop unswitching, global value numbering, and other important "textbook" optimizations, causing long-standing bugs. We present solutions to the problems we have identified in LLVM's IR and show that most optimizations currently in LLVM remain sound, and that some desirable new transformations become permissible. Our solutions do not degrade compile time or performance of generated code.
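The abstract names loop unswitching as one of the transformations affected. As a rough illustration, not taken from the paper, the following C sketch shows the general shape of that transformation. The function names `run_original` and `run_unswitched` and the loop body are hypothetical. For any well-defined `flag` the two versions return the same result. The difference is that the unswitched version tests `flag` even when the loop body never executes (`n == 0`), which is the kind of situation where an IR's treatment of undefined values has to be chosen carefully.

```c
#include <stdio.h>

/* Original form: the loop-invariant test on `flag` sits inside the loop,
 * so it is only evaluated if the loop body runs at least once. */
static int run_original(int n, int flag) {
    int total = 0;
    for (int i = 0; i < n; i++) {
        if (flag)
            total += 2;
        else
            total += 1;
    }
    return total;
}

/* Unswitched form: the test is hoisted out of the loop and each branch
 * gets its own copy of the loop. `flag` is now tested even when n == 0. */
static int run_unswitched(int n, int flag) {
    int total = 0;
    if (flag) {
        for (int i = 0; i < n; i++)
            total += 2;
    } else {
        for (int i = 0; i < n; i++)
            total += 1;
    }
    return total;
}
```

With defined inputs the two functions are interchangeable; the sketch only mirrors the control-flow change and does not model undefined values at the IR level.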
Taming undefined behavior in LLVM
PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation