Abstract
Type inference is a feature common to a variety of programming languages. While it was once most prominent in functional languages (e.g., ML and Haskell), today many object-oriented and multi-paradigm languages, such as C# and C++, offer it to some extent. Nevertheless, type inference is still an unexplored subject in the realm of C. In particular, it remains an open question whether one can devise a technique that accommodates the idiosyncrasies of this language. The first difficulty in tackling this problem is that parsing C requires not only syntactic but also semantic information: whether the statement “T * x;” declares a pointer or multiplies two variables depends on whether T names a type. Yet greater challenges emerge from C’s intricate type system. In this work, we present a unification-based framework that infers the missing struct, union, enum, and typedef declarations in a program.
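The unification at the core of such a framework can be illustrated with a toy example. The sketch below is not the authors' algorithm: it is standard first-order unification with an occurs check over a hypothetical three-constructor type language (type variables, int, and pointers), and the names struct type, resolve, occurs, and unify are invented for illustration. Assume a fragment such as “int n; T v = &n;” in which T is an undeclared typedef name: giving T a fresh type variable and unifying it with the type of &n, int *, suggests the missing declaration “typedef int *T;”.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy type language: type variables, int, and pointers. */
enum kind { TY_VAR, TY_INT, TY_PTR };

struct type {
    enum kind kind;
    struct type *arg;   /* pointee type when kind == TY_PTR */
    struct type *link;  /* binding when kind == TY_VAR; NULL if unbound */
};

/* Follow variable bindings until reaching an unbound variable or a constructor. */
static struct type *resolve(struct type *t) {
    assert(t != NULL);
    while (t->kind == TY_VAR && t->link != NULL)
        t = t->link;
    return t;
}

/* Occurs check: does variable v appear inside t? Rejects cyclic types such as a = a *. */
static bool occurs(struct type *v, struct type *t) {
    t = resolve(t);
    if (t == v)
        return true;
    return t->kind == TY_PTR && occurs(v, t->arg);
}

/* First-order unification; returns false when the two types cannot be made equal. */
static bool unify(struct type *a, struct type *b) {
    a = resolve(a);
    b = resolve(b);
    if (a == b)
        return true;
    if (a->kind == TY_VAR) {
        if (occurs(a, b))
            return false;
        a->link = b;            /* bind the variable in place */
        return true;
    }
    if (b->kind == TY_VAR)
        return unify(b, a);     /* put the variable on the left */
    if (a->kind != b->kind)
        return false;           /* e.g. int vs. pointer */
    if (a->kind == TY_PTR)
        return unify(a->arg, b->arg);
    return true;                /* int vs. int */
}
```

Binding a variable records the substitution in place, so after unifying T's variable with int *, resolving it yields int *, and a later use requiring T to be plain int fails. A realistic treatment of C has to deal with much more (struct and union members, qualifiers, implicit conversions), all of which this sketch ignores.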
As an application of our technique, we investigate the reconstruction of partial programs. Incomplete source code naturally appears in software development: during design and while evolving, testing, and analyzing programs; therefore, the ability to understand it is a valuable asset. With a reconstructed, well-typed program, one can (i) enable static analysis tools in scenarios where components are absent; (ii) improve the precision of “zero setup” static analysis tools; (iii) apply stub generators, symbolic executors, and testing tools to code snippets; and (iv) provide engineers with an assortment of compilable benchmarks for performance and correctness validation. We evaluate our technique on code from a variety of C libraries, including GNU’s Coreutils, and on snippets from popular projects such as CPython, FreeBSD, and Git.
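As a hand-written illustration of such a reconstruction (not output produced by the authors' tool), assume a snippet that uses struct node and the name T without declaring either. Reading “T * x = n->data;” as a declaration already presumes that T names a type. The declarations placed before the fragment are one hypothetical, well-typed completion consistent with every use; the small driver, run_example, is likewise invented for illustration.

```c
/* ---- Declarations a reconstruction step might infer (hypothetical) ---- */
typedef int T;          /* "T * x = ..." is a declaration only if T names a type;
                           int is an arbitrary but consistent choice */

struct node {
    T *data;            /* n->data initializes x, a T *, so data : T * */
    int len;            /* n->len is added to *x; int is one consistent choice */
};

/* ---- The incomplete fragment, unchanged ---- */
int f(struct node *n) {
    T * x = n->data;
    return n->len + *x;
}

/* ---- A tiny driver that exercises the completed program ---- */
int run_example(void) {
    int v = 40;
    struct node nd = { &v, 2 };
    return f(&nd);      /* 2 + 40 */
}
```

Other completions of the fragment itself are equally valid (for instance, T could be long); any consistent choice yields a program that a compiler accepts, which is what allows downstream tools such as static analyzers or test generators to process the snippet.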
Supplemental Material
Supplemental movie, appendix, image, and software files for Type Inference for C: Applications to the Static Analysis of Incomplete Programs are available for download.