Abstract
Incomplete source code arises naturally in software development: during design, and while programs are evolved, tested, and analyzed. The ability to understand partial programs is therefore a valuable asset. However, this problem is still unsolved for the C programming language. The difficulties stem from the fact that parsing C requires not only syntactic but also semantic information. Furthermore, inferring types so that they respect C's type system is a challenging task. In this paper we present a technique that lets us solve these problems. We provide a unification-based type inference capable of dealing with C's intricacies. The ideas we present let us reconstruct partial C programs into complete, well-typed ones. Such program reconstruction has several applications: enabling static analysis tools in scenarios where software components may be absent; improving static analysis tools that do not rely on build specifications; allowing stub-generation and testing tools to work on snippets; and assisting programmers in extracting reusable data structures from the program parts that use them. We evaluate our approach on source code from a variety of C libraries, such as GNU's Coreutils, GNULib, GNOME's GLib, and GDSL; on implementations from Sedgewick's books; and on snippets from popular open-source projects like CPython, FreeBSD, and Git.
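To make the idea of unification-based inference concrete, the following is a minimal sketch of textbook (Robinson-style) unification applied to a tiny C fragment, `int n = *p; q = &p;`, in which `p` and `q` are undeclared. It is not psyche-c's implementation. The names `unify`, `solve`, `resolve`, and `apply`, the `?`-prefixed type variables, and the tuple encoding of types are all assumptions made for illustration. Treating assignment as strict type equality is also a simplification, since C permits implicit conversions.

```python
# Illustrative encoding (an assumption, not the paper's):
#   type variable -> string starting with "?", e.g. "?p"
#   constructor   -> tuple (name, arg1, ...), e.g. ("int",) or ("ptr", "?p")

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def resolve(t, subst):
    """Follow variable bindings until an unbound variable or a constructor."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(var, t, subst):
    """Occurs check: would binding var to t create an infinite type?"""
    t = resolve(t, subst)
    if t == var:
        return True
    if isinstance(t, tuple):
        return any(occurs(var, arg, subst) for arg in t[1:])
    return False

def unify(a, b, subst):
    """Return a substitution extending subst that makes a and b equal."""
    a, b = resolve(a, subst), resolve(b, subst)
    if a == b:
        return subst
    if is_var(a):
        if occurs(a, b, subst):
            raise TypeError(f"infinite type: {a} occurs in {b}")
        return {**subst, a: b}
    if is_var(b):
        return unify(b, a, subst)
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
        return subst
    raise TypeError(f"cannot unify {a} with {b}")

def apply(t, subst):
    """Replace every bound variable in t by its binding."""
    t = resolve(t, subst)
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply(arg, subst) for arg in t[1:])
    return t

def solve(constraints):
    """Unify a list of (lhs, rhs) equality constraints."""
    subst = {}
    for lhs, rhs in constraints:
        subst = unify(lhs, rhs, subst)
    return subst

# Constraints read off the fragment `int n = *p; q = &p;`:
constraints = [
    ("?p", ("ptr", "?deref_p")),   # p is dereferenced, so p is a pointer
    ("?deref_p", ("int",)),        # *p initializes an int
    ("?q", ("ptr", "?p")),         # q holds the address of p
]
subst = solve(constraints)
p_type = apply("?p", subst)   # ("ptr", ("int",)), i.e. int *
q_type = apply("?q", subst)   # ("ptr", ("ptr", ("int",))), i.e. int **
```

Under these assumptions, the solved substitution suggests the missing declarations `int *p;` and `int **q;`, which would complete the fragment. A real inference for C must additionally handle conversions, qualifiers, records, and the typedef-name ambiguity that the abstract alludes to.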
Supplemental Material
This is the artifact submitted to the POPL Artifact Evaluation Committee. Web location of this document: http://homepages.dcc.ufmg.br/~ltcmelo/POPL18/index.html. The artifact contains: (a) the source code of psyche-c; (b) compiled binaries for both macOS Sierra and Ubuntu 16.04; (c) a Python script to conveniently run psyche-c; and (d) all the programs used in the evaluation section of our paper, separated by folders.
Index Terms
Inference of static semantics for incomplete C programs