research-article
Open Access
Artifacts Available
Artifacts Evaluated & Functional

Inference of static semantics for incomplete C programs

Published: 27 December 2017

Abstract

Incomplete source code naturally emerges in software development: during the design phase and while evolving, testing, and analyzing programs. The ability to understand partial programs is therefore a valuable asset. However, this problem is still unsolved for the C programming language. The difficulties stem from the fact that parsing C requires not only syntactic but also semantic information. Furthermore, inferring types so that they respect C's type system is a challenging task. In this paper we present a technique that lets us solve these problems. We provide a unification-based type inference capable of dealing with C's intricacies. The ideas we present let us reconstruct partial C programs into complete, well-typed ones. Such program reconstruction has several applications: enabling static analysis tools in scenarios where software components may be absent; improving static analysis tools that do not rely on build specifications; allowing stub-generation and testing tools to work on snippets; and assisting programmers in extracting reusable data structures from the program parts that use them. Our evaluation is performed on source code from a variety of C libraries, such as GNU's Coreutils, GNULib, GNOME's GLib, and GDSL; on implementations from Sedgewick's books; and on snippets from popular open-source projects such as CPython, FreeBSD, and Git.
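To make the problem concrete, the sketch below shows a small snippet whose type `T` and fields `count` and `next` were never declared, together with one hypothetical set of declarations that makes it compile. The names and the chosen field types are assumptions made for illustration only; they are not output of the paper's tool, and other completions would be equally valid.

```c
#include <stddef.h>

/* --- Hypothetical reconstructed declarations (illustrative assumptions) --- */

/* `T *x;` parses as a declaration only if T names a type,
   so a completion has to introduce T as a type name. */
typedef struct T T;

/* x->count is added to an int and x->next is assigned to a T *,
   so these field types are one consistent choice. */
struct T {
    int count;
    T *next;
};

/* --- Partial snippet: originally given without any declaration of T --- */
int total(T *head)
{
    int sum = 0;
    T *x;   /* without knowing T, this could also read as the expression T * x */

    for (x = head; x != NULL; x = x->next)
        sum += x->count;
    return sum;
}
```

The choice of `int` for `count` is only one possibility: any arithmetic type would also type-check, which is why a completion like this is a guess constrained by usage rather than a unique answer.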


Supplemental Material

staticsemantics.webm

