research-article

How to Build Static Checking Systems Using Orders of Magnitude Less Code

Published: 25 March 2016

Abstract

Modern static bug finding tools are complex. They typically consist of hundreds of thousands of lines of code, and most of them are wedded to one language (or even one compiler). This complexity makes the systems hard to understand, hard to debug, and hard to retarget to new languages, thereby dramatically limiting their scope. This paper reduces checking system complexity by addressing a fundamental assumption: that checkers must depend on a full-blown language specification and compiler front end. Instead, our program checkers are based on drastically incomplete language grammars ("micro-grammars") that describe only portions of a language relevant to a checker. As a result, our implementation is tiny: roughly 2500 lines of code, about two orders of magnitude smaller than a typical system. We hope that this dramatic increase in simplicity will allow people to use more checkers on more systems in more languages.

We implement our approach in μchex, a language-agnostic framework for writing static bug checkers. We use it to build micro-grammar based checkers for six languages (C, the C preprocessor, C++, Java, JavaScript, and Dart) and find over 700 errors in real-world projects.
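
To make the micro-grammar idea concrete, here is a minimal Python sketch. It is not μchex and does not reflect its actual interface; every name in it (`TOKEN`, `tokenize`, `balanced_span`, `find_if_conditions`, `check_assign_in_condition`) is hypothetical. The only construct the "grammar" describes is `if ( <balanced tokens> )`; every other token is skipped without being parsed. The toy checker then flags conditions containing a bare `=`. The sketch assumes the input has no comments or string literals containing parentheses or the word `if`, which a real tool would need to handle.

```python
import re

# Hypothetical tokenizer: identifiers, integers, two-character operators
# ending in '=' (==, !=, <=, +=, ...), '&&', '||', then any single symbol.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|[-+*/%&|^<>!=]=|&&|\|\||\S")


def tokenize(src):
    """Split source text into a flat list of token strings."""
    return TOKEN.findall(src)


def balanced_span(tokens, start):
    """tokens[start] must be '('. Return the index just past its matching ')',
    or None if the parentheses never balance."""
    depth = 0
    for i in range(start, len(tokens)):
        if tokens[i] == "(":
            depth += 1
        elif tokens[i] == ")":
            depth -= 1
            if depth == 0:
                return i + 1
    return None


def find_if_conditions(tokens):
    """Micro-grammar: 'if' '(' <balanced tokens> ')'.
    Tokens that do not start this pattern are skipped as wildcards."""
    i = 0
    while i < len(tokens):
        if tokens[i] == "if" and i + 1 < len(tokens) and tokens[i + 1] == "(":
            end = balanced_span(tokens, i + 1)
            if end is not None:
                yield tokens[i + 2 : end - 1]  # tokens between the outer parens
                i = end
                continue
        i += 1


def check_assign_in_condition(src):
    """Toy checker: report every 'if' condition that contains a bare '='."""
    return [
        " ".join(cond)
        for cond in find_if_conditions(tokenize(src))
        if "=" in cond
    ]


C_SNIPPET = """
int f(int x, int y) {
    if (x = y) return 1;
    if ((x == y) && g(x)) return 2;
    while (x != 0) x--;
    return 0;
}
"""

print(check_assign_in_condition(C_SNIPPET))
```

Because the sketch never parses declarations, types, or statements, the same function runs unchanged on any language whose `if` syntax looks like this, which is the property the abstract attributes to micro-grammars.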

Published in

ACM SIGPLAN Notices, Volume 51, Issue 4 (ASPLOS '16)
April 2016, 774 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2954679
Editor: Andy Gill

ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems
March 2016, 824 pages
ISBN: 9781450340915
DOI: 10.1145/2872362
General Chair: Tom Conte
Program Chair: Yuanyuan Zhou

Copyright © 2016 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
