Abstract
Modern static bug-finding tools are complex. They typically consist of hundreds of thousands of lines of code, and most of them are wedded to one language (or even one compiler). This complexity makes the systems hard to understand, hard to debug, and hard to retarget to new languages, thereby dramatically limiting their scope. This paper reduces checking-system complexity by challenging a fundamental assumption: that checkers must depend on a full-blown language specification and compiler front end. Instead, our program checkers are based on drastically incomplete language grammars ("micro-grammars") that describe only the portions of a language relevant to a checker. As a result, our implementation is tiny: roughly 2,500 lines of code, about two orders of magnitude smaller than a typical system. We hope that this dramatic increase in simplicity will allow people to use more checkers on more systems in more languages.
We implement our approach in μchex, a language-agnostic framework for writing static bug checkers. We use it to build micro-grammar-based checkers for six languages (C, the C preprocessor, C++, Java, JavaScript, and Dart) and find over 700 errors in real-world projects.
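To make the idea of a micro-grammar concrete, the following is a minimal Python sketch written for illustration only. It is not μchex and does not reflect its actual design or API; the function names (`tokenize`, `skip_balanced`, `find_empty_if_bodies`) and the choice of check are assumptions. The checker recognizes just one fragment of a C-like language, `if ( ... ) ;`, treats the condition as an opaque run of balanced parentheses, and ignores everything else in the file.

```python
import re

# Language-neutral tokens: identifiers, numbers, or any single non-space character.
TOKEN = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(src):
    """Split source text into (text, line_number) pairs."""
    tokens = []
    for lineno, line in enumerate(src.splitlines(), start=1):
        for match in TOKEN.finditer(line):
            tokens.append((match.group(), lineno))
    return tokens


def skip_balanced(tokens, i):
    """Given tokens[i] == '(', return the index just past its matching ')'.

    Returns None if the parentheses never balance.
    """
    depth = 0
    for j in range(i, len(tokens)):
        if tokens[j][0] == "(":
            depth += 1
        elif tokens[j][0] == ")":
            depth -= 1
            if depth == 0:
                return j + 1
    return None


def find_empty_if_bodies(src):
    """Hypothetical micro-grammar: 'if' '(' <balanced tokens> ')' ';'.

    Returns the line numbers of 'if' statements whose body is a stray ';'.
    The condition is never parsed; it is only skipped.
    """
    tokens = tokenize(src)
    hits = []
    for i, (text, line) in enumerate(tokens):
        if text == "if" and i + 1 < len(tokens) and tokens[i + 1][0] == "(":
            end = skip_balanced(tokens, i + 1)
            if end is not None and end < len(tokens) and tokens[end][0] == ";":
                hits.append(line)
    return hits


if __name__ == "__main__":
    sample = (
        "int f(int x) {\n"
        "  if (x == g(1));\n"
        "    return 1;\n"
        "  if (x) return 2;\n"
        "  return 0;\n"
        "}\n"
    )
    print(find_empty_if_bodies(sample))  # prints [2]
```

Because the sketch never parses declarations, types, or expressions, the same few lines apply unchanged to any language that uses this `if` syntax. The cost is imprecision: for example, text inside comments or string literals that happens to look like an `if` statement would be flagged as well.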
How to Build Static Checking Systems Using Orders of Magnitude Less Code
ASPLOS '16: Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems