Abstract
We propose type-aware operator mutation, a simple but unusually effective approach for testing SMT solvers. The key idea is to mutate operators of conforming types within seed formulas to generate well-typed mutant formulas. These mutant formulas are then used as test cases for SMT solvers. We realized type-aware operator mutation in the OpFuzz tool and used it to stress-test Z3 and CVC4, two state-of-the-art SMT solvers. Type-aware operator mutations are unusually effective: during one year of extensive testing with OpFuzz, we reported 1092 bugs on Z3’s and CVC4’s respective GitHub issue trackers, of which 819 unique bugs were confirmed and 685 of the confirmed bugs were fixed by the developers. The detected bugs are highly diverse: we found bugs of many different types (soundness bugs, invalid model bugs, crashes, etc.) across many logics and solver configurations. We have further conducted an in-depth study of the bugs found by OpFuzz. The study results show that these bugs are of high quality. Many of them affect core components of the SMT solvers’ codebases, and some required major changes for the developers to fix. Among the 819 confirmed bugs found by OpFuzz, 184 were soundness bugs, the most critical bugs in SMT solvers, and 489 were in the default modes of the solvers. Notably, OpFuzz found 27 critical soundness bugs in CVC4, which has proved to be a very stable SMT solver.
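The key idea of mutating operators of conforming types can be illustrated with a minimal Python sketch. It is not OpFuzz's implementation: it assumes a small, hand-picked table of operator classes (the hypothetical `OPERATOR_CLASSES`) in which every operator takes the same argument sorts and returns the same sort, and it mutates a single SMT-LIB term rather than a full script. Because operators within a class share a signature, swapping one for another keeps the term well-typed. Applications with fewer than two arguments are skipped, since `-` also has a unary form that `+` lacks.

```python
import random

# Hypothetical operator classes for this sketch (not OpFuzz's actual table).
# Within a class, every operator accepts the same argument sorts and returns
# the same sort, so swapping one for another keeps the term well-typed as
# long as the application has at least two arguments.
OPERATOR_CLASSES = [
    {"and", "or", "xor", "=>"},  # (Bool, Bool, ...) -> Bool
    {"<", "<=", ">", ">="},      # (Int, Int, ...) -> Bool, likewise over Real
    {"+", "-"},                  # (Int, Int, ...) -> Int, likewise over Real
    # "*" is omitted: swapping "+" for "*" could make a linear seed nonlinear.
]


def tokenize(text):
    """Split SMT-LIB text into parentheses and atoms (ignores strings/comments)."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens):
    """Build a nested-list s-expression, consuming tokens from the front."""
    token = tokens.pop(0)
    if token == "(":
        node = []
        while tokens[0] != ")":
            node.append(parse(tokens))
        tokens.pop(0)  # discard the closing ")"
        return node
    return token


def to_text(node):
    """Serialize a nested-list s-expression back to SMT-LIB text."""
    if isinstance(node, list):
        return "(" + " ".join(to_text(child) for child in node) + ")"
    return node


def operator_class(op):
    """Return the class containing op, or None if op is not mutable here."""
    for cls in OPERATOR_CLASSES:
        if op in cls:
            return cls
    return None


def mutation_sites(node, path=()):
    """Yield index paths to applications (op arg1 arg2 ...) with a mutable op."""
    if isinstance(node, list) and node:
        head = node[0]
        if isinstance(head, str) and len(node) >= 3 and operator_class(head):
            yield path
        for i, child in enumerate(node):
            yield from mutation_sites(child, path + (i,))


def mutate(term, rng):
    """Return a mutant of term with one operator replaced by a same-class one."""
    tree = parse(tokenize(term))
    sites = list(mutation_sites(tree))
    if not sites:
        return term  # nothing type-compatible to swap
    target = tree
    for i in rng.choice(sites):
        target = target[i]
    old = target[0]
    target[0] = rng.choice(sorted(operator_class(old) - {old}))
    return to_text(tree)


if __name__ == "__main__":
    seed = "(and (< x y) (or (> (+ x 1) y) (= x 0)))"
    rng = random.Random(42)
    for _ in range(3):
        print(mutate(seed, rng))
```

Each mutant differs from its seed in exactly one operator, so it would then serve as a test case for the solvers under test. Note that `=` is deliberately left out of the table here, since it is polymorphic over sorts and would need type inference on its arguments to be swapped safely.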
On the unusual effectiveness of type-aware operator mutations for testing SMT solvers