Research article | Open Access

On the unusual effectiveness of type-aware operator mutations for testing SMT solvers

Published: 13 November 2020

Abstract

We propose type-aware operator mutation, a simple but unusually effective approach for testing SMT solvers. The key idea is to mutate operators of conforming types within the seed formulas to generate well-typed mutant formulas. These mutant formulas are then used as the test cases for SMT solvers. We realized type-aware operator mutation within the OpFuzz tool and used it to stress-test Z3 and CVC4, two state-of-the-art SMT solvers. Type-aware operator mutations are unusually effective: during one year of extensive testing with OpFuzz, we reported 1,092 bugs on Z3's and CVC4's respective GitHub issue trackers, out of which 819 unique bugs were confirmed and 685 of the confirmed bugs were fixed by the developers. The detected bugs are highly diverse: we found bugs of many different types (soundness bugs, invalid model bugs, crashes, etc.) across many logics and solver configurations. We have further conducted an in-depth study of the bugs found by OpFuzz. The study shows that the bugs found by OpFuzz are of high quality: many affect core components of the SMT solvers' codebases, and some required major changes for the developers to fix. Among the 819 confirmed bugs found by OpFuzz, 184 were soundness bugs, the most critical class of bugs in SMT solvers, and 489 were in the default modes of the solvers. Notably, OpFuzz found 27 critical soundness bugs in CVC4, which has proved to be a very stable SMT solver.
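The core mutation step described above can be sketched in a few lines of Python. The operator groups, tokenizer, and function names below are illustrative assumptions for this sketch, not OpFuzz's actual implementation: operators are bucketed by type signature, and a mutation swaps an operator only for another from the same bucket, so the mutant formula stays well-typed.

```python
import random
import re

# Hypothetical type-signature groups (illustration only, not OpFuzz's
# real operator tables). Swapping within a group preserves well-typedness.
OPERATOR_GROUPS = [
    {"+", "-", "*"},          # (Int, Int) -> Int
    {"<", "<=", ">", ">="},   # (Int, Int) -> Bool
    {"and", "or", "xor"},     # (Bool, Bool) -> Bool
]

def tokenize(formula):
    # Split an SMT-LIB string into parentheses and bare symbols.
    return re.findall(r"\(|\)|[^\s()]+", formula)

def untokenize(tokens):
    # Re-assemble, keeping operators attached to their opening parenthesis.
    return " ".join(tokens).replace("( ", "(").replace(" )", ")")

def type_aware_mutate(formula, rng=random):
    """Replace one operator with a different operator of the same type."""
    tokens = tokenize(formula)
    candidates = [
        (i, group)
        for i, tok in enumerate(tokens)
        for group in OPERATOR_GROUPS
        if tok in group
    ]
    if not candidates:
        return formula  # no mutable operator in this formula
    i, group = rng.choice(candidates)
    tokens[i] = rng.choice(sorted(group - {tokens[i]}))
    return untokenize(tokens)

seed = "(assert (> (+ x 1) y))"
print(type_aware_mutate(seed, random.Random(0)))
```

A fuzzing loop would repeatedly apply such mutations to seed formulas (e.g., from the SMT-LIB benchmarks) and feed each mutant to the solvers under test, flagging a bug when, for instance, Z3 and CVC4 disagree on satisfiability or a solver crashes.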


Supplemental Material

Auxiliary Presentation Video

We propose type-aware operator mutation, a simple but unusually effective approach for testing SMT solvers. The key idea is to mutate operators of conforming types within the seed formulas to generate well-typed mutant formulas. These mutant formulas are then used for testing SMT solvers. We developed the OpFuzz tool based on this idea and stress-tested Z3 and CVC4, two state-of-the-art SMT solvers. Type-aware operator mutations are unusually effective: during one year, we reported 1,092 bugs, out of which 819 unique bugs were confirmed and 685 of the confirmed bugs were fixed by the developers. We found bugs of highly diverse types (soundness bugs, invalid model bugs, crashes, etc.), logics, and solver configurations. Among the confirmed bugs, 184 were soundness bugs, the most critical bugs in SMT solvers, and 489 were in the default modes of the solvers. Notably, OpFuzz found 27 critical soundness bugs in CVC4, which has been resilient against previous fuzzing campaigns.


Published in

Proceedings of the ACM on Programming Languages, Volume 4, Issue OOPSLA (November 2020), 3108 pages
EISSN: 2475-1421
Issue DOI: 10.1145/3436718
Copyright © 2020 Owner/Author
Publisher: Association for Computing Machinery, New York, NY, United States
