Generative type-aware mutation for testing SMT solvers

Published: 15 October 2021

Abstract

We propose Generative Type-Aware Mutation, an effective approach for testing SMT solvers. The key idea is to realize generation through the mutation of expressions rooted with parametric operators from the SMT-LIB specification. Generative Type-Aware Mutation is a hybrid of mutation-based and grammar-based fuzzing and features an infinite mutation space, which overcomes a major limitation of OpFuzz, the state-of-the-art fuzzer for SMT solvers. We have realized Generative Type-Aware Mutation in a practical SMT solver bug-hunting tool, TypeFuzz. During our testing period with TypeFuzz, we reported over 237 bugs in the state-of-the-art SMT solvers Z3 and CVC4. Among these, 189 bugs were confirmed and 176 bugs were fixed. Most notably, we found 18 soundness bugs in CVC4’s default mode alone, 7 of which had been latent for two years. This is significant because CVC4 has proven to be a very stable SMT solver and has resisted several fuzzing campaigns.
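To make the key idea concrete, here is a minimal sketch in Python of one plausible reading of it. It is not TypeFuzz's implementation. The assumptions are: terms are plain SMT-LIB strings, a `pool` maps each type name to terms of that type taken from a seed formula, and the operator table `PARAMETRIC_OPS` and the functions `instantiate` and `mutate` are hypothetical names introduced only for illustration. The operators `ite`, `=`, and `distinct` are parametric in the SMT-LIB core theory; the type parameter is written `A` here.

```python
import random

# Hypothetical operator table: name -> (argument types, return type).
# "A" stands for a type parameter that is bound when the operator is used.
PARAMETRIC_OPS = {
    "ite": (("Bool", "A", "A"), "A"),
    "=": (("A", "A"), "Bool"),
    "distinct": (("A", "A"), "Bool"),
}


def instantiate(op, target_type, pool, rng):
    """Build a term rooted at `op` with type `target_type`.

    Returns None when the operator cannot produce that type or the pool
    lacks terms for one of the argument types.
    """
    arg_types, ret_type = PARAMETRIC_OPS[op]
    if ret_type == "A":
        # The return type is the parameter itself, so bind it to the target.
        binding = target_type
    elif ret_type == target_type:
        # The return type is fixed; any type with available terms may bind A.
        candidates = sorted(t for t, terms in pool.items() if terms)
        if not candidates:
            return None
        binding = rng.choice(candidates)
    else:
        return None

    args = []
    for arg_type in arg_types:
        concrete = binding if arg_type == "A" else arg_type
        if not pool.get(concrete):
            return None
        args.append(rng.choice(pool[concrete]))
    return "(" + " ".join([op] + args) + ")"


def mutate(term, term_type, pool, rng):
    """Return a generated replacement for `term` that has the same type.

    The new expression is added to the pool (an assumption of this sketch),
    so later mutations can nest it and the set of reachable terms is unbounded.
    """
    ops = sorted(PARAMETRIC_OPS)
    rng.shuffle(ops)
    for op in ops:
        new_term = instantiate(op, term_type, pool, rng)
        if new_term is not None:
            pool.setdefault(term_type, []).append(new_term)
            return new_term
    return term  # no applicable operator: leave the term unchanged


if __name__ == "__main__":
    rng = random.Random(0)
    pool = {"Int": ["x", "y"], "Bool": ["b"]}
    for _ in range(3):
        print(mutate("x", "Int", pool, rng))
```

Because the replacement keeps the type of the term it replaces, the mutated formula stays well-typed, and because each generated expression can feed later mutations, the space of mutants is not limited to a fixed set of operator swaps.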


