
Taming compiler fuzzers

Published: 16 June 2013

Abstract

Aggressive random testing tools ("fuzzers") are impressively effective at finding compiler bugs. For example, a single test-case generator has resulted in more than 1,700 bugs reported for a single JavaScript engine. However, fuzzers can be frustrating to use: they indiscriminately and repeatedly find bugs that may not be severe enough to fix right away. Currently, users filter out undesirable test cases using ad hoc methods such as disallowing problematic features in tests and grepping test results. This paper formulates and addresses the fuzzer taming problem: given a potentially large number of random test cases that trigger failures, order them such that diverse, interesting test cases are highly ranked. Our evaluation shows our ability to solve the fuzzer taming problem for 3,799 test cases triggering 46 bugs in a C compiler and 2,603 test cases triggering 28 bugs in a JavaScript engine.
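
The abstract states the ranking goal but not how an ordering is computed. As a minimal sketch only, assume each failing test case has already been mapped to a numeric feature vector and that a distance function between vectors is available. A greedy farthest-point-first ordering then picks, at each step, the test case farthest from everything already ranked, so that dissimilar failures surface early. The names `rank_diverse` and `euclidean`, the feature vectors, and the choice of Euclidean distance are all illustrative assumptions, not details taken from the paper.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def rank_diverse(vectors, distance=euclidean):
    """Return indices of `vectors` in a greedy farthest-point-first order.

    Hypothetical helper: the first item is chosen arbitrarily (index 0);
    each later item is the one whose distance to its nearest already-ranked
    item is largest.
    """
    if not vectors:
        return []
    order = [0]
    # min_dist[i] is the distance from item i to the nearest ranked item.
    min_dist = [distance(v, vectors[0]) for v in vectors]
    remaining = set(range(1, len(vectors)))
    while remaining:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        nxt = max(sorted(remaining), key=lambda i: min_dist[i])
        order.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            min_dist[i] = min(min_dist[i], distance(vectors[i], vectors[nxt]))
    return order


if __name__ == "__main__":
    # Two tight pairs of near-duplicate failures plus one outlier (made-up data).
    failures = [(0, 0), (0.1, 0), (10, 0), (10, 0.1), (5, 5)]
    print(rank_diverse(failures))
```

With this made-up data, one representative from each tight pair and the outlier are ranked before either near-duplicate, which is the behavior the problem statement asks for. A real tool would still need a meaningful way to derive feature vectors from test cases or failure output.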

