Abstract
Aggressive random testing tools ("fuzzers") are impressively effective at finding compiler bugs. For example, a single test-case generator has resulted in more than 1,700 bugs reported for a single JavaScript engine. However, fuzzers can be frustrating to use: they indiscriminately and repeatedly find bugs that may not be severe enough to fix right away. Currently, users filter out undesirable test cases using ad hoc methods such as disallowing problematic features in tests and grepping test results. This paper formulates and addresses the fuzzer taming problem: given a potentially large number of random test cases that trigger failures, order them such that diverse, interesting test cases are highly ranked. Our evaluation shows our ability to solve the fuzzer taming problem for 3,799 test cases triggering 46 bugs in a C compiler and 2,603 test cases triggering 28 bugs in a JavaScript engine.
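The ordering goal stated above ("diverse, interesting test cases are highly ranked") can be illustrated with a small sketch. It assumes a greedy furthest-point-first traversal over failing test cases and uses plain string edit distance as the dissimilarity measure. Both choices are illustrative assumptions, not something the abstract specifies, and the names `levenshtein` and `rank_by_diversity` are hypothetical.

```python
from typing import Callable, List, Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def rank_by_diversity(
    items: Sequence[str],
    distance: Callable[[str, str], int] = levenshtein,
) -> List[int]:
    """Return indices of `items` in furthest-point-first order.

    Assumption: the first item seeds the ranking. Each later pick is the
    item whose distance to its nearest already-ranked item is largest,
    so early positions tend to cover dissimilar test cases.
    """
    if not items:
        return []
    order = [0]
    # min_dist[i] = distance from item i to the closest ranked item so far
    min_dist = [distance(items[0], x) for x in items]
    remaining = set(range(1, len(items)))
    while remaining:
        # Ties are broken in favor of the lower index for determinism.
        nxt = max(remaining, key=lambda i: (min_dist[i], -i))
        order.append(nxt)
        remaining.remove(nxt)
        for i in remaining:
            d = distance(items[nxt], items[i])
            if d < min_dist[i]:
                min_dist[i] = d
    return order
```

For the inputs `["aaaa", "aaab", "zzzz", "aaba"]`, this sketch returns `[0, 2, 1, 3]`: the outlier `"zzzz"` is ranked directly after the seed, ahead of the two near-duplicates of `"aaaa"`. A real tamer would likely substitute a distance function that reflects failure behavior rather than raw text.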
Taming compiler fuzzers
PLDI '13: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation