Skeletal program enumeration for rigorous compiler testing

Published: 14 June 2017

Abstract

A program can be viewed as a syntactic structure P (syntactic skeleton) parameterized by a collection of identifiers V (variable names). This paper introduces the skeletal program enumeration (SPE) problem: given a syntactic skeleton P and a set of variables V, enumerate a set of programs 𝒫 exhibiting all possible variable usage patterns within P. It proposes an effective realization of SPE for systematic, rigorous compiler testing by leveraging three important observations: (1) programs with different variable usage patterns exhibit diverse control and data dependences, and help exercise different compiler optimizations; (2) most real compiler bugs were revealed by small tests (i.e., small-sized P), and this “small-scope” observation makes SPE practical for compiler validation; and (3) SPE is exhaustive w.r.t. a given syntactic skeleton and variable set, offering a level of guarantee absent from all existing compiler testing techniques.
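To make the problem statement concrete, the following is a minimal sketch of the naive form of enumeration: every hole in a skeleton is filled with every variable in V. The hole marker `□`, the helper names `fill` and `enumerate_fillings`, and the one-statement skeleton are all assumptions made for illustration; none of them comes from the paper.

```python
from itertools import product

HOLE = "□"  # hypothetical marker for a variable position in the skeleton


def fill(skeleton, names):
    """Replace the holes of `skeleton`, left to right, with `names`."""
    parts = skeleton.split(HOLE)
    if len(parts) != len(names) + 1:
        raise ValueError("number of names must match number of holes")
    out = parts[0]
    for name, rest in zip(names, parts[1:]):
        out += name + rest
    return out


def enumerate_fillings(skeleton, variables):
    """Yield every program obtained by filling each hole with each variable."""
    n_holes = skeleton.count(HOLE)
    for choice in product(variables, repeat=n_holes):
        yield fill(skeleton, choice)


# Illustrative skeleton with three holes and a two-variable set V.
skeleton = "□ = □ - □;"
programs = list(enumerate_fillings(skeleton, ["a", "b"]))
for program in programs:
    print(program)
```

With n holes and k variables this produces k^n programs (here 2^3 = 8), many of which differ only by a consistent renaming of variables, such as `a = b - b;` and `b = a - a;`.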

The key challenge of SPE is eliminating the enormous number of programs that are equivalent up to α-conversion. Our main technical contribution is a novel algorithm for computing the canonical (and smallest) set of all non-α-equivalent programs. To demonstrate its practical utility, we have applied the SPE technique to test C/C++ compilers using syntactic skeletons derived from their own regression test suites. Our evaluation results are extremely encouraging. In less than six months, our approach has led to 217 confirmed GCC/Clang bug reports, 119 of which have already been fixed; the majority were long latent despite extensive prior testing efforts. Our SPE algorithm also reduces the number of enumerated programs by six orders of magnitude. Moreover, in three weeks, our technique has found 29 CompCert crashing bugs and 42 bugs in two Scala optimizing compilers. These results demonstrate the generality of our SPE technique and further illustrate its effectiveness.
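One standard way to pick a single representative per α-equivalence class is to enumerate restricted growth strings, which correspond to partitions of the holes into groups that share a variable. The sketch below is not the paper's algorithm: it assumes a single scope in which all variables are interchangeable, and the names `restricted_growth_strings` and `canonical_pattern` are hypothetical.

```python
from itertools import product


def restricted_growth_strings(n_holes, n_vars):
    """Yield tuples a with a[0] == 0 and a[i] <= max(a[:i]) + 1,
    using at most n_vars distinct values (one tuple per usage pattern)."""
    def extend(prefix, used):
        if len(prefix) == n_holes:
            yield tuple(prefix)
            return
        # A hole may reuse any variable seen so far, or introduce the next one.
        for v in range(min(used + 1, n_vars)):
            yield from extend(prefix + [v], max(used, v + 1))
    yield from extend([], 0)


def canonical_pattern(names):
    """Rename variables by order of first occurrence: (b, a, b) -> (0, 1, 0)."""
    index = {}
    return tuple(index.setdefault(name, len(index)) for name in names)


variables = ["a", "b"]  # illustrative variable set V
n_holes = 3             # holes of the assumed skeleton "_ = _ - _;"

canonical = list(restricted_growth_strings(n_holes, len(variables)))
naive = list(product(variables, repeat=n_holes))

# Render each canonical pattern as a program over V.
programs = [
    "{} = {} - {};".format(*(variables[i] for i in pattern))
    for pattern in canonical
]
print(len(naive), "naive fillings;", len(canonical), "canonical patterns")
for program in programs:
    print(program)
```

Under these assumptions the canonical set has one entry per partition of the n holes into at most k non-empty groups, rather than k^n entries. Every naive filling maps to exactly one canonical pattern under first-occurrence renaming, so no usage pattern is lost.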

Published in

ACM SIGPLAN Notices, Volume 52, Issue 6
PLDI '17
June 2017
708 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/3140587

PLDI 2017: Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2017
708 pages
ISBN: 9781450349888
DOI: 10.1145/3062341

Copyright © 2017 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
