Abstract
How does one test a language implementation with QuickCheck (a.k.a. property-based testing)? One approach is to generate programs following the grammar of the language. But in a statically-typed language such as OCaml, too many of these candidate programs will be rejected as ill-typed by the type checker. As a refinement, Pałka et al. propose generating programs by a goal-directed, bottom-up reading of the typing relation. We have written such a generator. However, many of the generated programs have output that depends on the evaluation order, which is commonly under-specified in languages such as OCaml, Scheme, C, and C++. In this paper we develop a type and effect system for conservatively detecting evaluation-order dependence and propose its goal-directed reading as a generator of programs that are independent of evaluation order. We illustrate the approach by generating programs to test OCaml's two compiler backends against each other, and we report on a number of bugs we have found by doing so.
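To make the notion of conservatively detecting evaluation-order dependence concrete, here is a minimal sketch over a toy expression language. It is an illustration only and not the paper's type and effect system: the names `expr`, `has_effect`, `order_dependent`, `eval`, and `run` are all hypothetical, and the only effect modelled is output.

```ocaml
(* Toy language (hypothetical, for illustration only): integer literals,
   addition, and a construct that evaluates its sub-expression, then
   emits a tag as output and returns the sub-expression's value. *)
type expr =
  | Lit of int
  | Add of expr * expr
  | Print of string * expr

(* Conservative effect check: may evaluating the expression emit output? *)
let rec has_effect = function
  | Lit _ -> false
  | Add (l, r) -> has_effect l || has_effect r
  | Print _ -> true

(* Flag an expression as possibly order-dependent when both operands of
   some addition may have effects. With at most one effectful operand,
   the operand order cannot be observed in this toy language. *)
let rec order_dependent = function
  | Lit _ -> false
  | Add (l, r) ->
      (has_effect l && has_effect r) || order_dependent l || order_dependent r
  | Print (_, e) -> order_dependent e

type order = LeftToRight | RightToLeft

(* Reference evaluator parameterised by operand order. Output is
   collected in a buffer so that two runs can be compared. *)
let rec eval order buf = function
  | Lit n -> n
  | Add (l, r) ->
      (match order with
       | LeftToRight ->
           let a = eval order buf l in
           let b = eval order buf r in
           a + b
       | RightToLeft ->
           let b = eval order buf r in
           let a = eval order buf l in
           a + b)
  | Print (tag, e) ->
      let v = eval order buf e in
      Buffer.add_string buf tag;
      v

(* Evaluate an expression and return its value together with its output. *)
let run order e =
  let buf = Buffer.create 16 in
  let v = eval order buf e in
  (v, Buffer.contents buf)

(* Both operands emit output, so the check flags this expression. *)
let risky = Add (Print ("a", Lit 1), Print ("b", Lit 2))

(* Only one operand emits output, so the check accepts this expression. *)
let safe = Add (Print ("a", Lit 1), Lit 2)
```

A generator in the spirit of the abstract would only emit programs like `safe`, for which both evaluation orders give the same value and output. A program like `risky` could make two otherwise correct backends disagree, so it would be useless as a differential test case.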
References
- Torben Amtoft, Hanne Riis Nielson, and Flemming Nielson. 1999. Type and effect systems - behaviours for concurrency. Imperial College Press.
- John Banning. 1979. An Efficient Way to Find Side Effects of Procedure Calls and Aliases of Variables. In Proceedings of the Sixth Annual ACM Symposium on Principles of Programming Languages, Barry K. Rosen (Ed.). San Antonio, Texas, 29–41.
- Bruno Blanchet, Patrick Cousot, Radhia Cousot, Jérôme Feret, Laurent Mauborgne, Antoine Miné, David Monniaux, and Xavier Rival. 2003. A static analyzer for large safety-critical software. In Proceedings of the ACM SIGPLAN 2003 Conference on Programming Languages Design and Implementation, Ron Cytron and Rajiv Gupta (Eds.). San Diego, California, 196–207.
- Koen Claessen, Jonas Duregård, and Michal H. Pałka. 2015. Generating constrained random data with uniform distribution. Journal of Functional Programming 25 (2015).
- Koen Claessen and John Hughes. 2000. QuickCheck: A Lightweight Tool for Random Testing of Haskell Programs. In Proceedings of the Fifth ACM SIGPLAN International Conference on Functional Programming (ICFP’00), Philip Wadler (Ed.). Montréal, Canada, 53–64.
- Patrick Cousot and Radhia Cousot. 1977. Abstract Interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proceedings of the Fourth Annual ACM Symposium on Principles of Programming Languages, Ravi Sethi (Ed.). Los Angeles, California, 238–252.
- Burke Fetscher, Koen Claessen, Michal H. Pałka, John Hughes, and Robert Bruce Findler. 2015. Making Random Judgments: Automatically Generating Well-Typed Terms from the Definition of a Type-System. In Programming Languages and Systems, 24th European Symposium on Programming, ESOP 2015 (Lecture Notes in Computer Science), Jan Vitek (Ed.), Vol. 9032. Springer-Verlag, 383–405.
- David K. Gifford and John M. Lucassen. 1986. Integrating Functional and Imperative Programming. In Proceedings of the 1986 ACM Conference on Lisp and Functional Programming, William L. Scherlis and John H. Williams (Eds.). Cambridge, Massachusetts, 28–38.
- James Gosling, Bill Joy, Guy Steele, and Gilad Bracha. 2000. Java Language Specification, Second Edition: The Java Series (2nd ed.). Addison-Wesley, Boston, MA, USA.
- John Hughes. 2016. Experiences with QuickCheck: Testing the Hard Stuff and Staying Sane. In A List of Successes That Can Change the World - Essays Dedicated to Philip Wadler on the Occasion of His 60th Birthday (Lecture Notes in Computer Science), Sam Lindley, Conor McBride, Philip W. Trinder, and Donald Sannella (Eds.), Vol. 9600. Springer-Verlag, 169–186.
- John Hughes, Ulf Norell, Nicholas Smallbone, and Thomas Arts. 2016. Find more bugs with QuickCheck! In Proceedings of the 11th International Workshop on Automation of Software Test, AST@ICSE 2016, Austin, Texas, USA, May 14-15, 2016, Christof J. Budnik, Gordon Fraser, and Francesca Lonetti (Eds.). ACM, 71–77.
- Patrick Kasting and Mathias Nygaard Justesen. 2016. Quickchecking OCaml compilers by generating lambda terms. Unpublished course project report. Technical University of Denmark, Lyngby, Denmark.
- Vu Le, Mehrdad Afshari, and Zhendong Su. 2014. Compiler validation via equivalence modulo inputs. In Proceedings of the ACM SIGPLAN 2014 Conference on Programming Languages Design and Implementation, PLDI’14, Michael F. P. O’Boyle and Keshav Pingali (Eds.). 216–226.
- Xavier Leroy. 1990. The Zinc experiment: an economical implementation of the ML language. Rapport Technique 117. INRIA Rocquencourt, Le Chesnay, France.
- Xavier Leroy and François Pessaux. 2000. Type-based analysis of uncaught exceptions. ACM Transactions on Programming Languages and Systems 22, 2 (2000), 340–377.
- John M. Lucassen and David K. Gifford. 1988. Polymorphic effect systems. In Proceedings of the Fifteenth Annual ACM Symposium on Principles of Programming Languages, Jeanne Ferrante and Peter Mager (Eds.). ACM Press, San Diego, California, 47–57.
- John C. Martin. 1997. Introduction to Languages and the Theory of Computation. McGraw-Hill.
- Flemming Nielson and Hanne Riis Nielson. 1999. Type and Effect Systems. In Correct System Design, Recent Insight and Advances, (to Hans Langmaack on the occasion of his retirement from his professorship at the University of Kiel) (Lecture Notes in Computer Science), Ernst-Rüdiger Olderog and Bernhard Steffen (Eds.), Vol. 1710. Springer-Verlag, 114–136.
- Flemming Nielson, Hanne Riis Nielson, and Chris Hankin. 1999. Principles of Program Analysis. Springer.
- Michal H. Pałka, Koen Claessen, Alejandro Russo, and John Hughes. 2011. Testing an optimising compiler by generating random lambda terms. In Proceedings of the 6th International Workshop on Automation of Software Test, AST 2011. 91–97.
- Benjamin C. Pierce. 2002. Types and Programming Languages. The MIT Press.
- Lee Pike. 2014. SmartCheck: automatic and efficient counterexample reduction and generalization. In Proceedings of the 2014 ACM SIGPLAN symposium on Haskell, Gothenburg, Sweden, September 4-5, 2014, Wouter Swierstra (Ed.). 53–64.
- Vincent St-Amour and Neil Toronto. 2013. Experience Report: Applying Random Testing to a Base Type Environment. In Proceedings of the 18th ACM SIGPLAN International Conference on Functional Programming (ICFP’13), Greg Morrisett and Tarmo Uustalu (Eds.). Boston, MA, 351–356.
- Andrew K. Wright and Matthias Felleisen. 1994. A syntactic approach to type soundness. Information and Computation 115 (1994), 38–94.
- Xuejun Yang, Yang Chen, Eric Eide, and John Regehr. 2011. Finding and Understanding Bugs in C Compilers. In Proceedings of the ACM SIGPLAN 2011 Conference on Programming Languages Design and Implementation, PLDI’11, David Padua (Ed.). San Jose, California, 283–294.
Index Terms
Effect-driven QuickChecking of compilers