Abstract
We consider, from a practical perspective, the problem of checking equivalence of context-free grammars. We present techniques for proving equivalence, as well as techniques for finding counter-examples that establish non-equivalence. Among the key building blocks of our approach is a novel algorithm for efficiently enumerating and sampling words and parse trees from arbitrary context-free grammars; the algorithm supports polynomial-time random access to words belonging to the grammar. Furthermore, we propose an algorithm for proving equivalence of context-free grammars that is complete for LL grammars, yet can be invoked on any context-free grammar, including ambiguous grammars. Our techniques successfully find discrepancies between different syntax specifications of several real-world languages, and are capable of detecting fine-grained incremental modifications performed on grammars. Our evaluation shows that our tool improves significantly on the existing state-of-the-art tools. In addition, we used these algorithms to develop an online tutoring system for grammars, which we then used in an undergraduate course on computer language processing. On questions involving grammar constructions, our system was able to automatically evaluate the correctness of 95% of the solutions submitted by students: it disproved 74% of the solutions and proved 21% of them.
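To make the counter-example idea concrete, here is a minimal Python sketch that compares two grammars by enumerating all words up to a length bound. It is a brute-force illustration only, not the paper's enumeration algorithm (which supports polynomial-time random access). The grammar encoding, the names `make_enumerator` and `find_counterexample`, and the example grammars are all assumptions made for illustration. The sketch also assumes the grammars have no empty productions and no cycles of unit productions, so the recursion terminates.

```python
from functools import lru_cache

# Hypothetical encoding: a grammar maps each nonterminal to a list of
# right-hand sides (tuples of symbols). Any symbol that is not a key is a
# terminal. Assumed: no empty productions, no cycles of unit productions.

def make_enumerator(grammar):
    """Return words(symbol, n): all length-n words derivable from symbol."""

    @lru_cache(maxsize=None)
    def words(symbol, n):
        if symbol not in grammar:  # terminal: exactly one word, of length 1
            return frozenset({(symbol,)}) if n == 1 else frozenset()
        result = set()
        for rhs in grammar[symbol]:
            result |= seq_words(rhs, n)
        return frozenset(result)

    @lru_cache(maxsize=None)
    def seq_words(rhs, n):
        if len(rhs) == 1:
            return words(rhs[0], n)
        head, tail = rhs[0], rhs[1:]
        result = set()
        # Each symbol yields at least one terminal, so the head can take
        # between 1 and n - len(tail) positions.
        for k in range(1, n - len(tail) + 1):
            for u in words(head, k):
                for v in seq_words(tail, n - k):
                    result.add(u + v)
        return frozenset(result)

    return words


def find_counterexample(g1, start1, g2, start2, max_len):
    """Return a shortest word accepted by exactly one grammar, or None."""
    w1, w2 = make_enumerator(g1), make_enumerator(g2)
    for n in range(1, max_len + 1):
        diff = w1(start1, n) ^ w2(start2, n)  # symmetric difference
        if diff:
            return min(diff)
    return None


# Illustrative grammars: a^n b^n, and a variant with one extra production.
anbn = {"S": [("a", "S", "b"), ("a", "b")]}
variant = {"S": [("a", "S", "b"), ("a", "b"), ("a", "a", "b")]}

print(find_counterexample(anbn, "S", variant, "S", 6))  # ('a', 'a', 'b')
```

A result of `None` only means that no discrepancy exists up to the length bound; it does not prove equivalence, which is why the abstract treats proving equivalence as a separate technique.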
Supplemental Material
A VM containing the executable implementation of the system described in the paper Automating Grammar Comparison, and the benchmarks used in the experimental study.
Automating grammar comparison
OOPSLA 2015: Proceedings of the 2015 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications