Abstract
Current algorithms for context-free parsing inflict a trade-off between ease of understanding, ease of implementation, theoretical complexity, and practical performance. No algorithm achieves all of these properties simultaneously. Might et al. introduced parsing with derivatives, which handles arbitrary context-free grammars while being both easy to understand and simple to implement. Despite much initial enthusiasm and a multitude of independent implementations, its worst-case complexity has never been proven to be better than exponential. In fact, high-level arguments claiming it is fundamentally exponential have been advanced and even accepted as part of the folklore. Performance ended up being sluggish in practice, and this sluggishness was taken as informal evidence of exponentiality. In this paper, we reexamine the performance of parsing with derivatives. We have discovered that it is not exponential but, in fact, cubic. Moreover, simple (though perhaps not obvious) modifications to the implementation by Might et al. lead to an implementation that is not only easy to understand but also highly performant in practice.
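To give a flavor of why the approach is considered easy to understand, the following is a minimal sketch of Brzozowski's derivative for plain regular expressions, the idea that parsing with derivatives extends to context-free grammars. It is not the algorithm analyzed in this paper: it handles no recursive grammars and performs no compaction or memoization. All class and function names are illustrative assumptions.

```python
from dataclasses import dataclass


class Re:
    """Base class for regular-expression nodes (hypothetical names)."""


@dataclass(frozen=True)
class Empty(Re):
    """Matches no string at all."""


@dataclass(frozen=True)
class Eps(Re):
    """Matches only the empty string."""


@dataclass(frozen=True)
class Chr(Re):
    c: str


@dataclass(frozen=True)
class Alt(Re):
    left: Re
    right: Re


@dataclass(frozen=True)
class Seq(Re):
    first: Re
    second: Re


@dataclass(frozen=True)
class Star(Re):
    inner: Re


def nullable(r: Re) -> bool:
    """Return True if r accepts the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Seq):
        return nullable(r.first) and nullable(r.second)
    raise TypeError(f"unknown node: {r!r}")


def derive(r: Re, c: str) -> Re:
    """Return an expression for what may follow c in strings matched by r."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):
        return Alt(derive(r.left, c), derive(r.right, c))
    if isinstance(r, Seq):
        head = Seq(derive(r.first, c), r.second)
        # If the first part can be empty, c may also start the second part.
        return Alt(head, derive(r.second, c)) if nullable(r.first) else head
    if isinstance(r, Star):
        return Seq(derive(r.inner, c), r)
    raise TypeError(f"unknown node: {r!r}")


def matches(r: Re, s: str) -> bool:
    """Derive by each character in turn, then test nullability."""
    for c in s:
        r = derive(r, c)
    return nullable(r)
```

For example, with `ab_star = Star(Seq(Chr("a"), Chr("b")))`, the call `matches(ab_star, "abab")` succeeds while `matches(ab_star, "aba")` does not. Because this sketch never simplifies the derived expressions, they grow with every input character.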
Contents
- Introduction
- Background
- The Brzozowski Derivative
- Parsing Expressions
- Derivatives of Parsing Expressions
- Nullability
- Derivatives of Context-free Languages
- Representation
- Computation
- Performance
- Complexity Analysis
- Total Running Time in Terms of Grammar Nodes
- Grammar Nodes in Terms of Input Length
- Running Time in Terms of Input Length
- Improving Performance in Practice
- Benchmarks
- Computing Fixed Points
- Compaction
- Right-hand Children of Sequence Nodes
- Canonicalizing Chains of Sequence Nodes
- Avoiding Separate Passes
- Hash Tables and Memoization
- Conclusion
- References

References
- Bison. Bison. URL https://www.gnu.org/software/bison/.
- Janusz A. Brzozowski. Derivatives of regular expressions. Journal of the ACM (JACM), 11(4):481–494, October 1964. ISSN 0004-5411. doi: 10.1145/321239.321249.
- William Byrd. relational-parsing-with-derivatives, 2013. URL https://github.com/webyrd/relational-parsing-with-derivatives.
- Russ Cox. Yacc is not dead. Blog, December 2010. URL http://research.swtch.com/yaccalive.
- Jay Earley. An Efficient Context-Free Parsing Algorithm. PhD thesis, Carnegie Mellon University, 1968. URL http://reports-archive.adm.cs.cmu.edu/anon/anon/usr/ftp/scan/CMU-CS-68-earley.pdf.
- Jay Earley. An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94–102, February 1970. ISSN 0001-0782. doi: 10.1145/362007.362035.
- Mark Engelberg. instaparse, 2015. URL https://github.com/Engelberg/instaparse.
- Abraham Flaxman, Aram W. Harrow, and Gregory B. Sorkin. Strings with maximally many distinct subsequences and substrings. The Electronic Journal of Combinatorics, 11(1):R8, 2004. ISSN 1077-8926. URL http://www.combinatorics.org/ojs/index.php/eljc/article/view/v11i1r8.
- Bryan Ford. Parsing expression grammars: a recognition-based syntactic foundation. In Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’04, pages 111–122, New York, NY, USA, January 2004. ACM. ISBN 1-58113-729-X. doi: 10.1145/964001.964011.
- Gary A. Kildall. A unified approach to global program optimization. In Proceedings of the 1st Annual ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, POPL ’73, pages 194–206, New York, NY, USA, October 1973. ACM. doi: 10.1145/512927.512945.
- Dexter Kozen. A completeness theorem for Kleene algebras and the algebra of regular events. Information and Computation, 110(2):366–390, May 1994. ISSN 0890-5401. doi: 10.1006/inco.1994.1037.
- Bernard Lang. Deterministic techniques for efficient nondeterministic parsers. In Prof. Dr.-Ing. J. Loeckx, editor, Automata, Languages and Programming, volume 14 of Lecture Notes in Computer Science, pages 255–269. Springer Berlin Heidelberg, 1974. ISBN 978-3-540-06841-9. doi: 10.1007/3-540-06841-4_65.
- Tommy McGuire. Java-Parser-Derivatives, 2012. URL https://github.com/tmmcguire/Java-Parser-Derivatives.
- Gary H. Merrill. Parsing non-LR(k) grammars with yacc. Software: Practice and Experience, 23(8):829–850, August 1993. ISSN 1097-024X. doi: 10.1002/spe.4380230803.
- Matthew Might. derp documentation, 2013. URL http://matt.might.net/teaching/compilers/spring-2013/derp.html.
- Matthew Might, David Darais, and Daniel Spiewak. Parsing with derivatives: a functional pearl. In Proceedings of the 16th ACM SIGPLAN International Conference on Functional Programming, ICFP ’11, pages 189–195, New York, NY, USA, September 2011. ACM. ISBN 978-1-4503-0865-6. doi: 10.1145/2034773.2034801.
- Russell Mull. parsing-with-derivatives, 2013. URL https://github.com/mullr/parsing-with-derivatives.
- Scott Owens, John Reppy, and Aaron Turon. Regular-expression derivatives re-examined. Journal of Functional Programming, 19(02):173–190, March 2009. ISSN 1469-7653. doi: 10.1017/S0956796808007090.
- Per Vognsen. parser, 2012. URL https://gist.github.com/pervognsen/815b208b86066f6d7a00.
On the complexity and performance of parsing with derivatives
PLDI '16: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation






