Abstract
Parsing with Derivatives (PwD) is an elegant approach to parsing context-free grammars (CFGs). It takes the equational theory behind Brzozowski's derivative for regular expressions and augments that theory with laziness, memoization, and fixed points. The result is a simple parser for arbitrary CFGs. Although recent work improved the performance of PwD, it remains inefficient because the algorithm repeatedly traverses some parts of the grammar.
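For readers unfamiliar with the starting point, the following is a minimal sketch of Brzozowski's derivative for plain regular expressions, not of PwD itself (it has no laziness, memoization, or fixed points). The type `re` and the functions `nullable`, `derive`, and `matches` are names chosen for this illustration and are not taken from the paper.

```ocaml
(* Regular expressions over characters (illustrative type, not the paper's). *)
type re =
  | Empty            (* matches no string *)
  | Eps              (* matches only the empty string *)
  | Chr of char      (* matches exactly one character *)
  | Cat of re * re   (* concatenation *)
  | Alt of re * re   (* alternation *)
  | Star of re       (* Kleene star *)

(* [nullable r] is true exactly when [r] matches the empty string. *)
let rec nullable = function
  | Empty | Chr _ -> false
  | Eps | Star _ -> true
  | Cat (a, b) -> nullable a && nullable b
  | Alt (a, b) -> nullable a || nullable b

(* [derive c r] matches every string [s] such that [r] matches [c]
   followed by [s]. *)
let rec derive c = function
  | Empty | Eps -> Empty
  | Chr d -> if c = d then Eps else Empty
  | Alt (a, b) -> Alt (derive c a, derive c b)
  | Cat (a, b) ->
      let left = Cat (derive c a, b) in
      (* If [a] can match the empty string, [c] may also start [b]. *)
      if nullable a then Alt (left, derive c b) else left
  | Star a as r -> Cat (derive c a, r)

(* A string matches when the expression that remains after deriving by
   each of its characters in turn is nullable. *)
let matches r s =
  nullable (Seq.fold_left (fun r c -> derive c r) r (String.to_seq s))
```

With `Cat (Chr 'a', Star (Chr 'b'))` standing for the expression ab*, `matches` accepts "a" and "abb" and rejects "" and "ba". Each call to `derive` walks the whole expression from the top; on a grammar, that kind of repeated walk is the retraversal cost described above.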
In this functional pearl, we show how to avoid this inefficiency by suspending the state of the traversal in a zipper. When subsequent derivatives are taken, we can resume the traversal from where we left off without retraversing already traversed parts of the grammar.
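As background, a zipper in Huet's sense can be sketched for ordinary binary trees as follows. This is the traditional tree zipper only, not the generalized zipper the paper develops for CFGs. The names `tree`, `crumb`, `zipper`, `down_left`, `down_right`, `up`, `replace`, and `to_root` are assumptions made for this illustration.

```ocaml
(* A plain binary tree with integer labels (illustrative type). *)
type tree = Leaf | Node of tree * int * tree

(* One step of the path from the focused subtree back toward the root. *)
type crumb =
  | WentLeft of int * tree    (* parent's label and its right subtree *)
  | WentRight of tree * int   (* parent's left subtree and its label *)

(* A zipper pairs the focused subtree with the path back to the root. *)
type zipper = tree * crumb list

(* Move the focus to the left child, remembering what was left behind. *)
let down_left ((t, path) : zipper) : zipper option =
  match t with
  | Node (l, v, r) -> Some (l, WentLeft (v, r) :: path)
  | Leaf -> None

(* Move the focus to the right child. *)
let down_right ((t, path) : zipper) : zipper option =
  match t with
  | Node (l, v, r) -> Some (r, WentRight (l, v) :: path)
  | Leaf -> None

(* Move the focus to the parent by rebuilding one node from the path. *)
let up ((t, path) : zipper) : zipper option =
  match path with
  | WentLeft (v, r) :: rest -> Some (Node (t, v, r), rest)
  | WentRight (l, v) :: rest -> Some (Node (l, v, t), rest)
  | [] -> None

(* Replace the focused subtree without touching the rest of the tree. *)
let replace (t' : tree) ((_, path) : zipper) : zipper = (t', path)

(* Rebuild the whole tree by walking back up to the root. *)
let rec to_root (z : zipper) : tree =
  match up z with
  | Some z' -> to_root z'
  | None -> fst z
```

The path stored in the zipper is the suspended state of a traversal: work can continue at the focus without walking down from the root again. Because each crumb records exactly one parent, this structure assumes a tree; it has no way to represent a node with several parents or a node that is its own ancestor.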
However, the original zipper is designed for use with trees, and we want to parse CFGs. CFGs can include shared regions, cycles, and choices between alternates, which makes them incompatible with the traditional tree model for zippers. This paper develops a generalization of zippers to properly handle these additional features. Just as PwD generalized Brzozowski's derivatives from regular expressions to CFGs, we generalize Huet's zippers from trees to CFGs.
The resulting parsing algorithm is concise and efficient: it takes only 31 lines of OCaml code to implement the derivative function but performs 6,500 times faster than the original PwD and 3.24 times faster than the optimized implementation of PwD.
Parsing with zippers (functional pearl)