Abstract
We present a functional approach to parsing unrestricted context-free grammars based on Brzozowski's derivative of regular expressions. If we consider context-free grammars as recursive regular expressions, Brzozowski's equational theory extends without modification to context-free grammars (and it generalizes to parser combinators). The supporting actors in this story are three concepts familiar to functional programmers: laziness, memoization and fixed points. These allow Brzozowski's original equations to be transliterated into purely functional code in about 30 lines spread over three functions.
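The following minimal recognizer shows how laziness, memoization and fixed points can fit together. It is written in Python as a sketch and is not the authors' code, which the abstract does not show. It rests on these assumptions:

- `functools.cached_property` thunks stand in for language-level laziness.
- All names (`Lang`, `Alt`, `Cat`, `derive`, `nullable`, `recognizes`) are hypothetical.
- Nullability is recomputed from scratch on every call rather than cached.
- Kleene star is omitted, since recursion can express it.
- It only recognizes strings; it builds no parse trees.

```python
# Hypothetical sketch: a derivative-based recognizer for context-free
# grammars, including left-recursive ones. Names are illustrative only.
from functools import cached_property
from typing import Callable, Dict, List, Tuple

class Lang:
    """A grammar node; Alt/Cat children are lazy, so grammars may be cyclic."""

class Empty(Lang):
    """The empty language: accepts nothing."""

class Eps(Lang):
    """The language containing only the empty string."""

class Char(Lang):
    def __init__(self, c: str) -> None:
        self.c = c

class Binary(Lang):
    """Children are thunks, each forced at most once and then cached."""
    def __init__(self, left: Callable[[], Lang], right: Callable[[], Lang]) -> None:
        self._make_left, self._make_right = left, right

    @cached_property
    def left(self) -> Lang:
        return self._make_left()

    @cached_property
    def right(self) -> Lang:
        return self._make_right()

class Alt(Binary):
    """Union: L1 | L2."""

class Cat(Binary):
    """Concatenation: L1 L2."""

EMPTY, EPS = Empty(), Eps()

def nullable(lang: Lang) -> bool:
    """Does `lang` accept ""? Least fixed point over the reachable graph."""
    # Collect reachable nodes (forcing lazy children); start each at bottom,
    # where only Eps is nullable.
    null: Dict[int, bool] = {}
    nodes: List[Lang] = []
    stack = [lang]
    while stack:
        node = stack.pop()
        if id(node) not in null:
            null[id(node)] = isinstance(node, Eps)
            nodes.append(node)
            if isinstance(node, Binary):
                stack += [node.left, node.right]
    changed = True
    while changed:  # iterate the nullability equations until stable
        changed = False
        for n in nodes:
            if isinstance(n, Binary) and not null[id(n)]:
                a, b = null[id(n.left)], null[id(n.right)]
                if (a or b) if isinstance(n, Alt) else (a and b):
                    null[id(n)] = changed = True
    return null[id(lang)]

_memo: Dict[Tuple[Lang, str], Lang] = {}

def derive(c: str, lang: Lang) -> Lang:
    """Derivative of `lang` by `c`; memoized so recursive grammars stay finite."""
    if (lang, c) in _memo:
        return _memo[(lang, c)]
    result: Lang
    if isinstance(lang, Char):
        result = EPS if lang.c == c else EMPTY
    elif isinstance(lang, Alt):
        result = Alt(lambda: derive(c, lang.left), lambda: derive(c, lang.right))
    elif isinstance(lang, Cat):
        first = Cat(lambda: derive(c, lang.left), lambda: lang.right)
        if nullable(lang.left):  # L1 accepts "", so D(L2) contributes as well
            result = Alt(lambda: first, lambda: derive(c, lang.right))
        else:
            result = first
    else:  # Empty and Eps both derive to the empty language
        result = EMPTY
    _memo[(lang, c)] = result  # stored before any lazy child asks for it
    return result

def recognizes(lang: Lang, text: str) -> bool:
    for ch in text:
        lang = derive(ch, lang)
    return nullable(lang)

# S -> S "+" "n" | "n"   (left-recursive)
S: Lang = Alt(lambda: Cat(lambda: S, lambda: Cat(lambda: Char("+"), lambda: Char("n"))),
              lambda: Char("n"))
# P -> "(" P ")" P | ""   (balanced parentheses)
P: Lang = Alt(lambda: Cat(lambda: Char("("),
                          lambda: Cat(lambda: P, lambda: Cat(lambda: Char(")"), lambda: P))),
              lambda: EPS)
```

In this sketch, `recognizes(S, "n+n+n")` holds even though `S` is left-recursive. The memo table returns the node already under construction, so the derivative of `S` refers back to itself instead of recursing forever.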
Yet this almost impossibly brief implementation has a drawback: its performance is sour, in both theory and practice. The culprit? Each derivative can double the size of a grammar, and with it, the cost of the next derivative.
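Brzozowski's equations make the doubling visible. Below they appear in the form usually stated for regular expressions, where δ(L) is ε when L accepts the empty string and ∅ otherwise; the paper's own notation may differ. The concatenation rule mentions L₂ twice: once intact and once as D_c(L₂).

Now count the nodes |·| of the expression tree, building each right-hand side literally with no simplification. Under that counting, no rule shrinks a tree, so |D_c(L₂)| ≥ |L₂| and the contribution of L₂ at least doubles:

```latex
\begin{align*}
D_c(\emptyset) &= \emptyset \\
D_c(\epsilon) &= \emptyset \\
D_c(c) &= \epsilon \\
D_c(c') &= \emptyset \quad (c' \neq c) \\
D_c(L_1 \cup L_2) &= D_c(L_1) \cup D_c(L_2) \\
D_c(L_1 \circ L_2) &= (D_c(L_1) \circ L_2) \cup (\delta(L_1) \circ D_c(L_2)) \\
D_c(L^\star) &= D_c(L) \circ L^\star \\[4pt]
|D_c(L_1 \circ L_2)| &= |D_c(L_1)| + |L_2| + |D_c(L_2)| + 4
  \;\geq\; |D_c(L_1)| + 2\,|L_2| + 4
\end{align*}
```

The constant 4 counts one ∪ node, two ∘ nodes and the single δ(L₁) leaf.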
Fortunately, much of the new structure inflicted by the derivative is either dead on arrival, or it dies after the very next derivative. To eliminate it, we once again exploit laziness and memoization to transliterate an equational theory that prunes such debris into working code. Thanks to this compaction, parsing times become reasonable in practice.
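The following sketch shows the growth and the pruning side by side. It simplifies the setting to eager, non-recursive regular expressions, so the rewrite rules can be applied directly. The paper's compaction works lazily over recursive grammars and is not reproduced here.

The rule set in the hypothetical `alt` and `cat` smart constructors is an assumed minimal one:

- ∅ ∪ L = L and L ∪ ∅ = L
- ∅ ∘ L = L ∘ ∅ = ∅
- ε ∘ L = L ∘ ε = L

Passing `compact=False` builds every derivative literally.

```python
# Hypothetical sketch: eager, non-recursive regular expressions only.
# The rewrite rules in `alt` and `cat` are an assumed, minimal set.
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass(frozen=True)
class Empty:
    """The empty language."""

@dataclass(frozen=True)
class Eps:
    """The language containing only the empty string."""

@dataclass(frozen=True)
class Char:
    c: str

@dataclass(frozen=True)
class Alt:
    left: "Re"
    right: "Re"

@dataclass(frozen=True)
class Cat:
    left: "Re"
    right: "Re"

@dataclass(frozen=True)
class Star:
    inner: "Re"

Re = Union[Empty, Eps, Char, Alt, Cat, Star]

def nullable(r: Re) -> bool:
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Cat):
        return nullable(r.left) and nullable(r.right)
    return False  # Empty and Char

def alt(left: Re, right: Re, compact: bool) -> Re:
    if compact and isinstance(left, Empty):  # ∅ ∪ L = L
        return right
    if compact and isinstance(right, Empty):  # L ∪ ∅ = L
        return left
    return Alt(left, right)

def cat(left: Re, right: Re, compact: bool) -> Re:
    if compact:
        if isinstance(left, Empty) or isinstance(right, Empty):  # ∅ ∘ L = L ∘ ∅ = ∅
            return Empty()
        if isinstance(left, Eps):  # ε ∘ L = L
            return right
        if isinstance(right, Eps):  # L ∘ ε = L
            return left
    return Cat(left, right)

def derive(c: str, r: Re, compact: bool) -> Re:
    if isinstance(r, Char):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):
        return alt(derive(c, r.left, compact), derive(c, r.right, compact), compact)
    if isinstance(r, Cat):
        delta: Re = Eps() if nullable(r.left) else Empty()  # δ(L1)
        return alt(cat(derive(c, r.left, compact), r.right, compact),
                   cat(delta, derive(c, r.right, compact), compact), compact)
    if isinstance(r, Star):
        return cat(derive(c, r.inner, compact), r, compact)
    return Empty()  # Empty and Eps both derive to Empty

def size(r: Re) -> int:
    """Node count of the expression tree."""
    if isinstance(r, (Alt, Cat)):
        return 1 + size(r.left) + size(r.right)
    if isinstance(r, Star):
        return 1 + size(r.inner)
    return 1

def run(r: Re, text: str, compact: bool) -> Tuple[Re, List[int]]:
    """Derive by each character; return the final expression and each tree size."""
    sizes = []
    for ch in text:
        r = derive(ch, r, compact)
        sizes.append(size(r))
    return r, sizes

AB_STAR = Star(Cat(Char("a"), Char("b")))  # (ab)*
```

On `(ab)*` with input `"abab"`, the compacted sizes are `[6, 4, 6, 4]`, and the result is `(ab)*` again after each `"ab"`. The uncompacted sizes grow with every character while recognizing the same strings.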
We equip the functional programmer with two equational theories that, when combined, make for an abbreviated understanding and implementation of a system for parsing context-free languages.
References
- Brzozowski, J. A. Derivatives of regular expressions. Journal of the ACM 11, 4 (Oct. 1964), 481--494.
- Cocke, J., and Schwartz, J. T. Programming languages and their compilers: Preliminary notes. Tech. rep., Courant Institute of Mathematical Sciences, New York University, New York, NY, 1970.
- Cousot, P., and Cousot, R. Parsing as abstract interpretation of grammar semantics. Theoretical Computer Science 290 (2003), 531--544.
- Cousot, P., and Cousot, R. Grammar analysis and parsing by abstract interpretation, invited chapter. In Program Analysis and Compilation, Theory and Practice: Essays dedicated to Reinhard Wilhelm, T. Reps, M. Sagiv, and J. Bauer, Eds., LNCS 4444. Springer-Verlag, Dec. 2006, pp. 178--203.
- Danielsson, N. A. Total parser combinators. In Proceedings of the 15th ACM SIGPLAN international conference on Functional programming (New York, NY, USA, 2010), ICFP '10, ACM, pp. 285--296.
- DeRemer, F. L. Practical translators for LR(k) languages. Tech. rep., Cambridge, MA, USA, 1969.
- Dijkstra, E. W. Selected Writings on Computing: A Personal Perspective. Springer, Oct. 1982.
- Earley, J. An efficient context-free parsing algorithm. Communications of the ACM 13, 2 (Feb. 1970), 94--102.
- Floyd, R. W. Syntactic analysis and operator precedence. Journal of the ACM 10, 3 (July 1963), 316--333.
- Ford, B. Packrat parsing: Simple, powerful, lazy, linear time. In Proceedings of the 2002 International Conference on Functional Programming (Oct. 2002).
- Kasami, T. An efficient recognition and syntax-analysis algorithm for context-free languages. Tech. rep., Air Force Cambridge Research Lab, Bedford, MA, 1965.
- Knuth, D. On the translation of languages from left to right. Information and Control 8 (1965), 607--639.
- Owens, S., Reppy, J., and Turon, A. Regular-expression derivatives re-examined. Journal of Functional Programming 19, 2 (2009), 173--190.
- Pratt, V. R. Top down operator precedence. In POPL '73: Proceedings of the 1st annual ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages (New York, NY, USA, 1973), ACM, pp. 41--51.
- Swierstra, S. D., Azero Alcocer, P. R., and Saraiva, J. Designing and implementing combinator languages. In Advanced Functional Programming (1998), pp. 150--206.
- Swierstra, S. Combinator parsing: A short tutorial. In Language Engineering and Rigorous Software Development, A. Bove, L. Barbosa, A. Pardo, and J. Pinto, Eds., vol. 5520 of Lecture Notes in Computer Science. Springer, Berlin, Heidelberg, 2009, ch. 6, pp. 252--300.
- Tomita, M. LR parsers for natural languages. In ACL-22: Proceedings of the 10th International Conference on Computational Linguistics and 22nd annual meeting on Association for Computational Linguistics (Morristown, NJ, USA, 1984), Association for Computational Linguistics, pp. 354--357.
- Warth, A., Douglass, J. R., and Millstein, T. Packrat parsers can support left recursion. In PEPM '08: Proceedings of the 2008 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation (New York, NY, USA, 2008), ACM, pp. 103--110.
- Wirth, N. Compiler Construction. International Computer Science Series. Addison-Wesley.
- Younger, D. H. Recognition and parsing of context-free languages in time n³. Information and Control 10, 2 (1967), 189--208.
Parsing with derivatives: a functional pearl. ICFP '11: Proceedings of the 16th ACM SIGPLAN international conference on Functional programming.