Abstract
Despite the advances made by modern parsing strategies such as PEG, LL(*), GLR, and GLL, parsing is not a solved problem. Existing approaches suffer from a number of weaknesses, including difficulties supporting side-effecting embedded actions, slow and/or unpredictable performance, and counter-intuitive matching strategies. This paper introduces the ALL(*) parsing strategy that combines the simplicity, efficiency, and predictability of conventional top-down LL(k) parsers with the power of a GLR-like mechanism to make parsing decisions. The critical innovation is to move grammar analysis to parse-time, which lets ALL(*) handle any non-left-recursive context-free grammar. ALL(*) is O(n^4) in theory but consistently performs linearly on grammars used in practice, outperforming general strategies such as GLL and GLR by orders of magnitude. ANTLR 4 generates ALL(*) parsers and supports direct left-recursion through grammar rewriting. Widespread ANTLR 4 use (5000 downloads/month in 2013) provides evidence that ALL(*) is effective for a wide variety of applications.
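The abstract mentions that direct left-recursion is supported through grammar rewriting. As a rough illustration of that general idea only, and not ANTLR 4's generated code, the Python sketch below handles a left-recursive expression rule such as e : e '*' e | e '+' e | INT by replacing the recursion with a loop guarded by a precedence check (often called precedence climbing). The operator table OPS, its precedence values, and all function names are assumptions made for this example.

```python
import re

# Hypothetical operator table for this sketch: symbol -> (precedence, right_associative).
OPS = {"+": (1, False), "-": (1, False), "*": (2, False), "/": (2, False), "^": (3, True)}


def tokenize(text):
    """Return integer, operator, and parenthesis tokens; other characters are skipped."""
    return re.findall(r"\d+|[-+*/^()]", text)


def parse_primary(tokens, pos):
    """Parse an integer or a parenthesized expression starting at tokens[pos]."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        tree, pos = parse_expr(tokens, pos + 1, 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise SyntaxError("expected ')'")
        return tree, pos + 1
    if tok.isdigit():
        return int(tok), pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")


def parse_expr(tokens, pos=0, min_prec=1):
    """Parse an expression whose operators all have precedence >= min_prec.

    Returns (tree, next_pos); a tree is an int or an (op, left, right) tuple.
    """
    left, pos = parse_primary(tokens, pos)
    # This loop stands in for the left-recursive alternatives e : e op e.
    while pos < len(tokens) and tokens[pos] in OPS:
        op = tokens[pos]
        prec, right_assoc = OPS[op]
        if prec < min_prec:
            break  # the operator binds too loosely; let the caller handle it
        # Left-associative operators require a strictly tighter right operand.
        next_min = prec if right_assoc else prec + 1
        right, pos = parse_expr(tokens, pos + 1, next_min)
        left = (op, left, right)
    return left, pos


def parse(text):
    """Parse a whole string and reject trailing tokens."""
    tokens = tokenize(text)
    tree, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return tree
```

For example, parse("1+2*3") groups the multiplication first, and parse("1-2-3") groups to the left, which is the behaviour the original left-recursive rule would express directly.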
Adaptive LL(*) parsing: the power of dynamic analysis
OOPSLA '14: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications