ABSTRACT
We explore how to make the benefits of modularity available for syntactic specifications and present Rats!, a parser generator for Java that supports easily extensible syntax. Our parser generator builds on recent research on parsing expression grammars (PEGs), which, by being closed under composition, prioritizing choices, supporting unlimited lookahead, and integrating lexing and parsing, offer an attractive alternative to context-free grammars. PEGs are implemented by so-called packrat parsers, which are recursive descent parsers that memoize all intermediate results (hence their name). Memoization ensures linear-time performance in the presence of unlimited lookahead, but also results in an essentially lazy, functional parsing technique. In this paper, we explore how to leverage PEGs and packrat parsers as the foundation for extensible syntax. In particular, we show how make packrat parsing more widely applicable by implementing this lazy, functional technique in a strict, imperative language, while also generating better performing parsers through aggressive optimizations. Next, we develop a module system for organizing, modifying, and composing large-scale syntactic specifications. Finally, we describe a new technique for managing (global) parsing state in functional parsers. Our experimental evaluation demonstrates that the resulting parser generator succeeds at providing extensible syntax. In particular, Rats! enables other grammar writers to realize real-world language extensions in little time and code, and it generates parsers that consistently out-perform parsers created by two GLR parser generators.
- J. Aycock and R. N. Horspool. Practical Earley parsing. The Computer Journal, 45(6):620--630, 2002.Google Scholar
Cross Ref
- J. Bender. Mini-bibliography on modules for functional programming languages. http://readscheme.org/modules/, Mar. 2004.Google Scholar
- A. Birman. The TMG Recognition Schema. PhD thesis, Princeton University, Feb. 1970. Google Scholar
Digital Library
- A. Birman and J. D. Ullman. Parsing algorithms with backtrack. Information and Control, 23(1):1--34, Aug. 1973.Google Scholar
Cross Ref
- C. Brabrand and M. I. Schwartzbach. Growing languages with metamorphic syntax macros. In Proc. 2002 ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation, pp. 31--40, Jan. 2002. Google Scholar
Digital Library
- C. Brabrand, M. I. Schwartzbach, and M. Vanggaard. The \tt metafront system: Extensible parsing and transformation. Electronic Notes in Theoretical Computer Science, 82(3):592--611, Dec. 2003. Google Scholar
Digital Library
- G. Bracha, M. Odersky, D. Stoutamire, and P. Wadler. Making the future safe for the past: Adding genericity to the Java programming language. In Proc. 1998 ACM Conference on Object-Oriented Programming Systems, Languages, and Applications, pp. 183--200, Oct. 1998. Google Scholar
Digital Library
- M. Bravenboer, A. van Dam, K. Olmos, and E. Visser. Program transformation with scoped dynamic rewrite rules. Fundamenta Informaticae, 69(1--2):123--178, 2005. Google Scholar
Digital Library
- M. Bravenboer and E. Visser. Concrete syntax for objects. In Proc. 2004 ACM Conference on Object-Oriented Programming Systems, Languages, and Applications, pp. 365--383, Oct. 2004. Google Scholar
Digital Library
- M. Brukman and A. C. Myers. PPG: A parser generator for extensible grammars. http://www.cs.cornell.edu/Projects/polyglot/ppg.html.Google Scholar
- L. Cardelli, F. Matthes, and M. Abadi. Extensible syntax with lexical scoping. Tech. Report 121, DEC SRC, Feb. 1994.Google Scholar
- Cryptix Foundation. Cryptix JCE. http://www.cryptix.org/.Google Scholar
- A. Dikjstra and D. S. Swierstra. Lazy functional parser combinators in Java. In Proc. 1st Workshop on Multiparadigm Programming with Object-Oriented Languages, pp. 11--42. John von Neumann Institute for Computing, May 2001.Google Scholar
- J. C. Earley. An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94--102, Feb. 1970. Google Scholar
Digital Library
- T. Ekman and G. Hedin. Rewritable reference attributed grammars. In Proc. 18th European Conference on Object-Oriented Programming, vol. 3086 of LNCS, pp. 147--171. Springer, June 2004.Google Scholar
Cross Ref
- O. Enseling. Build your own languages with JavaCC. JavaWorld, Dec. 2000. http://www.javaworld.com/javaworld/jw-12-2000/jw-1229-cooltools.html.Google Scholar
- M. E. Fiuczynski, R. Grimm, Y. Coady, and D. Walker. patch (1) considered harmful. In Proc. 10th Workshop on Hot Topics in Operating Systems, pp. 91--96, June 2005. Google Scholar
Digital Library
- B. Ford. Packrat parsing: A practical linear-time algorithm with backtracking. Master's thesis, MIT, Sept. 2002.Google Scholar
- B. Ford. Packrat parsing: Simple, powerful, lazy, linear time. In Proc. 2002 ACM International Conference on Functional Programming, pp. 36--47, Oct. 2002. Google Scholar
Digital Library
- B. Ford. Parsing expression grammars: A recognition-based syntactic foundation. In Proc. 31st ACM Symposium on Principles of Programming Languages, pp. 111--122, Jan. 2004. Google Scholar
Digital Library
- Free Software Foundation. Bison. http://www.gnu.org/software/bison/.Google Scholar
- R. Grimm. Systems need languages need systems! In Proc. 2nd ECOOP Workshop on Programming Languages and Operating Systems, July 2005.Google Scholar
- ISO. Information technology---syntactic metalanguage---extended BNF. ISO/IEC Standard 14977, Aug. 1996.Google Scholar
- E. Kohler, M. F. Kaashoek, and D. R. Montgomery. A readable TCP in the Prolac protocol language. In Proc. 1999 ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, pp. 3--13, Aug. 1999. Google Scholar
Digital Library
- K. Lee, A. LaMarca, and C. Chambers. HydroJ: Object-oriented pattern matching for evolvable distributed systems. In Proc. 2003 ACM Conference on Object-Oriented Programming Systems, Languages, and Applications, pp. 205--223, Oct. 2003. Google Scholar
Digital Library
- J. R. Levine. lex & yacc. O'Reilly, Oct. 1992.Google Scholar
- D. Mazières. A toolkit for user-level file systems. In Proc. 2001 USENIX Annual Technical Conference, pp. 261--274, June 2001. Google Scholar
Digital Library
- S. McPeak and G. C. Necula. Elkhound: A fast, practical GLR parser generator. In Proc. 13th International Conference on Compiler Construction, vol. 2985 of LNCS, pp. 73--88. Springer, Mar. 2004.Google Scholar
Cross Ref
- F. Mérillon, L. Réveillère, C. Consel, R. Marlet, and G. Muller. Devil: An IDL for hardware programming. In Proc. 4th USENIX Symposium on Operating Systems Design and Implementation, pp. 17--30, Oct. 2000. Google Scholar
Digital Library
- T. Millstein. Practical predicate dispatch. In Proc. 2004 ACM Conference on Object-Oriented Programming Systems, Languages, and Applications, pp. 345--364, Oct. 2004. Google Scholar
Digital Library
- A. C. Myers. JFlow: Practical mostly-static information flow control. In Proc. 26th ACM Symposium on Principles of Programming Languages, pp. 228--241, Jan. 1999. Google Scholar
Digital Library
- N. Nystrom, M. R. Clarkson, and A. C. Myers. Polyglot: An extensible compiler framework for Java. In Proc. 12th International Conference on Compiler Construction, vol. 2622 of LNCS, pp. 138--152. Springer, Apr. 2003. Google Scholar
Digital Library
- J. Palsberg, K. Tao, and W. Wang. Java tree builder. http://compilers.cs.ucla.edu/jtb/.Google Scholar
- T. J. Parr and R. W. Quong. ANTLR: A predicated-LL(k) parser generator. Software---Practice and Experience, 25(7):789--810, July 1995. Google Scholar
Digital Library
- A. Rodriguez, C. Killian, S. Bhat, D. Kostić, and A. Vahdat. MACEDON: Methodology for automatically creating, evaluating, and designing overlay networks. In Proc. 1st ACM/USENIX Symposium on Networked Systems Design and Implementation, pp. 267--280, Mar. 2004. Google Scholar
Digital Library
- J. Roskind. Parsing C, the last word. http://groups-beta.google.com/group/comp.compilers/msg/c0797b5b668605b4, Jan. 1992.Google Scholar
- D. J. Salomon and G. V. Cormack. Scannerless NSLR(1) parsing of programming languages. In Proc. 1989 ACM Conference on Programming Language Design and Implementation, pp. 170--178, June 1989. Google Scholar
Digital Library
- M. Tomita, ed. Generalized LR Parsing. Kluwer, 1991. Google Scholar
Digital Library
- M. G. J. van den Brand, J. Scheerder, J. J. Vinju, and E. Visser. Disambiguation filters for scannerless generalized LR parsers. In Proc. 11th International Conference on Compiler Construction, vol. 2304 of LNCS, pp. 143--158. Springer, Apr. 2004. Google Scholar
Digital Library
- E. Visser. Syntax Definition for Language Prototyping. PhD thesis, University of Amsterdam, Sept. 1997.Google Scholar
- P. Wadler. Monads for functional programming. In Advanced Functional Programming, vol. 925 of LNCS, pp. 24--52. Springer, 1995. Google Scholar
Digital Library
- N. Wirth. What can we do about the unnecessary diversity of notation for syntactic definitions? Communications of the ACM, 20(11):822--823, Nov. 1977. Google Scholar
Digital Library
Index Terms
Better extensibility through modular syntax
Recommendations
Parsing expression grammars: a recognition-based syntactic foundation
POPL '04: Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on Principles of programming languagesFor decades we have been using Chomsky's generative system of grammars, particularly context-free grammars (CFGs) and regular expressions (REs), to express the syntax of programming languages and protocols. The power of generative grammars to express ...
Better extensibility through modular syntax
Proceedings of the 2006 PLDI ConferenceWe explore how to make the benefits of modularity available for syntactic specifications and present Rats!, a parser generator for Java that supports easily extensible syntax. Our parser generator builds on recent research on parsing expression grammars ...
Packrat parsers can handle practical grammars in mostly constant space
PASTE '10: Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineeringPackrat parsing is a powerful parsing algorithm presented by Ford in 2002. Packrat parsers can handle complicated grammars and recursive structures in lexical elements more easily than the traditional LL(k) or LR(1) parsing algorithms. However, packrat ...







Comments