Abstract
Several popular languages, such as Haskell, Python, and F#, use the indentation and layout of code as part of their syntax. Because context-free grammars cannot express the rules of indentation, parsers for these languages currently use ad hoc techniques to handle layout. These techniques tend to be low-level and operational in nature and forgo the advantages of more declarative specifications like context-free grammars. For example, they are often coded by hand instead of being generated by a parser generator.
This paper presents a simple extension to context-free grammars that can express these layout rules, and derives GLR and LR(k) algorithms for parsing these grammars. These grammars are easy to write and can be parsed efficiently. Examples for several languages are presented, as are benchmarks showing the practical efficiency of these algorithms.
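To illustrate the kind of low-level, operational technique the abstract contrasts with declarative grammars, the following is a minimal sketch of a lexer-side indentation stack, similar in spirit to the INDENT/DEDENT tokens described in the Python language reference. It is not the grammar extension this paper proposes. The function name `layout_tokens` and the token kind `LINE` are hypothetical, and the sketch assumes indentation uses spaces only.

```python
from typing import Iterator, Tuple


def layout_tokens(source: str) -> Iterator[Tuple[str, str]]:
    """Yield (kind, text) pairs; INDENT/DEDENT mark block boundaries.

    Hypothetical helper for illustration: it assumes space-only
    indentation and treats each non-blank line as a single LINE token.
    """
    stack = [0]  # indentation widths of the enclosing blocks, innermost last
    for line in source.splitlines():
        stripped = line.lstrip(" ")
        if not stripped:
            continue  # blank lines do not affect layout
        width = len(line) - len(stripped)
        if width > stack[-1]:
            # Deeper than the current block: open a new block.
            stack.append(width)
            yield ("INDENT", "")
        while width < stack[-1]:
            # Shallower: close blocks until we reach an enclosing width.
            stack.pop()
            yield ("DEDENT", "")
        if width != stack[-1]:
            raise IndentationError(f"inconsistent dedent: {line!r}")
        yield ("LINE", stripped)
    # Close any blocks still open at end of input.
    while len(stack) > 1:
        stack.pop()
        yield ("DEDENT", "")
```

A context-free grammar can then treat `INDENT` and `DEDENT` like ordinary brackets, but the layout rule itself lives in this hand-written state machine rather than in the grammar, which is the drawback the abstract describes.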
Principled parsing for indentation-sensitive languages: revisiting Landin's offside rule
POPL '13: Proceedings of the 40th annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages