On the complexity and performance of parsing with derivatives

Published: 02 June 2016

Abstract

Current algorithms for context-free parsing inflict a trade-off between ease of understanding, ease of implementation, theoretical complexity, and practical performance. No algorithm achieves all of these properties simultaneously. Might et al. introduced parsing with derivatives, which handles arbitrary context-free grammars while being both easy to understand and simple to implement. Despite much initial enthusiasm and a multitude of independent implementations, its worst-case complexity has never been proven to be better than exponential. In fact, high-level arguments claiming it is fundamentally exponential have been advanced and even accepted as part of the folklore. Performance ended up being sluggish in practice, and this sluggishness was taken as informal evidence of exponentiality. In this paper, we reexamine the performance of parsing with derivatives. We have discovered that it is not exponential but, in fact, cubic. Moreover, simple (though perhaps not obvious) modifications to the implementation by Might et al. lead to an implementation that is not only easy to understand but also highly performant in practice.

References

  1. Bison. Bison. URL https://www.gnu.org/software/bison/.
  2. Janusz A. Brzozowski. Derivatives of regular expressions. Journal of the ACM (JACM), 11(4):481–494, October 1964. ISSN 0004-5411. doi: 10.1145/321239.321249.
  3. William Byrd. relational-parsing-with-derivatives, 2013. URL https://github.com/webyrd/relational-parsing-with-derivatives.
  4. Russ Cox. Yacc is not dead. Blog, December 2010. URL http://research.swtch.com/yaccalive.
  5. Jay Earley. An Efficient Context-Free Parsing Algorithm. PhD thesis, Carnegie Mellon University, 1968. URL http://reports-archive.adm.cs.cmu.edu/anon/anon/usr/ftp/scan/CMU-CS-68-earley.pdf.
  6. Jay Earley. An efficient context-free parsing algorithm. Communications of the ACM, 13(2):94–102, February 1970. ISSN 0001-0782. doi: 10.1145/362007.362035.
  7. Mark Engelberg. instaparse, 2015. URL https://github.com/Engelberg/instaparse.
  8. Abraham Flaxman, Aram W. Harrow, and Gregory B. Sorkin. Strings with maximally many distinct subsequences and substrings. The Electronic Journal of Combinatorics, 11(1):R8, 2004. ISSN 1077-8926. URL http://www.combinatorics.org/ojs/index.php/eljc/article/view/v11i1r8.
  9. Bryan Ford. Parsing expression grammars: a recognition-based syntactic foundation. In Proceedings of the 31st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL ’04, pages 111–122, New York, NY, USA, January 2004. ACM. ISBN 1-58113-729-X. doi: 10.1145/964001.964011.
  10. Gary A. Kildall. A unified approach to global program optimization. In Proceedings of the 1st Annual ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, POPL ’73, pages 194–206, New York, NY, USA, October 1973. ACM. doi: 10.1145/512927.512945.
  11. Dexter Kozen. A completeness theorem for Kleene algebras and the algebra of regular events. Information and Computation, 110(2):366–390, May 1994. ISSN 0890-5401. doi: 10.1006/inco.1994.1037.
  12. Bernard Lang. Deterministic techniques for efficient nondeterministic parsers. In Prof. Dr.-Ing. J. Loeckx, editor, Automata, Languages and Programming, volume 14 of Lecture Notes in Computer Science, pages 255–269. Springer Berlin Heidelberg, 1974. ISBN 978-3-540-06841-9. doi: 10.1007/3-540-06841-4_65.
  13. Tommy McGuire. Java-Parser-Derivatives, 2012. URL https://github.com/tmmcguire/Java-Parser-Derivatives.
  14. Gary H. Merrill. Parsing non-LR(k) grammars with yacc. Software: Practice and Experience, 23(8):829–850, August 1993. ISSN 1097-024X. doi: 10.1002/spe.4380230803.
  15. Matthew Might. derp documentation, 2013. URL http://matt.might.net/teaching/compilers/spring-2013/derp.html.
  16. Matthew Might, David Darais, and Daniel Spiewak. Parsing with derivatives: a functional pearl. In Proceedings of the 16th ACM SIGPLAN International Conference on Functional Programming, ICFP ’11, pages 189–195, New York, NY, USA, September 2011. ACM. ISBN 978-1-4503-0865-6. doi: 10.1145/2034773.2034801.
  17. Russell Mull. parsing-with-derivatives, 2013. URL https://github.com/mullr/parsing-with-derivatives.
  18. Scott Owens, John Reppy, and Aaron Turon. Regular-expression derivatives re-examined. Journal of Functional Programming, 19(02):173–190, March 2009. ISSN 1469-7653. doi: 10.1017/S0956796808007090.
  19. Per Vognsen. parser, 2012. URL https://gist.github.com/pervognsen/815b208b86066f6d7a00.

Published in

ACM SIGPLAN Notices, Volume 51, Issue 6
PLDI '16
June 2016
726 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2980983
Editor: Andy Gill

PLDI '16: Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2016
726 pages
ISBN: 9781450342612
DOI: 10.1145/2908080
General Chair: Chandra Krintz
Program Chair: Emery Berger

Copyright © 2016 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

Published: 2 June 2016

Qualifiers

article
