research-article

Staged parser combinators for efficient data processing

Published: 15 October 2014

Abstract

Parsers are ubiquitous in computing, and many applications depend on their performance for decoding data efficiently. Parser combinators are an intuitive tool for writing parsers: tight integration with the host language enables grammar specifications to be interleaved with processing of parse results. Unfortunately, parser combinators are typically slow due to the high overhead of the host language abstraction mechanisms that enable composition.
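To make the overhead concrete, here is a minimal sketch of unstaged parser combinators in Python. It is not the authors' implementation: the names `char_where`, `seq`, `many1`, and `fmap` are hypothetical and chosen only for illustration. Each combinator returns a closure, so parsing walks a chain of function calls and allocates intermediate tuples and lists for every input.

```python
from typing import Any, Callable, Optional, Tuple

# A parser maps (text, position) to (value, new_position), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]

def char_where(pred: Callable[[str], bool]) -> Parser:
    """Accept a single character that satisfies pred."""
    def parse(s: str, i: int):
        if i < len(s) and pred(s[i]):
            return s[i], i + 1
        return None
    return parse

def seq(p: Parser, q: Parser) -> Parser:
    """Run p, then q; pair up their results."""
    def parse(s: str, i: int):
        r1 = p(s, i)
        if r1 is None:
            return None
        v1, j = r1
        r2 = q(s, j)
        if r2 is None:
            return None
        v2, k = r2
        return (v1, v2), k  # intermediate pair allocated on every parse
    return parse

def many1(p: Parser) -> Parser:
    """Apply p one or more times; collect the results in a list."""
    def parse(s: str, i: int):
        out = []
        r = p(s, i)
        while r is not None:
            v, i = r
            out.append(v)
            r = p(s, i)
        return (out, i) if out else None
    return parse

def fmap(p: Parser, f: Callable[[Any], Any]) -> Parser:
    """Transform the result of p with f (processing interleaved with grammar)."""
    def parse(s: str, i: int):
        r = p(s, i)
        if r is None:
            return None
        v, j = r
        return f(v), j
    return parse

digit = char_where(str.isdigit)
number = fmap(many1(digit), lambda ds: int("".join(ds)))
plus = char_where(lambda c: c == "+")
# Grammar "number + number", with the sum computed as part of the parser itself.
addition = fmap(seq(number, seq(plus, number)), lambda t: t[0] + t[1][1])

print(addition("12+30", 0))
```

The grammar reads almost like its specification, but every `seq` and `fmap` adds a layer of indirection that is paid again on each call.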

We present a technique for eliminating such overhead. We use staging, a form of runtime code generation, to dissociate input parsing from parser composition, and eliminate intermediate data structures and computations associated with parser composition at staging time. A key challenge is to maintain support for input-dependent grammars, which have no clear stage distinction.
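The idea of separating composition from parsing can be illustrated with a toy code generator in Python. This is only a sketch under strong assumptions, not the paper's technique as implemented: the helper names (`Gen`, `digits1`, `lit`, `seq`, `fmap`, `compile_parser`) are hypothetical, source text is generated as strings and compiled with `exec`, and alternation and input-dependent grammars are left out. Combinators run once, at staging time, and emit straight-line code; the pairs built by `seq` exist only while generating code and never appear in the generated parser.

```python
import itertools

class Gen:
    """Collects generated source lines and hands out fresh variable names."""
    def __init__(self):
        self.lines = []
        self._ids = itertools.count()

    def fresh(self, prefix):
        return f"{prefix}{next(self._ids)}"

    def emit(self, line):
        self.lines.append("    " + line)  # indent into the function body

# A staged parser is a function (gen, pos_expr) -> (value_expr, pos_expr).
# It only emits code; a parse failure is emitted as `return None`.

def digits1():
    """One or more digits; the value is the matched substring."""
    def stage(g, pos):
        end, val = g.fresh("end"), g.fresh("val")
        g.emit(f"{end} = {pos}")
        g.emit(f"while {end} < len(s) and s[{end}].isdigit():")
        g.emit(f"    {end} += 1")
        g.emit(f"if {end} == {pos}: return None")
        g.emit(f"{val} = s[{pos}:{end}]")
        return val, end
    return stage

def lit(c):
    """Match the literal string c."""
    def stage(g, pos):
        nxt = g.fresh("pos")
        g.emit(f"if not s.startswith({c!r}, {pos}): return None")
        g.emit(f"{nxt} = {pos} + {len(c)}")
        return repr(c), nxt
    return stage

def seq(p, q):
    """Sequence p then q; the pair lives only at staging time."""
    def stage(g, pos):
        v1, pos1 = p(g, pos)
        v2, pos2 = q(g, pos1)
        return (v1, v2), pos2
    return stage

def fmap(p, f):
    """f maps staged value expressions (strings) to a new expression string."""
    def stage(g, pos):
        v, pos1 = p(g, pos)
        out = g.fresh("val")
        g.emit(f"{out} = {f(v)}")
        return out, pos1
    return stage

def compile_parser(p, name="parse"):
    """Run the staged parser once, then compile the emitted source."""
    g = Gen()
    v, pos = p(g, "0")
    g.emit(f"return ({v}, {pos})")
    src = f"def {name}(s):\n" + "\n".join(g.lines)
    env = {}
    exec(src, env)
    return env[name], src

number = fmap(digits1(), lambda v: f"int({v})")
addition = fmap(seq(number, seq(lit("+"), number)),
                lambda t: f"{t[0]} + {t[1][1]}")
parse_addition, source = compile_parser(addition)

print(source)
print(parse_addition("12+30"))
```

The printed source is a single flat function with no closures or tuple construction for the sequence; all combinator calls have been paid for once, before any input is read.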

Our approach applies to top-down recursive-descent parsers as well as bottom-up non-deterministic parsers with key applications in dynamic programming on sequences, where we auto-generate code for parallel hardware. We achieve performance comparable to specialized, hand-written parsers.

Published in

ACM SIGPLAN Notices, Volume 49, Issue 10 (OOPSLA '14), October 2014, 907 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2714064
Editor: Andy Gill

OOPSLA '14: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, October 2014, 946 pages
ISBN: 9781450325851
DOI: 10.1145/2660193

Copyright © 2014 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
