Abstract
Parsers are ubiquitous in computing, and many applications depend on their performance for decoding data efficiently. Parser combinators are an intuitive tool for writing parsers: tight integration with the host language enables grammar specifications to be interleaved with processing of parse results. Unfortunately, parser combinators are typically slow due to the high overhead of the host language abstraction mechanisms that enable composition.
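To make the style concrete, here is a minimal sketch of unstaged parser combinators, written in Python for illustration. The names `char`, `many1`, `seq`, and `fmap` are hypothetical and do not come from the paper or from any particular library. Each combinator returns a closure, and `seq` allocates a tuple for every intermediate result; this is the kind of composition overhead described above.

```python
from typing import Any, Callable, Optional, Tuple

# A parser maps (text, position) to (value, new_position), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]

def char(pred: Callable[[str], bool]) -> Parser:
    """Accept one character satisfying `pred`."""
    def parse(s, i):
        if i < len(s) and pred(s[i]):
            return s[i], i + 1
        return None
    return parse

def many1(p: Parser) -> Parser:
    """Apply `p` one or more times and collect the results in a list."""
    def parse(s, i):
        out = []
        r = p(s, i)
        while r is not None:
            v, i = r
            out.append(v)
            r = p(s, i)
        return (out, i) if out else None
    return parse

def seq(p: Parser, q: Parser) -> Parser:
    """Run `p`, then `q`, and pair their results."""
    def parse(s, i):
        r1 = p(s, i)
        if r1 is None:
            return None
        v1, j = r1
        r2 = q(s, j)
        if r2 is None:
            return None
        v2, k = r2
        return (v1, v2), k
    return parse

def fmap(p: Parser, f: Callable[[Any], Any]) -> Parser:
    """Post-process the result of `p` with a host-language function."""
    def parse(s, i):
        r = p(s, i)
        if r is None:
            return None
        v, j = r
        return f(v), j
    return parse

# The grammar and the processing of parse results are interleaved.
number = fmap(many1(char(str.isdigit)), lambda ds: int("".join(ds)))
plus = char(lambda c: c == "+")
addition = fmap(seq(number, seq(plus, number)), lambda t: t[0] + t[1][1])
```

Calling `addition("12+30", 0)` walks through several nested closures and builds intermediate lists and tuples before the final sum is produced.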
We present a technique for eliminating such overhead. We use staging, a form of runtime code generation, to dissociate input parsing from parser composition, and to eliminate, at staging time, the intermediate data structures and computations associated with parser composition. A key challenge is to maintain support for input-dependent grammars, which have no clear stage distinction.
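The idea of separating composition from parsing can be sketched as follows. This is an illustrative toy, not the paper's implementation: it assumes a string-based code generator with Python's built-in `exec` standing in for a real staging framework, and the names `number`, `lit`, `seq`, `fmap`, `fresh`, and `compile_parser` are hypothetical. Each combinator runs once, at staging time, and emits straight-line code; the tuples built by `seq` exist only while generating code and never appear in the generated parser.

```python
import itertools

_counter = itertools.count()

def fresh(prefix):
    """Return a unique variable name for the generated code."""
    return f"{prefix}{next(_counter)}"

# A staged parser takes an `emit` callback, emits code that advances the
# position variable `i` over the input `s`, and returns a staged value:
# either a code expression (a string) or a staging-time tuple of such values.

def number():
    def gen(emit):
        v, start = fresh("v"), fresh("start")
        emit(f"{start} = i")
        emit("while i < len(s) and s[i].isdigit(): i += 1")
        emit(f"if i == {start}: return None")
        emit(f"{v} = int(s[{start}:i])")
        return v
    return gen

def lit(c):
    def gen(emit):
        emit(f"if i >= len(s) or s[i] != {c!r}: return None")
        emit("i += 1")
        return repr(c)  # the result is a constant expression
    return gen

def seq(p, q):
    def gen(emit):
        a = p(emit)
        b = q(emit)
        return (a, b)  # tuple exists at staging time only
    return gen

def fmap(p, f):
    def gen(emit):
        return f(p(emit))  # `f` maps staged values to a code expression
    return gen

def compile_parser(p):
    """Run the composition once and compile the emitted code."""
    lines = []
    result = p(lines.append)
    body = "\n".join("    " + line for line in lines)
    src = f"def parse(s):\n    i = 0\n{body}\n    return ({result}, i)\n"
    namespace = {}
    exec(src, namespace)
    return namespace["parse"], src

addition = fmap(seq(number(), seq(lit("+"), number())),
                lambda t: f"({t[0]} + {t[1][1]})")
parse_addition, source = compile_parser(addition)
```

Printing `source` shows a single flat function with no closures or tuples for the sequencing. The sketch covers only a fixed grammar; it does not address the input-dependent grammars mentioned above, where part of the grammar is known only once some input has been read.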
Our approach applies to top-down recursive-descent parsers as well as bottom-up non-deterministic parsers with key applications in dynamic programming on sequences, where we auto-generate code for parallel hardware. We achieve performance comparable to specialized, hand-written parsers.
Staged parser combinators for efficient data processing
OOPSLA '14: Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications