research-article

DReX: A Declarative Language for Efficiently Evaluating Regular String Transformations

Published: 14 January 2015
Abstract

We present DReX, a declarative language that can express all regular string-to-string transformations and can still be evaluated efficiently. The class of regular string transformations has a robust theoretical foundation, including multiple characterizations, closure properties, and decidable analysis questions, and admits a number of string operations such as insertion, deletion, substring swap, and reversal. Recent research has led to a characterization of regular string transformations using a primitive set of function combinators, analogous to the definition of regular languages using regular expressions. While these combinators form the basis for the language DReX proposed in this paper, our main technical focus is on the complexity of evaluating the output of a DReX program on a given input string. It turns out that the natural evaluation algorithm involves dynamic programming, leading to complexity that is cubic in the length of the input string. Our main contribution is identifying a consistency restriction on the use of combinators in DReX programs, and a single-pass evaluation algorithm for consistent programs with time complexity that is linear in the length of the input string and polynomial in the size of the program. We show that the consistency restriction does not limit expressiveness, and that whether a DReX program is consistent can be checked efficiently. We report on a prototype implementation, and evaluate it using a representative set of text processing tasks.
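To make the idea of function combinators concrete, the following is a minimal Python sketch of a naive evaluator. It is not DReX syntax and not the paper's single-pass algorithm: the names `symbol`, `split`, and `iterate`, the `swap` flag, and the rule that a combinator is undefined unless the input splits in exactly one way are all assumptions made for illustration. A transformation is modeled as a partial function that returns `None` where it is undefined.

```python
from typing import Callable, Optional

# A transformation is a partial function on strings; None means "undefined".
Transform = Callable[[str], Optional[str]]


def symbol(pred: Callable[[str], bool], out: Callable[[str], str]) -> Transform:
    """Defined only on one-character strings whose character satisfies pred."""
    def run(w: str) -> Optional[str]:
        return out(w) if len(w) == 1 and pred(w) else None
    return run


def split(f: Transform, g: Transform, swap: bool = False) -> Transform:
    """Defined on w iff exactly one split w = u + v has f(u) and g(v) defined.

    With swap=True the two outputs are concatenated in reverse order.
    (Assumed semantics for this sketch.)
    """
    def run(w: str) -> Optional[str]:
        results = []
        for i in range(len(w) + 1):  # naive: try every split point
            a, b = f(w[:i]), g(w[i:])
            if a is not None and b is not None:
                results.append(b + a if swap else a + b)
        return results[0] if len(results) == 1 else None
    return run


def iterate(f: Transform) -> Transform:
    """Defined on w iff w splits uniquely into nonempty pieces where f is defined."""
    def run(w: str) -> Optional[str]:
        n = len(w)
        ways = [0] * (n + 1)   # number of valid splits of w[:j], capped at 2
        out = [""] * (n + 1)   # output for w[:j]; meaningful only if ways[j] == 1
        ways[0] = 1
        for j in range(1, n + 1):
            for i in range(j):
                if ways[i] == 0:
                    continue
                r = f(w[i:j])
                if r is None:
                    continue
                ways[j] = min(2, ways[j] + ways[i])
                out[j] = out[i] + r
        return out[n] if ways[n] == 1 else None
    return run


# Hypothetical example: swap two space-separated words.
letter = symbol(str.isalpha, lambda c: c)
space = symbol(lambda c: c == " ", lambda c: c)
word = iterate(letter)
swap_names = split(split(word, space, swap=True), word, swap=True)

print(swap_names("Ada Lovelace"))   # Lovelace Ada
print(split(word, word)("ab"))      # None: three splits work, so the result is ambiguous
```

This evaluator tries every split point and recomputes subresults, so it illustrates only the meaning of the combinators, not the linear-time evaluation the abstract describes for consistent programs.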

Supplemental Material

p125-sidebyside.mpg

Published in

ACM SIGPLAN Notices, Volume 50, Issue 1
POPL '15
January 2015
682 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2775051
Editor: Andy Gill

POPL '15: Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
January 2015
716 pages
ISBN: 9781450333009
DOI: 10.1145/2676726

Copyright © 2015 ACM

Publisher

Association for Computing Machinery
New York, NY, United States
