Abstract
We present DReX, a declarative language that can express all regular string-to-string transformations and can still be evaluated efficiently. The class of regular string transformations has a robust theoretical foundation, including multiple characterizations, closure properties, and decidable analysis questions, and it admits a number of string operations such as insertion, deletion, substring swap, and reversal. Recent research has led to a characterization of regular string transformations using a primitive set of function combinators, analogous to the definition of regular languages using regular expressions. While these combinators form the basis for the language DReX proposed in this paper, our main technical focus is on the complexity of evaluating the output of a DReX program on a given input string. It turns out that the natural evaluation algorithm involves dynamic programming, leading to complexity that is cubic in the length of the input string. Our main contribution is identifying a consistency restriction on the use of combinators in DReX programs, together with a single-pass evaluation algorithm for consistent programs whose time complexity is linear in the length of the input string and polynomial in the size of the program. We show that the consistency restriction does not limit expressiveness, and that whether a DReX program is consistent can be checked efficiently. We report on a prototype implementation and evaluate it using a representative set of text processing tasks.
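To make the idea of function combinators concrete, the following is a minimal Python sketch, not DReX itself. The names `symbol`, `choice`, `split`, and `iterate`, and the `flip` parameter, are hypothetical and chosen only for illustration; the actual DReX combinators and syntax are defined in the paper. Each combinator builds a partial string-to-string function (returning `None` where undefined), and the evaluator here is deliberately naive: it tries every split point and re-examines substrings repeatedly, which is the kind of cost that the consistency restriction and single-pass algorithm are meant to avoid.

```python
from typing import Callable, List, Optional

# A partial string-to-string function: returns None where it is undefined.
# All names below are illustrative assumptions, not DReX syntax.
Fn = Callable[[str], Optional[str]]


def symbol(pred: Callable[[str], bool], out: Callable[[str], str]) -> Fn:
    """Defined only on one-character strings whose character satisfies pred."""
    return lambda s: out(s) if len(s) == 1 and pred(s) else None


def choice(f: Fn, g: Fn) -> Fn:
    """Defined where exactly one of f and g is defined."""
    def run(s: str) -> Optional[str]:
        a, b = f(s), g(s)
        if (a is None) == (b is None):
            return None  # neither applies, or both apply (ambiguous)
        return a if a is not None else b
    return run


def split(f: Fn, g: Fn, flip: bool = False) -> Fn:
    """Defined where s = u + v for exactly one split with f(u), g(v) defined.

    With flip=True the two outputs are emitted in swapped order.
    """
    def run(s: str) -> Optional[str]:
        results: List[str] = []
        for i in range(len(s) + 1):  # naively try every split point
            a = f(s[:i])
            if a is None:
                continue
            b = g(s[i:])
            if b is not None:
                results.append(b + a if flip else a + b)
        return results[0] if len(results) == 1 else None
    return run


def iterate(f: Fn) -> Fn:
    """Apply f to each non-empty piece of the unique decomposition of s."""
    def run(s: str) -> Optional[str]:
        n = len(s)
        # count[j]: number of decompositions of s[:j], capped at 2
        # out[j]: the output of one such decomposition
        count = [0] * (n + 1)
        out: List[Optional[str]] = [None] * (n + 1)
        count[0], out[0] = 1, ""
        for j in range(1, n + 1):
            for i in range(j):
                if count[i] == 0:
                    continue
                piece = f(s[i:j])
                if piece is None:
                    continue
                count[j] = min(2, count[j] + count[i])
                out[j] = out[i] + piece
        return out[n] if count[n] == 1 else None
    return run


# Deletion: drop digits and upper-case letters in a single declarative program.
drop_digits = iterate(choice(symbol(str.isdigit, lambda c: ""),
                             symbol(str.isalpha, str.upper)))

# Substring swap: rewrite "first,second" as "second first".
word = iterate(symbol(str.isalpha, lambda c: c))
comma = symbol(lambda c: c == ",", lambda c: " ")
swap = split(split(word, comma, flip=True), word, flip=True)
```

Under these assumptions, `drop_digits("a1b2")` evaluates to `"AB"` and `swap("ab,cd")` to `"cd ab"`, while an input with no valid decomposition, such as `swap("abcd")`, yields `None`.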
DReX: A Declarative Language for Efficiently Evaluating Regular String Transformations
POPL '15: Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages






