Abstract
Regular expression patterns provide a natural, declarative way to express constraints on semistructured data and to extract relevant information from it. Indeed, they are a core feature of the programming language Perl, surface in various UNIX tools such as sed and awk, and have recently been proposed in the context of the XML programming language XDuce. Since regular expressions can in general be ambiguous, different disambiguation policies have been proposed to obtain a unique matching strategy. We formally define the matching semantics under both (1) the POSIX and (2) the first and longest match disambiguation strategies. We show that the generally accepted method of defining the longest match in terms of the first match and recursion does not conform to the natural notion of longest match. We then solve the type inference problem for both disambiguation strategies; this problem consists of computing the set of all subparts of input values that a subexpression can match under the given policy.
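To make the ambiguity concrete, here is a minimal Python sketch, assuming a toy regex fragment with only literals, alternation and Kleene star, matched against the whole input. It contrasts a Perl-style first-match policy (left alternative preferred, greedy star) with a simplified POSIX-style policy that maximises submatch lengths from left to right. All names (`Lit`, `Alt`, `Star`, `parses`, `full_parses`, `first_match`, `longest_match`, `submatch_sets`) are hypothetical, and neither policy function reproduces the paper's formal semantics. `submatch_sets` is only a brute-force, finite-sample stand-in for the type inference problem, which the paper addresses for all possible inputs.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional, Sequence, Set, Tuple, Union

# Hypothetical AST for a tiny regex fragment: literals, alternation, Kleene star.
@dataclass(frozen=True)
class Lit:
    text: str

@dataclass(frozen=True)
class Alt:
    left: "Regex"
    right: "Regex"

@dataclass(frozen=True)
class Star:
    body: "Regex"

Regex = Union[Lit, Alt, Star]
Match = Tuple[str, ...]

def parses(r: Regex, s: str, i: int) -> Iterator[int]:
    """Yield every end position of r matched at s[i:], in first-match priority order."""
    if isinstance(r, Lit):
        if s.startswith(r.text, i):
            yield i + len(r.text)
    elif isinstance(r, Alt):
        yield from parses(r.left, s, i)   # left alternative is preferred
        yield from parses(r.right, s, i)
    elif isinstance(r, Star):
        # Greedy: prefer one more non-empty iteration over stopping here.
        for j in parses(r.body, s, i):
            if j > i:
                yield from parses(r, s, j)
        yield i

def full_parses(parts: Sequence[Regex], s: str, i: int = 0) -> Iterator[Tuple[int, ...]]:
    """Yield the end position of each part for every parse covering all of s[i:]."""
    if not parts:
        if i == len(s):
            yield ()
        return
    for j in parses(parts[0], s, i):
        for rest in full_parses(parts[1:], s, j):
            yield (j,) + rest

def submatches(s: str, ends: Tuple[int, ...]) -> Match:
    """Cut s into the pieces bound by each part (parses start at position 0)."""
    starts = (0,) + ends[:-1]
    return tuple(s[b:e] for b, e in zip(starts, ends))

def first_match(parts: Sequence[Regex], s: str) -> Optional[Match]:
    """Perl-style policy: the first full parse in priority order wins."""
    ends = next(full_parses(parts, s), None)
    return None if ends is None else submatches(s, ends)

def longest_match(parts: Sequence[Regex], s: str) -> Optional[Match]:
    """Simplified POSIX-style policy: longest first submatch, then longest second, ..."""
    candidates = [submatches(s, ends) for ends in full_parses(parts, s)]
    if not candidates:
        return None
    return max(candidates, key=lambda m: tuple(len(p) for p in m))

def submatch_sets(parts: Sequence[Regex], inputs: Sequence[str],
                  policy: Callable[[Sequence[Regex], str], Optional[Match]]) -> List[Set[str]]:
    """Finite-sample stand-in for type inference: collect what each part binds."""
    sets: List[Set[str]] = [set() for _ in parts]
    for s in inputs:
        m = policy(parts, s)
        if m is not None:
            for k, piece in enumerate(m):
                sets[k].add(piece)
    return sets

# (a|ab)(c|bcd)(d*) is ambiguous on "abcd": a+bcd+"" and ab+c+d both cover it.
pattern = [Alt(Lit("a"), Lit("ab")), Alt(Lit("c"), Lit("bcd")), Star(Lit("d"))]
print(first_match(pattern, "abcd"))    # ('a', 'bcd', '')
print(longest_match(pattern, "abcd"))  # ('ab', 'c', 'd')
```

On this ambiguous pattern the two policies split the same input differently, so the set of values each subexpression can bind depends on the policy in force. The exhaustive enumeration is exponential in the worst case and is meant only as an illustration.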
Index Terms
Type inference for unique pattern matching