Type inference for unique pattern matching

Published: 01 May 2006

Abstract

Regular expression patterns provide a natural, declarative way to express constraints on semistructured data and to extract relevant information from it. Indeed, they are a core feature of the programming language Perl, surface in various UNIX tools such as sed and awk, and have recently been proposed in the context of the XML programming language XDuce. Since regular expressions can be ambiguous in general, different disambiguation policies have been proposed to obtain a unique matching strategy. We formally define the matching semantics under both (1) the POSIX and (2) the first and longest match disambiguation strategies. We show that the generally accepted method of defining the longest match in terms of the first match and recursion does not conform to the natural notion of longest match. We continue by solving the type inference problem for both disambiguation strategies, which consists of calculating the set of all subparts of input values that a subexpression can match under the given policy.
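
To make the ambiguity concrete, the following Python sketch enumerates every way the pattern (a|ab)(c|bcd)(d*) can match the whole string "abcd" and then picks one parse under two simplified policies. The pattern, the helper name all_parses, and the two selection rules are illustrative assumptions, not the formal semantics defined in the article: "first match" here means taking alternatives in written order, and "posix_like" means preferring the longest match for each subexpression from left to right.

```python
import re

PATTERN = r"(a|ab)(c|bcd)(d*)"   # illustrative ambiguous pattern (assumption)
TEXT = "abcd"

def all_parses(text):
    """List every split of text into parts matching (a|ab), (c|bcd), (d*)."""
    parses = []
    for first in ("a", "ab"):              # alternatives in written order
        if not text.startswith(first):
            continue
        rest = text[len(first):]
        for second in ("c", "bcd"):
            if not rest.startswith(second):
                continue
            tail = rest[len(second):]
            if set(tail) <= {"d"}:         # d* accepts only runs of 'd'
                parses.append((first, second, tail))
    return parses

parses = all_parses(TEXT)

# Simplified first-match policy: the first parse found in alternative order.
first_match = parses[0]

# Simplified longest-match policy: earlier subexpressions take priority,
# and each prefers the longest substring it can match.
posix_like = max(parses, key=lambda p: tuple(len(part) for part in p))

# Python's re module is a backtracking engine that tries alternatives in order.
backtracking = re.fullmatch(PATTERN, TEXT).groups()

print(parses)
print(first_match, posix_like, backtracking)
```

The input has two valid parses, so a subexpression such as (a|ab) binds to "a" under one policy and to "ab" under the other. Type inference, as described in the abstract, asks for the set of all such substrings a subexpression can be bound to once a policy has been fixed.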
