DOI: 10.1145/1559795.1559831
research-article

Running tree automata on probabilistic XML

Published: 29 June 2009

ABSTRACT

Tree automata (specifically, bottom-up and unranked) form a powerful tool for querying and maintaining validity of XML documents. XML with uncertain data can be modeled as a probability space of labeled trees, and that space is often represented by a tree with distributional nodes. This paper investigates the problem of evaluating a tree automaton over such a representation, where the goal is to compute the probability that the automaton accepts a random possible world. This problem is generally intractable, but for the case where the tree automaton is deterministic (and its transitions are defined by deterministic string automata), an efficient algorithm is presented. The paper discusses the applications of this result, including the ability to sample and to evaluate queries (e.g., in monadic second-order logic) while requiring a priori conformance to a schema (e.g., DTD). XML schemas also include attribute constraints, and the complexity of key, foreign-key and inclusion constraints is studied in the context of probabilistic XML. Finally, the paper discusses the generalization of the results to an extended data model, where distributional nodes can repeatedly sample the same subtree, thereby adding another exponent to the size of the probability space.
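
To make the central problem concrete, the following is a minimal Python sketch of computing the acceptance probability of a deterministic bottom-up unranked tree automaton over a small probabilistic document. It is not the paper's algorithm; it is an illustration under simplifying assumptions: the only distributional nodes are independent-choice nodes whose children are ordinary nodes, and each label's transition is a deterministic string automaton given as lookup tables. All names (`Node`, `Ind`, `Automaton`, `state_dist`, `accept_prob`) are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Node:
    """Ordinary node; children are Node or Ind objects."""
    label: str
    children: list = field(default_factory=list)


@dataclass
class Ind:
    """Assumed distributional node: each (probability, Node) option
    is kept independently of the others."""
    options: list


@dataclass
class Automaton:
    """Deterministic automaton; each label has its own string automaton."""
    start: dict      # label -> initial string-automaton state
    step: dict       # (label, string_state, child_tree_state) -> string_state
    output: dict     # (label, string_state) -> tree state assigned to the node
    accepting: set   # accepting tree states


def state_dist(node, aut):
    """Distribution over the tree state of `node`, given that it exists."""
    dist = {aut.start[node.label]: 1.0}
    for child in node.children:
        slots = child.options if isinstance(child, Ind) else [(1.0, child)]
        for p, sub in slots:
            sub_dist = state_dist(sub, aut)
            nxt = defaultdict(float)
            for s, ps in dist.items():
                if p < 1.0:
                    nxt[s] += ps * (1.0 - p)  # child absent: string state unchanged
                for q, pq in sub_dist.items():
                    nxt[aut.step[(node.label, s, q)]] += ps * p * pq
            dist = nxt
    out = defaultdict(float)
    for s, ps in dist.items():
        out[aut.output[(node.label, s)]] += ps
    return dict(out)


def accept_prob(root, aut):
    """Probability that a random possible world is accepted."""
    return sum(p for q, p in state_dist(root, aut).items() if q in aut.accepting)


# Toy automaton: accept iff the root "r" has at least one "a" child.
aut = Automaton(
    start={"a": 0, "r": 0},
    step={("r", 0, "A"): 1, ("r", 1, "A"): 1},
    output={("a", 0): "A", ("r", 0): "no", ("r", 1): "yes"},
    accepting={"yes"},
)
# Root with two "a" children, each present independently with probability 0.5.
doc = Node("r", [Ind([(0.5, Node("a")), (0.5, Node("a"))])])
prob = accept_prob(doc, aut)  # 1 - 0.5 * 0.5 = 0.75
```

Determinism is what keeps this sketch simple: every possible world has exactly one run, so the probabilities of distinct states can be added without counting any world twice, and the work per node is bounded by the number of automaton states rather than the number of possible worlds.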


Published in

PODS '09: Proceedings of the twenty-eighth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2009
298 pages
ISBN: 9781605585536
DOI: 10.1145/1559795
General Chair: Jan Paredaens
Program Chair: Jianwen Su

Copyright © 2009 ACM

Publisher

Association for Computing Machinery
New York, NY, United States


              Acceptance Rates

              Overall Acceptance Rate: 476 of 1,835 submissions, 26%
