ABSTRACT
Tree automata (specifically, bottom-up and unranked) form a powerful tool for querying and maintaining validity of XML documents. XML with uncertain data can be modeled as a probability space of labeled trees, and that space is often represented by a tree with distributional nodes. This paper investigates the problem of evaluating a tree automaton over such a representation, where the goal is to compute the probability that the automaton accepts a random possible world. This problem is generally intractable, but for the case where the tree automaton is deterministic (and its transitions are defined by deterministic string automata), an efficient algorithm is presented. The paper discusses the applications of this result, including the ability to sample and to evaluate queries (e.g., in monadic second-order logic) while requiring a priori conformance to a schema (e.g., DTD). XML schemas also include attribute constraints, and the complexity of key, foreign-key and inclusion constraints is studied in the context of probabilistic XML. Finally, the paper discusses the generalization of the results to an extended data model, where distributional nodes can repeatedly sample the same subtree, thereby adding another exponent to the size of the probability space.
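The following Python sketch illustrates the kind of computation the abstract describes: pushing probability distributions over automaton states bottom-up through a tree with distributional nodes. It is not the paper's algorithm. The tuple encoding of nodes, the two distributional node types ("ind" for independent children, "mux" for mutually exclusive children), the dictionary-based automaton interface (`h_init`, `h_step`, `finish`, `accepting`), and the toy automaton are all assumptions made for illustration. The deterministic string automaton over child states is modeled as a left-to-right fold over a "horizontal" state.

```python
from collections import defaultdict

# Hypothetical node encodings (assumed for this sketch, not from the paper):
#   ("ord", label, [children])      ordinary node
#   ("ind", [(prob, child), ...])   each child is kept independently
#   ("mux", [(prob, child), ...])   at most one child is kept


def mix(d1, w1, d2, w2):
    """Weighted sum of two distributions given as dicts."""
    out = defaultdict(float)
    for h, p in d1.items():
        out[h] += w1 * p
    for h, p in d2.items():
        out[h] += w2 * p
    return dict(out)


def state_dist(node, aut):
    """Distribution over automaton states reached at an ordinary node."""
    _, label, children = node
    hdist = {aut["h_init"](label): 1.0}
    for child in children:
        hdist = advance(child, hdist, label, aut)
    out = defaultdict(float)
    for h, p in hdist.items():
        out[aut["finish"](label, h)] += p
    return dict(out)


def advance(node, hdist, label, aut):
    """Move a distribution over horizontal states past one child slot."""
    kind = node[0]
    if kind == "ord":
        # Subtrees make independent random choices, so probabilities multiply.
        child_states = state_dist(node, aut)
        out = defaultdict(float)
        for h, p in hdist.items():
            for q, pq in child_states.items():
                out[aut["h_step"](label, h, q)] += p * pq
        return dict(out)
    if kind == "ind":
        for prob, child in node[1]:
            kept = advance(child, hdist, label, aut)
            hdist = mix(kept, prob, hdist, 1.0 - prob)
        return hdist
    if kind == "mux":
        out = defaultdict(float)
        rest = 1.0  # probability that no child is chosen
        for prob, child in node[1]:
            rest -= prob
            for h, p in advance(child, hdist, label, aut).items():
                out[h] += prob * p
        for h, p in hdist.items():
            out[h] += rest * p
        return dict(out)
    raise ValueError(f"unknown node kind: {kind}")


def acceptance_probability(root, aut):
    dist = state_dist(root, aut)
    return sum(p for q, p in dist.items() if q in aut["accepting"])


# Toy deterministic automaton: an "r" node is "ok" iff it has exactly one
# child in state "a"; every other node's state is simply its label.
toy = {
    "h_init": lambda label: 0,
    "h_step": lambda label, h, q: min(h + (1 if q == "a" else 0), 2),
    "finish": lambda label, h: ("ok" if h == 1 else "bad") if label == "r" else label,
    "accepting": {"ok"},
}

leaf_a = ("ord", "a", [])
leaf_b = ("ord", "b", [])
doc_ind = ("ord", "r", [("ind", [(0.5, leaf_a), (0.5, leaf_a)])])
doc_mux = ("ord", "r", [("mux", [(0.3, leaf_a), (0.2, leaf_b)])])

print(acceptance_probability(doc_ind, toy))
print(acceptance_probability(doc_mux, toy))
```

In this sketch each node is visited once, and the work per node is bounded by the number of horizontal states times the number of automaton states. That is only possible because the automaton is deterministic: each possible world reaches exactly one state at each node, so probabilities of distinct states can be added without double counting.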
Index Terms
Running tree automata on probabilistic XML