ABSTRACT
In [3], we introduced the probabilistic tree model, a framework for querying and updating probabilistic information over unordered labeled trees. The data model is based on trees whose nodes are annotated with conjunctions of probabilistic event variables. There, we briefly described an implementation and usage scenarios. Here, we develop a mathematical foundation for this model. We identify a very large class of queries for which simple variations of the querying and updating algorithms from [3] compute the correct answer. A main contribution is a full complexity analysis of queries and updates. We also exhibit a decision procedure for the equivalence of probabilistic trees and prove that it is in co-RP. Furthermore, we study the problem of removing less probable possible worlds and that of validating a probabilistic tree against a DTD, and show that both problems are intractable in the most general case.
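As a rough illustration of the data model described above, the following Python sketch enumerates the possible worlds of a small tree whose nodes carry conjunctions of event literals. It is a toy under stated assumptions, not the algorithms of [3]: the tuple encoding, the helper names (`make_node`, `instantiate`, `possible_worlds`), the use of negated literals, and the independence of events are all assumptions made for the example, and exhaustive enumeration is exponential in the number of events.

```python
from itertools import product

# Hypothetical encoding: a node is (label, condition, children), where
# condition maps an event name to the truth value it must take
# (a conjunction of possibly negated event literals).
def make_node(label, condition=None, children=()):
    return (label, dict(condition or {}), tuple(children))

def instantiate(node, valuation):
    """Return the ordinary tree for one valuation, or None if the node is absent."""
    label, condition, children = node
    if any(valuation[e] != required for e, required in condition.items()):
        return None  # the node and its whole subtree disappear in this world
    kept = [instantiate(child, valuation) for child in children]
    # Trees are unordered, so children are sorted into a canonical order.
    return (label, tuple(sorted(k for k in kept if k is not None)))

def possible_worlds(root, event_probs):
    """Map each distinct world to its probability (events assumed independent)."""
    events = sorted(event_probs)
    worlds = {}
    for bits in product([True, False], repeat=len(events)):
        valuation = dict(zip(events, bits))
        weight = 1.0
        for e in events:
            weight *= event_probs[e] if valuation[e] else 1.0 - event_probs[e]
        world = instantiate(root, valuation)
        worlds[world] = worlds.get(world, 0.0) + weight
    return worlds

# Example: B exists when w1 holds; C exists when w1 holds and w2 does not.
tree = make_node("A", children=[
    make_node("B", {"w1": True}),
    make_node("C", {"w1": True, "w2": False}),
])
worlds = possible_worlds(tree, {"w1": 0.8, "w2": 0.5})
for world, prob in worlds.items():
    print(world, prob)
```

Valuations that yield the same tree are merged, which is why the example produces fewer worlds than there are valuations.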
- S. Abiteboul, P. Kanellakis, and G. Grahne. On the representation and querying of sets of possible worlds. Theoretical Computer Science, 78(1):158--187, 1991.
- S. Abiteboul and P. Senellart. Querying and updating probabilistic information in XML. Technical Report 435, GEMO, Inria Futurs, Orsay, France, Dec. 2005.
- S. Abiteboul and P. Senellart. Querying and updating probabilistic information in XML. In Extending DataBase Technology, Munich, Germany, Mar. 2006.
- A. V. Aho, J. E. Hopcroft, and J. D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, USA, 1974.
- D. Barbará, H. Garcia-Molina, and D. Porter. The management of probabilistic data. IEEE Transactions on Knowledge and Data Engineering, 4(5):487--502, 1992.
- R. Cavallo and M. Pittarelli. The theory of probabilistic databases. In Very Large Data Bases, 1987.
- N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In Very Large Data Bases, Hong Kong, China, Sept. 2004.
- M. de Rougemont. The reliability of queries. In Principles Of Database Systems, San Jose, United States, 1995.
- A. Dekhtyar, J. Goldsmith, and S. R. Hawkes. Semistructured probabilistic databases. In Statistical and Scientific Database Management, Tokyo, Japan, July 2001.
- N. Fuhr and T. Rölleke. A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Trans. Inf. Syst., 15(1), 1997.
- E. Hung, L. Getoor, and V. S. Subrahmanian. PXML: A probabilistic semistructured data model and algebra. In International Conference on Data Engineering, Bangalore, India, Mar. 2003.
- T. Imieliński and W. Lipski. Incomplete information in relational databases. J. ACM, 31(4):761--791, 1984.
- D. E. Knuth. The Art of Computer Programming, volume 1. Addison-Wesley, Boston, USA, third edition, 1997.
- A. Nierman and H. V. Jagadish. ProTDB: Probabilistic data in XML. In Very Large Data Bases, Hong Kong, China, Aug. 2002.
- R. Otter. The number of trees. Annals of Mathematics, 49(3):583--599, July 1948.
- C. H. Papadimitriou. Computational Complexity. Addison Wesley Pub. Co., Reading, USA, 1994.
- J. T. Schwartz. Fast probabilistic algorithms for verification of polynomial identities. Journal of the ACM, 27(4):701--717, 1980.
- M. van Keulen, A. de Keijzer, and W. Alink. A probabilistic XML approach to data integration. In International Conference on Data Engineering, Apr. 2005.
- J. Widom. Trio: A system for integrated management of data, accuracy, and lineage. In Biennial Conference on Innovative Data Systems Research, Pacific Grove, USA, Jan. 2005.
- R. Zippel. Probabilistic algorithms for sparse polynomials. In International Symposium on Symbolic and Algebraic Computation, Marseille, France, 1979.
Index Terms
On the complexity of managing probabilistic XML data