DOI: 10.1145/1265530.1265570
Article

On the complexity of managing probabilistic XML data

Published: 11 June 2007

ABSTRACT

In [3], we introduced a framework for querying and updating probabilistic information over unordered labeled trees, the probabilistic tree model. The data model is based on trees where nodes are annotated with conjunctions of probabilistic event variables. We briefly described an implementation and scenarios of usage. We develop here a mathematical foundation for this model. In particular, we present complexity results. We identify a very large class of queries for which simple variations of querying and updating algorithms from [3] compute the correct answer. A main contribution is a full complexity analysis of queries and updates. We also exhibit a decision procedure for the equivalence of probabilistic trees and prove it is in co-RP. Furthermore, we study the issue of removing less probable possible worlds, and that of validating a probabilistic tree against a DTD. We show that these two problems are intractable in the most general case.
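To make the data model concrete, here is a minimal Python sketch of its possible-worlds reading. It relies on several assumptions that the abstract does not state. Event variables are independent Boolean variables with given probabilities. A node's condition may also use negated events. A node survives in a world only if its condition holds and its parent survives. The names `Node`, `possible_worlds`, and `prob_contains` are hypothetical, and `prob_contains` is a toy label-presence query, not the paper's query language. Brute-force enumeration of all event assignments is exponential, so this only illustrates the semantics. It is not the querying and updating algorithms of [3].

```python
from dataclasses import dataclass, field
from itertools import product


@dataclass
class Node:
    """A node of a probabilistic tree (hypothetical encoding).

    `condition` is a conjunction of literals (event, polarity); an empty
    tuple means the node is unconditional. Negated literals are an assumption.
    """
    label: str
    condition: tuple = ()
    children: list = field(default_factory=list)


def satisfied(condition, world):
    # Every literal in the conjunction must agree with the world's assignment.
    return all(world[event] == polarity for event, polarity in condition)


def instantiate(node, world):
    """Return the ordinary tree kept in `world` as (label, children), or None."""
    if not satisfied(node.condition, world):
        return None  # the node and its whole subtree disappear
    kept = [t for t in (instantiate(c, world) for c in node.children) if t is not None]
    # Sorting the children gives a canonical form for unordered trees.
    return (node.label, tuple(sorted(kept)))


def possible_worlds(root, probs):
    """Map each distinct possible world (a tree) to its total probability.

    Assumes independent events; enumerates all 2^n assignments.
    """
    events = sorted(probs)
    dist = {}
    for values in product([True, False], repeat=len(events)):
        world = dict(zip(events, values))
        weight = 1.0
        for event, value in world.items():
            weight *= probs[event] if value else 1.0 - probs[event]
        tree = instantiate(root, world)
        # Different assignments can yield the same tree; add their weights.
        dist[tree] = dist.get(tree, 0.0) + weight
    return dist


def contains(tree, label):
    # True if any node of the (ordinary) tree carries `label`.
    name, kids = tree
    return name == label or any(contains(k, label) for k in kids)


def prob_contains(root, probs, label):
    """Probability that some node labeled `label` is present (a toy query)."""
    return sum(p for tree, p in possible_worlds(root, probs).items()
               if tree is not None and contains(tree, label))


# Example: A has child B (if w1) with child C (if w2), and child D (if not w1).
example = Node("A", (), [
    Node("B", (("w1", True),), [Node("C", (("w2", True),))]),
    Node("D", (("w1", False),)),
])
probs = {"w1": 0.8, "w2": 0.7}

for tree, p in possible_worlds(example, probs).items():
    print(round(p, 2), tree)
```

In this example, the two assignments with `w1` false both yield the tree A(D), so their probabilities (0.14 and 0.06) merge into 0.2. `prob_contains(example, probs, "C")` evaluates to about 0.56 (0.8 × 0.7).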

References

  1. S. Abiteboul, P. Kanellakis, and G. Grahne. On the representation and querying of sets of possible worlds. Theoretical Computer Science, 78(1):158--187, 1991.
  2. S. Abiteboul and P. Senellart. Querying and updating probabilistic information in XML. Technical Report 435, GEMO, Inria Futurs, Orsay, France, Dec. 2005.
  3. S. Abiteboul and P. Senellart. Querying and updating probabilistic information in XML. In Extending DataBase Technology, Munich, Germany, Mar. 2006.
  4. A. V. Aho, J. E. Hopcroft, and J. D. Ullman. The Design and Analysis of Computer Algorithms. Addison-Wesley, Reading, USA, 1974.
  5. D. Barbará, H. Garcia-Molina, and D. Porter. The management of probabilistic data. IEEE Transactions on Knowledge and Data Engineering, 4(5):487--502, 1992.
  6. R. Cavallo and M. Pittarelli. The theory of probabilistic databases. In Very Large Data Bases, 1987.
  7. N. N. Dalvi and D. Suciu. Efficient query evaluation on probabilistic databases. In Very Large Data Bases, Hong Kong, China, Sept. 2004.
  8. M. de Rougemont. The reliability of queries. In Principles of Database Systems, San Jose, United States, 1995.
  9. A. Dekhtyar, J. Goldsmith, and S. R. Hawkes. Semistructured probabilistic databases. In Statistical and Scientific Database Management, Tokyo, Japan, July 2001.
  10. N. Fuhr and T. Rölleke. A probabilistic relational algebra for the integration of information retrieval and database systems. ACM Trans. Inf. Syst., 15(1), 1997.
  11. E. Hung, L. Getoor, and V. S. Subrahmanian. PXML: A probabilistic semistructured data model and algebra. In International Conference on Data Engineering, Bangalore, India, Mar. 2003.
  12. T. Imieliński and W. Lipski. Incomplete information in relational databases. J. ACM, 31(4):761--791, 1984.
  13. D. E. Knuth. The Art of Computer Programming, volume 1. Addison-Wesley, Boston, USA, third edition, 1997.
  14. A. Nierman and H. V. Jagadish. ProTDB: Probabilistic data in XML. In Very Large Data Bases, Hong Kong, China, Aug. 2002.
  15. R. Otter. The number of trees. Annals of Mathematics, 49(3):583--599, July 1948.
  16. C. H. Papadimitriou. Computational Complexity. Addison-Wesley, Reading, USA, 1994.
  17. J. T. Schwartz. Fast probabilistic algorithms for verification of polynomial identities. Journal of the ACM, 27(4):701--717, 1980.
  18. M. van Keulen, A. de Keijzer, and W. Alink. A probabilistic XML approach to data integration. In International Conference on Data Engineering, Apr. 2005.
  19. J. Widom. Trio: A system for integrated management of data, accuracy, and lineage. In Biennial Conference on Innovative Data Systems Research, Pacific Grove, USA, Jan. 2005.
  20. R. Zippel. Probabilistic algorithms for sparse polynomials. In International Symposium on Symbolic and Algebraic Computation, Marseille, France, 1979.

Published in

PODS '07: Proceedings of the twenty-sixth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2007
328 pages
ISBN: 9781595936851
DOI: 10.1145/1265530

Copyright © 2007 ACM

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall acceptance rate: 476 of 1,835 submissions, 26%
