DOI: 10.1145/1989284.1989302
research-article

Provenance for aggregate queries

Published: 13 June 2011

ABSTRACT

In this paper we study provenance information for queries with aggregation. Provenance has been studied for various query languages that do not allow aggregation, and recent work has suggested capturing provenance by annotating database tuples with elements of a commutative semiring and propagating the annotations through query evaluation. We show that aggregate queries pose novel challenges that render this approach inapplicable. Consequently, we propose a new approach in which we annotate with provenance information not just tuples but also the individual values within tuples, using provenance to describe how the values were computed. We realize this approach in a concrete construction, first for "simple" queries where the aggregation operator is the last one applied, and then for arbitrary (positive) relational algebra queries with aggregation; the latter queries are shown to be more challenging in this context. Finally, we use aggregation to encode queries with difference, and study the semantics obtained for such queries on provenance-annotated databases.
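The tuple-level approach that the abstract takes as its starting point can be illustrated with a minimal sketch. It is not the construction of this paper; it only shows, under stated assumptions, how annotations from a commutative semiring might be propagated through union, projection, and join of a positive relational algebra query. All names (`Semiring`, `COUNTING`, `BOOLEAN`, `union`, `project`, `join`, and the sample relations) are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class Semiring:
    """A commutative semiring (K, add, mul, zero, one). Hypothetical helper type."""
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]


# Two standard instances: natural numbers (bag multiplicities) and Booleans (set semantics).
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)
BOOLEAN = Semiring(False, True, lambda a, b: a or b, lambda a, b: a and b)

# An annotated relation maps each tuple to its annotation in K.
Relation = Dict[Tuple, Any]


def union(k: Semiring, r: Relation, s: Relation) -> Relation:
    """Alternative derivations of the same tuple are combined with add."""
    out = dict(r)
    for t, a in s.items():
        out[t] = k.add(out.get(t, k.zero), a)
    return out


def project(k: Semiring, r: Relation, cols: Sequence[int]) -> Relation:
    """Tuples that collapse to the same projected tuple are combined with add."""
    out: Relation = {}
    for t, a in r.items():
        key = tuple(t[i] for i in cols)
        out[key] = k.add(out.get(key, k.zero), a)
    return out


def join(k: Semiring, r: Relation, s: Relation, r_col: int, s_col: int) -> Relation:
    """Joint use of two tuples is recorded with mul (equi-join on one column each)."""
    out: Relation = {}
    for t, a in r.items():
        for u, b in s.items():
            if t[r_col] == u[s_col]:
                key = t + u
                out[key] = k.add(out.get(key, k.zero), k.mul(a, b))
    return out


# Illustrative data (assumed): annotations are multiplicities in the counting semiring.
emp = {("alice", "d1"): 2, ("bob", "d1"): 1, ("carol", "d2"): 1}
dept = {("d1", "NYC"): 1, ("d2", "NYC"): 3}

# Project the city column out of the join of emp and dept.
cities = project(COUNTING, join(COUNTING, emp, dept, 1, 0), [3])
```

In this sketch `cities` maps `("NYC",)` to `2*1 + 1*1 + 1*3 = 6`. Swapping `COUNTING` for `BOOLEAN` (with `True` annotations) would give ordinary set semantics through the same operators. Note that each tuple carries a single annotation here; the abstract's point is that an aggregate such as a sum also needs provenance attached to the computed value itself, which this tuple-level sketch does not attempt.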


Published in

PODS '11: Proceedings of the thirtieth ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
June 2011
332 pages
ISBN: 9781450306607
DOI: 10.1145/1989284

Copyright © 2011 ACM

Publisher

Association for Computing Machinery, New York, NY, United States

Acceptance Rates

Overall Acceptance Rate: 476 of 1,835 submissions, 26%
