research-article

Retrieving XML data from heterogeneous sources through vague querying

Published: 11 May 2009

Abstract

We propose a framework for querying heterogeneous XML data sources. The framework ensures high autonomy for participating sources, as it relies neither on a global schema nor on semantic mappings between schemas. The basic intuition is to extend traditional approaches to approximate query evaluation with techniques for combining partial answers coming from different sources, possibly on the basis of limited knowledge about the local schemas (i.e., key constraints). We define a query language and its associated semantics, which allow us to collect as much information as possible from several heterogeneous XML sources. We provide algorithms for query evaluation and characterize the complexity of the query language. Finally, we validate the approach in a medical application scenario.
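The abstract does not say how partial answers from different sources are combined. As a rough illustration only, and not the authors' algorithm, the sketch below assumes that each source returns flat partial answers (element name to value) and that a single key constraint, here a hypothetical `patientId` element, identifies the same real-world entity across sources. The function name `combine_partial_answers` and the sample hospital sources are hypothetical. Conflicting values are kept side by side rather than resolved, so no information reported by any source is discarded.

```python
from typing import Dict, Hashable, List, Set

# A partial answer: element/attribute name -> value, as returned by one source.
PartialAnswer = Dict[str, Hashable]
# A combined answer: element/attribute name -> every value reported for it.
CombinedAnswer = Dict[str, Set[Hashable]]


def combine_partial_answers(
    sources: List[List[PartialAnswer]], key: str
) -> List[CombinedAnswer]:
    """Merge partial answers that agree on the value of `key` (hypothetical helper).

    Assumption: answers sharing a key value describe the same entity
    (a simple key constraint). Conflicting values are collected in a set
    instead of being resolved. Answers lacking the key cannot be matched
    and are returned unmerged.
    """
    by_key: Dict[Hashable, CombinedAnswer] = {}
    unkeyed: List[CombinedAnswer] = []
    for answers in sources:
        for answer in answers:
            if key in answer:
                # Reuse the combined answer already built for this key value.
                target = by_key.setdefault(answer[key], {})
            else:
                # No key: keep this answer on its own.
                target = {}
                unkeyed.append(target)
            for name, value in answer.items():
                target.setdefault(name, set()).add(value)
    return list(by_key.values()) + unkeyed


# Usage: two hypothetical hospital sources with different local schemas.
hospital_a = [{"patientId": "p1", "name": "Rossi"}]
hospital_b = [
    {"patientId": "p1", "bloodType": "A+"},
    {"patientId": "p2", "name": "Bianchi"},
]
result = combine_partial_answers([hospital_a, hospital_b], key="patientId")
print(result)
```

Here the two answers about `p1` are joined into one richer answer, while `p2`, known to only one source, is returned as is. A real system would also need to handle nested XML structure and approximate matches, which this flat sketch deliberately ignores.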

          Published in

          ACM Transactions on Internet Technology, Volume 9, Issue 2
          May 2009
          116 pages
          ISSN: 1533-5399
          EISSN: 1557-6051
          DOI: 10.1145/1516539

          Copyright © 2009 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 11 May 2009
          • Accepted: 1 August 2008
          • Received: 1 August 2007
          Published in ACM TOIT, Volume 9, Issue 2

          Qualifiers

          • research-article
          • Research
          • Refereed