Abstract
We propose a framework for querying heterogeneous XML data sources. The framework grants participating sources a high degree of autonomy, since it relies neither on a global schema nor on semantic mappings between schemas. The basic idea is to extend traditional approaches to approximate query evaluation with techniques for combining partial answers coming from different sources, possibly on the basis of limited knowledge about the local schemas (i.e., key constraints). We define a query language and its associated semantics, which allow us to collect as much information as possible from several heterogeneous XML sources. We provide algorithms for query evaluation and characterize the complexity of the query language. Finally, we validate the approach in a medical application scenario.
Index Terms
Retrieving XML data from heterogeneous sources through vague querying