Abstract
Managing multiple versions of XML documents is a critical requirement for many applications. Recently, there has been much work on supporting complex queries on XML data (e.g., regular path expressions and structural projections). In this article, we examine the problem of efficiently implementing such complex queries on multiversion XML documents. Our approach relies on a numbering scheme in which durable node numbers (DNNs) preserve the order among the nodes of the XML tree while remaining invariant with respect to updates. Using the document's DNNs, we show that many complex queries reduce to combinations of range version retrieval queries. We therefore examine three alternative storage organizations and indexing schemes for evaluating range version retrieval queries efficiently in this environment. A thorough performance analysis then reveals the advantages of each scheme.
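To make the idea of a range version retrieval query concrete, the following is a minimal sketch, not the article's implementation. It assumes that each node record carries a durable number, a reserved numeric range covering its subtree, and the version interval during which the node is alive. The record layout, the names `NodeRecord`, `range_version_query`, and `descendants`, and the sample data are all hypothetical, and a linear scan stands in for whichever storage or indexing scheme is actually used.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NodeRecord:
    """One element of a multiversion XML document (hypothetical layout)."""
    tag: str
    dnn_start: int        # durable number: assigned once, never renumbered
    dnn_end: int          # assumed end of the range reserved for the subtree
    v_start: int          # version in which the node was inserted
    v_end: Optional[int]  # version in which it was deleted; None = still alive

    def alive_at(self, version: int) -> bool:
        # A node is alive from v_start (inclusive) to v_end (exclusive).
        return self.v_start <= version and (self.v_end is None or version < self.v_end)


def range_version_query(records: List[NodeRecord], lo: int, hi: int,
                        version: int) -> List[NodeRecord]:
    """Nodes alive at `version` whose DNN lies in [lo, hi], in document order."""
    hits = [r for r in records if lo <= r.dnn_start <= hi and r.alive_at(version)]
    return sorted(hits, key=lambda r: r.dnn_start)


def descendants(records: List[NodeRecord], ancestor: NodeRecord,
                version: int) -> List[NodeRecord]:
    """A structural query expressed as one range version retrieval."""
    return range_version_query(records, ancestor.dnn_start + 1,
                               ancestor.dnn_end, version)


# Illustrative data only: a play whose first act gains and loses a scene.
doc = [
    NodeRecord("play", 1, 1000, 1, None),
    NodeRecord("act", 100, 400, 1, None),
    NodeRecord("scene", 150, 200, 1, 3),    # deleted in version 3
    NodeRecord("scene", 250, 300, 2, None), # inserted in version 2
    NodeRecord("act", 500, 800, 1, None),
]
act = doc[1]
```

In this sketch, asking for the descendants of `act` at different versions touches the same DNN range each time. Only the version predicate changes, because the numbers themselves are never reassigned when nodes are inserted or deleted.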
Index Terms
Supporting complex queries on multiversion XML documents