Supporting complex queries on multiversion XML documents

Published: 01 February 2006

Abstract

Managing multiple versions of XML documents is a critical requirement for many applications. Recently, there has been much work on supporting complex queries on XML data (e.g., regular path expressions and structural projections). In this article, we examine the problem of efficiently implementing such complex queries on multiversion XML documents. Our approach relies on a numbering scheme in which durable node numbers (DNNs) preserve the order among the nodes of the XML tree while remaining invariant with respect to updates. Using the document's DNNs, we show that many complex queries can be reduced to combinations of range version retrieval queries. We then examine three alternative storage organizations/indexing schemes for efficiently evaluating range version retrieval queries in this environment. Finally, a thorough performance analysis reveals the advantages of each scheme.
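The reduction to range version retrieval queries can be illustrated with a minimal Python sketch. It is not the article's implementation: the record layout (`NodeRecord`, `range_end`, `v_start`, `v_end`), the assumption that each node reserves a contiguous DNN range for its subtree, and the linear scan standing in for an index are all hypothetical choices made for illustration. Under those assumptions, "all descendants of a node in version v" becomes "all records whose DNN falls in the ancestor's range and whose lifespan contains v".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class NodeRecord:
    """Hypothetical record for one XML node across versions."""
    tag: str
    dnn: int                     # durable node number: never changes after assignment
    range_end: int               # assumed: largest DNN reserved for this node's subtree
    v_start: int                 # version in which the node was inserted
    v_end: Optional[int] = None  # version in which it was deleted; None if still alive

    def alive_at(self, version: int) -> bool:
        # Lifespan is the half-open interval [v_start, v_end).
        return self.v_start <= version and (self.v_end is None or version < self.v_end)


def range_version_retrieval(records: List[NodeRecord], lo: int, hi: int,
                            version: int) -> List[NodeRecord]:
    """Nodes with lo <= dnn <= hi that are alive at `version`, in DNN order.

    A linear scan is used here only for clarity; the article evaluates
    dedicated storage and indexing schemes for this query instead.
    """
    hits = [r for r in records if lo <= r.dnn <= hi and r.alive_at(version)]
    return sorted(hits, key=lambda r: r.dnn)


def descendants_at(records: List[NodeRecord], ancestor: NodeRecord,
                   version: int) -> List[NodeRecord]:
    """Descendant query expressed as a single range version retrieval."""
    return range_version_retrieval(records, ancestor.dnn + 1, ancestor.range_end, version)


# Illustrative data: DNNs are sparse so that inserts do not renumber existing nodes.
book = NodeRecord("book", 100, 199, v_start=1)
records = [
    book,
    NodeRecord("chapter", 110, 139, v_start=1),
    NodeRecord("section", 120, 129, v_start=1, v_end=3),  # deleted in version 3
    NodeRecord("chapter", 150, 179, v_start=2),           # inserted in version 2
    NodeRecord("appendix", 300, 399, v_start=1),          # outside the book's range
]

tags_v1 = [r.tag for r in descendants_at(records, book, 1)]
tags_v2 = [r.tag for r in descendants_at(records, book, 2)]
tags_v3 = [r.tag for r in descendants_at(records, book, 3)]
```

Because DNNs stay fixed across updates, the same range bounds answer the query for every version; only the version filter changes.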

