
A compressor for effective archiving, retrieval, and updating of XML documents

Published: 01 August 2006

Abstract

Like HTML documents, many XML documents reside on native file systems. Because XML data is irregular and verbose, it wastes disk space and network bandwidth. To overcome this verbosity, compressors for XML data have been studied. Some XML compressors do not support querying compressed data, while others that do support it blindly encode tags and data values using predefined encoding methods. Existing XML compressors provide no facility for updating compressed XML data.

In this article, we propose XPRESS, an XML compressor that supports direct updates and efficient query evaluation on compressed XML data. XPRESS adopts a novel encoding method called reverse arithmetic encoding, which encodes the label paths of XML data, and it applies different encoding methods depending on the types of the data values. Experimental results with real-life data sets show that XPRESS significantly improves query performance on compressed XML data while achieving reasonable compression ratios. On average, the query performance of XPRESS is 2.13 times better than that of an existing XML compressor, and the compression ratio of XPRESS is about 71%. Additionally, we demonstrate the efficiency of updates performed directly on compressed XML data.
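The abstract does not spell out how reverse arithmetic encoding works, so the following is only a minimal sketch under stated assumptions, not the article's implementation. It assumes each tag is given a sub-interval of [0, 1) proportional to its frequency, as in ordinary arithmetic coding, and that a label path is encoded by narrowing the last tag's interval with the interval of the preceding path. Under those assumptions, the interval of a path lies inside the interval of each of its suffixes, so a suffix query such as `//author/title` reduces to an interval containment test. The function names (`tag_intervals`, `path_interval`, `contains`) and the sample tags are hypothetical.

```python
from fractions import Fraction


def tag_intervals(freqs):
    """Split [0, 1) into one sub-interval per tag, sized by relative frequency."""
    total = sum(freqs.values())
    intervals, low = {}, Fraction(0)
    for tag in sorted(freqs):  # fixed order so the assignment is deterministic
        high = low + Fraction(freqs[tag], total)
        intervals[tag] = (low, high)
        low = high
    return intervals


def path_interval(path, intervals):
    """Interval of a label path such as ["book", "author", "title"].

    Assumption: the interval of p1/.../pn is the interval of tag pn,
    narrowed by the interval already computed for p1/.../p(n-1).
    """
    low, high = intervals[path[0]]
    for tag in path[1:]:
        t_low, t_high = intervals[tag]
        width = t_high - t_low
        low, high = t_low + width * low, t_low + width * high
    return (low, high)


def contains(outer, inner):
    """True if interval `inner` lies entirely within interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


if __name__ == "__main__":
    iv = tag_intervals({"book": 1, "author": 1, "title": 2})
    full = path_interval(["book", "author", "title"], iv)
    # The suffix //author/title matches; the unrelated path book/title does not.
    print(contains(path_interval(["author", "title"], iv), full))
    print(contains(path_interval(["book", "title"], iv), full))
```

Exact fractions are used here only to keep the containment test free of rounding effects; a practical encoder would need a fixed-precision representation, which this sketch does not address.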



Published in

ACM Transactions on Internet Technology, Volume 6, Issue 3
August 2006
109 pages
ISSN: 1533-5399
EISSN: 1557-6051
DOI: 10.1145/1151087

Copyright © 2006 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
