Abstract
Like HTML documents, many XML documents reside on native file systems. Because XML data is irregular and verbose, disk space and network bandwidth are wasted. To overcome this verbosity, researchers have developed compressors for XML data. Some XML compressors do not support querying compressed data, while others that do support it blindly encode tags and data values using predefined encoding methods. Existing XML compressors provide no facility for updating compressed XML data. In this article, we propose XPRESS, an XML compressor which supports direct updates and efficient evaluation of queries on compressed XML data. XPRESS adopts a novel encoding method called reverse arithmetic encoding, which encodes the label paths of XML data, and applies diverse encoding methods depending on the types of data values. Experimental results with real-life data sets show that XPRESS achieves significant improvements in query performance on compressed XML data along with reasonable compression ratios. On average, the query performance of XPRESS is 2.13 times better than that of an existing XML compressor, and the compression ratio of XPRESS is about 71%. Additionally, we demonstrate the efficiency of updates performed directly on compressed XML data.
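The abstract names reverse arithmetic encoding but does not spell out the algorithm. The sketch below is one reading of the idea, given as an illustration under stated assumptions rather than as the XPRESS implementation: each tag is assumed to own a subinterval of [0, 1) sized by its frequency, and a label path is encoded by narrowing the interval of its last tag by the encoding of the preceding path. The function names (`tag_intervals`, `encode_path`, `contains`) and the tag frequencies are hypothetical.

```python
from fractions import Fraction

def tag_intervals(freqs):
    """Split [0, 1) into one subinterval per tag, proportional to frequency."""
    total = sum(freqs.values())
    intervals, low = {}, Fraction(0)
    for tag in sorted(freqs):
        high = low + Fraction(freqs[tag], total)
        intervals[tag] = (low, high)
        low = high
    return intervals

def encode_path(path, intervals):
    """Encode a label path (list of tags, root first) as an interval.

    The last tag's interval is narrowed by the encoding of the
    preceding path, so the result always lies inside the last tag's interval.
    """
    lo, hi = intervals[path[-1]]
    if len(path) == 1:
        return lo, hi
    p_lo, p_hi = encode_path(path[:-1], intervals)
    width = hi - lo
    return lo + width * p_lo, lo + width * p_hi

def contains(outer, inner):
    """True if interval `inner` lies within interval `outer`."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]

# Hypothetical tag frequencies, for illustration only.
intervals = tag_intervals({"book": 1, "section": 3, "title": 4, "author": 2})
element = encode_path(["book", "section", "title"], intervals)

# A path query such as //section/title becomes an interval containment test.
matches = contains(encode_path(["section", "title"], intervals), element)
no_match = contains(encode_path(["author", "title"], intervals), element)
```

Under these assumptions, the interval of a path lies inside the interval of each of its suffixes, so a query on a path suffix can be checked by comparing numbers instead of decompressing and matching tag strings. Exact fractions are used here only to keep the arithmetic free of rounding; a practical encoder would work with fixed-precision values.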
Index Terms
A compressor for effective archiving, retrieval, and updating of XML documents