Abstract
XML-enabled publish-subscribe (pub-sub) systems have emerged as an increasingly important tool for e-commerce and Internet applications. In a typical pub-sub system, subscribed users specify their interests in profiles expressed in the XPath language. Each new piece of content is then matched against the user profiles so that it is delivered only to interested subscribers. As the number of subscribed users and their profiles can grow very large, the scalability of the service is critical to the success of pub-sub systems. In this article, we propose a novel scalable filtering system called FiST that transforms user profiles, expressed as twig patterns in XPath, into sequences using Prüfer's method. Consequently, instead of breaking a twig pattern into multiple linear paths and matching them separately, FiST performs holistic matching of twig patterns against each incoming document in a bottom-up fashion. FiST organizes the sequences into a dynamic hash-based index for efficient filtering, and exploits the commonality among user profiles to enable shared processing during the filtering phase. We demonstrate that the holistic matching approach reduces filtering cost and memory consumption, thereby improving the scalability of FiST.
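To make the sequencing step concrete, the sketch below computes the classical Prüfer sequence of a small tree. It is an illustration under stated assumptions, not the FiST implementation: the function name `prufer_sequence`, the sample twig `a[b/c]/d`, and its postorder node numbering are all hypothetical, and the variant FiST actually uses may differ from the classical construction (for example, in how many nodes it removes or in recording element labels rather than node numbers).

```python
import heapq
from collections import defaultdict


def prufer_sequence(edges):
    """Return the classical Prüfer sequence of a tree given as (u, v) edges.

    Nodes are assumed to carry distinct, comparable numbers. The classical
    construction repeatedly removes the smallest-numbered leaf and records
    its neighbor, stopping when two nodes remain.
    """
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Min-heap of current leaves, so the smallest-numbered leaf comes first.
    leaves = [n for n in adj if len(adj[n]) == 1]
    heapq.heapify(leaves)

    seq = []
    for _ in range(len(adj) - 2):
        leaf = heapq.heappop(leaves)
        (nbr,) = adj[leaf]           # a leaf has exactly one neighbor left
        seq.append(nbr)
        adj[nbr].discard(leaf)       # detach the leaf from the tree
        if len(adj[nbr]) == 1:       # the neighbor may now be a leaf itself
            heapq.heappush(leaves, nbr)
    return seq


# Hypothetical twig a[b/c]/d, numbered in postorder: c=1, b=2, d=3, a=4.
labels = {1: "c", 2: "b", 3: "d", 4: "a"}
twig_edges = [(1, 2), (2, 4), (3, 4)]

numbered = prufer_sequence(twig_edges)
labeled = [labels[n] for n in numbered]
```

Here `numbered` holds node numbers and `labeled` holds the corresponding element names. A filter built on this idea would compare such sequences for profiles and documents rather than decomposing each twig into separate linear paths.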
Index Terms
Fast XML document filtering by sequencing twig patterns