ABSTRACT
We present a novel approach for filtering XML documents using nondeterministic finite automata (NFAs) and distributed hash tables (DHTs). Our approach differs architecturally from recent proposals for distributed XML filtering: those proposals assume an XML broker architecture, whereas our solution is built on top of a DHT. The essence of our work is a distributed implementation of YFilter, a state-of-the-art automata-based XML filtering system, on top of Chord. We evaluate our approach experimentally and demonstrate that our algorithms scale to millions of XPath queries under various filtering scenarios and exhibit very good load-balancing properties.
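The abstract does not spell out the algorithms, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes queries are plain child-axis paths (no `//`, `*`, or predicates), merges them into a shared-prefix automaton in the spirit of YFilter, and assigns each automaton state to a peer by hashing the state's key onto an identifier ring, with a Chord-style successor lookup simulated locally over a sorted list. All names here (`ring_id`, `successor`, `PrefixNFA`, the `peer-i` labels) are hypothetical.

```python
import hashlib
from bisect import bisect_left


def ring_id(key: str, bits: int = 16) -> int:
    """Hash a string key onto a 2**bits identifier ring (assumed scheme)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (2 ** bits)


def successor(node_ids, key_id):
    """Return the first node id >= key_id, wrapping around the ring.

    node_ids must be sorted. A real Chord lookup is routed over the
    network; here it is simulated with a local binary search.
    """
    i = bisect_left(node_ids, key_id)
    return node_ids[i % len(node_ids)]


class PrefixNFA:
    """Shared-prefix automaton for linear paths such as /a/b/c.

    Each state is named by the path prefix that reaches it, so queries
    with a common prefix share the corresponding states.
    """

    def __init__(self):
        self.transitions = {"/": {}}  # state key -> {tag: next state key}
        self.accepting = {}           # state key -> set of query ids

    def add_query(self, qid, path):
        state = "/"
        for tag in path.strip("/").split("/"):
            nxt = state.rstrip("/") + "/" + tag
            self.transitions[state][tag] = nxt
            self.transitions.setdefault(nxt, {})
            state = nxt
        self.accepting.setdefault(state, set()).add(qid)

    def match(self, doc_path):
        """Run one root-to-leaf element path through the automaton.

        Returns the matched query ids and the list of visited states.
        """
        state, matched, visited = "/", set(), ["/"]
        for tag in doc_path.strip("/").split("/"):
            state = self.transitions[state].get(tag)
            if state is None:
                break
            visited.append(state)
            matched |= self.accepting.get(state, set())
        return matched, visited


# Build the automaton from three illustrative queries.
nfa = PrefixNFA()
nfa.add_query("q1", "/dblp/article")
nfa.add_query("q2", "/dblp/article/title")
nfa.add_query("q3", "/dblp/book")

# Four simulated peers; each state is stored at the successor of its hash.
nodes = sorted(ring_id(f"peer-{i}") for i in range(4))
placement = {s: successor(nodes, ring_id(s)) for s in nfa.transitions}

matched, visited = nfa.match("/dblp/article/title")
for s in visited:
    print(f"state {s!r} is stored at peer id {placement[s]}")
print("matched queries:", sorted(matched))
```

In this sketch, filtering a document path means hopping between the peers that hold the visited states, which is why spreading states evenly over the ring matters for load balancing.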
Index Terms
XML data dissemination using automata on top of structured overlay networks