Abstract
Publish/subscribe systems have emerged in recent years as a promising paradigm for offering various popular notification services. In this context, many XML filtering systems have been proposed to efficiently identify XML data that matches user interests expressed as queries in an XML query language such as XPath. However, to offer XML filtering functionality at Internet scale, such a service must be deployed in a distributed environment that avoids bottlenecks that can degrade performance. In this work, we design and implement FoXtrot, a system for filtering XML data that combines the strengths of automata for efficient filtering and distributed hash tables for building a fully distributed system. Apart from structural matching, which is performed using automata, we also discuss different methods for evaluating value-based predicates. We perform an extensive experimental evaluation of FoXtrot on a local cluster and on the PlanetLab network and demonstrate that it can index millions of user queries while achieving high indexing and filtering throughput. At the same time, FoXtrot exhibits very good load-balancing properties and improves its performance as the size of the network increases.
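To make the combination of automata and distributed hash tables more concrete, the following is a minimal sketch and not FoXtrot's actual implementation. It assumes a simplified XPath subset (child axis and `*` wildcards only, with no `//` and no value predicates), simulates all nodes in a single process, and uses SHA-1 modulo a fixed node count as a stand-in for a real DHT lookup. The names `NUM_NODES`, `node_for`, and `DistributedNFA` are hypothetical.

```python
import hashlib
from collections import defaultdict

NUM_NODES = 4  # hypothetical network size, chosen only for illustration


def node_for(state_key: str) -> int:
    """Map an automaton state to a node id (stand-in for a DHT lookup)."""
    digest = hashlib.sha1(state_key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES


class DistributedNFA:
    """Queries share prefixes in one automaton; states are spread over nodes."""

    def __init__(self):
        # node id -> state key -> {element label -> next state key}
        self.transitions = defaultdict(lambda: defaultdict(dict))
        # state key -> ids of queries accepted there (kept global for brevity;
        # a real deployment would store this alongside the state)
        self.accepting = defaultdict(set)

    def index_query(self, query_id: str, xpath: str) -> None:
        """Add a child-axis-only query such as /a/*/c to the automaton."""
        state = "/"
        for step in (s for s in xpath.split("/") if s):
            nxt = state.rstrip("/") + "/" + step
            self.transitions[node_for(state)][state][step] = nxt
            state = nxt
        self.accepting[state].add(query_id)

    def filter_path(self, path: list) -> set:
        """Return ids of queries matched along a root-to-leaf element path."""
        active, matched = {"/"}, set()
        for element in path:
            next_active = set()
            for state in active:
                # In a distributed setting this lookup is a message to a node.
                edges = self.transitions[node_for(state)].get(state, {})
                for label in (element, "*"):
                    if label in edges:
                        next_active.add(edges[label])
            active = next_active
            for state in active:
                matched |= self.accepting.get(state, set())
        return matched


nfa = DistributedNFA()
nfa.index_query("q1", "/a/b")
nfa.index_query("q2", "/a/*/c")
nfa.index_query("q3", "/a/d")
print(sorted(nfa.filter_path(["a", "b", "c"])))  # ['q1', 'q2']
```

Because every state is owned by the node its key hashes to, no single node has to hold the whole automaton; this is the general idea behind avoiding a central bottleneck, shown here only under the assumptions stated above.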
Index Terms
FoXtrot: Distributed structural and value XML filtering