skip to main content
research-article

FoXtrot: Distributed structural and value XML filtering

Authors Info & Claims
Published:02 October 2012Publication History
Skip Abstract Section

Abstract

Publish/subscribe systems have emerged in recent years as a promising paradigm for offering various popular notification services. In this context, many XML filtering systems have been proposed to efficiently identify XML data that matches user interests expressed as queries in an XML query language like XPath. However, in order to offer XML filtering functionality on an Internet-scale, we need to deploy such a service in a distributed environment, avoiding bottlenecks that can deteriorate performance. In this work, we design and implement FoXtrot, a system for filtering XML data that combines the strengths of automata for efficient filtering and distributed hash tables for building a fully distributed system. Apart from structural-matching, performed using automata, we also discuss different methods for evaluating value-based predicates. We perform an extensive experimental evaluation of our system, FoXtrot, on a local cluster and on the PlanetLab network and demonstrate that it can index millions of user queries, achieving a high indexing and filtering throughput. At the same time, FoXtrot exhibits very good load-balancing properties and improves its performance as we increase the size of the network.

References

  1. Aberer, K., Cudr_e-Mauroux, P., Datta, A., Despotovic, Z., Hauswirth, M., Punceva, M., and Schmidt, R. 2003. P-Grid: A self-organizing structured P2P system. SIGMOD Record 32, 3, 29--33. Google ScholarGoogle ScholarDigital LibraryDigital Library
  2. Abiteboul, S., Manolescu, I., Polyzotis, N., Preda, N., and Sun, C. 2008. XML processing in DHT networks. In Proceedings of the 24th IEEE International Conference on Data Engineering (ICDE'08). IEEE, Los Alamitos, CA, 606--615. Google ScholarGoogle ScholarDigital LibraryDigital Library
  3. Aekaterinidis, I. and Triantafillou, P. 2006. PastryStrings: A comprehensive content-based publish/subscribe DHT network. In Proceedings of the 26th IEEE International Conference on Distributed Computing Systems (ICDCS'06). IEEE, Los Alamitos, CA, 23--. Google ScholarGoogle ScholarDigital LibraryDigital Library
  4. Altinel, M. and Franklin, M. J. 2000. Efficient filtering of XML documents for selective dissemination of information. In Proceedings of the 26th International Conference on Very Large Data Bases (VLDB'00). Morgan Kaufmann, San Francisco, CA, 53--64. Google ScholarGoogle ScholarDigital LibraryDigital Library
  5. Aspnes, J. and Shah, G. 2003. Skip graphs. In Proceedings of the14th Annual ACM-SIAM Symposium on Discrete algorithms (SODA'03). SIAM, Philadelphia, PA, 384--393. Google ScholarGoogle ScholarDigital LibraryDigital Library
  6. Balakrishnan, H., Kaashoek, M. F., Karger, D., Morris, R., and Stoica, I. 2003. Looking up data in p2p systems. Comm. ACM 46, 43--48. Google ScholarGoogle ScholarDigital LibraryDigital Library
  7. Barbosa, D., Mignet, L., and Veltri, P. 2006. Studying the XML Web: Gathering statistics from an XML sample. World Wide Web 9, 2, 187--212. Google ScholarGoogle ScholarDigital LibraryDigital Library
  8. Bonifati, A., Matrangolo, U., Cuzzocrea, A., and Jain, M. 2004. XPath lookup queries in P2P networks. In Proceedings of the 6th Annual ACM International Workshop on Web Information and Data Management (WIDM'04). ACM, New York, 48--55. Google ScholarGoogle ScholarDigital LibraryDigital Library
  9. Bruno, N., Gravano, L., Koudas, N., and Srivastava, D. 2003. Navigation- vs. index-based XML multiquery processing. In Proceedings of the 19th International Conference on Data Engineering (ICDE'03). IEEE, Los Alamitos, CA, 139--150.Google ScholarGoogle Scholar
  10. Chan, C. Y., Felber, P., Garofalakis, M. N., and Rastogi, R. 2002. Efficient Filtering of XML documents with XPath expressions. In Proceedings of the 18th International Conference on Data Engineering (ICDE'02). IEEE, Los Alamitos, CA, 235. Google ScholarGoogle ScholarDigital LibraryDigital Library
  11. Chan, C. Y. and Ni, Y. 2007. Efficient XML Data dissemination with piggybacking. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'07). ACM, New York, 737--748. Google ScholarGoogle ScholarDigital LibraryDigital Library
  12. Chand, R. and Felber, P. 2008. Scalable distribution of XML content with XNet. IEEE Trans. Parallel Distrib. Syst. 19, 4, 447--461. Google ScholarGoogle ScholarDigital LibraryDigital Library
  13. Chand, R. and Felber, P. A. 2003. A scalable protocol for content-based routing in overlay networks. In Proceedings of the 2nd IEEE International Symposium on Network Computing and Applications (NCA'03). IEEE, Los Alamitos, CA, 123--. Google ScholarGoogle ScholarDigital LibraryDigital Library
  14. Clark, J. and DeRose, S. J. 1999. XML path language (XPath). Version 1.0. World Wide Web Consortium, Recommendation. http://www.w3.org/TR/xpath.Google ScholarGoogle Scholar
  15. Consens, M. P. and Milo, T. 1994. Optimizing queries on files. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'94). ACM, New York, 301--312. Google ScholarGoogle ScholarDigital LibraryDigital Library
  16. Diao, Y., Altinel, M., Franklin, M. J., Zhang, H., and Fischer, P. 2003. Path sharing and predicate evaluation for high-performance XML ffltering. ACM Trans. Datab. Syst. 28, 4, 467--516. Google ScholarGoogle ScholarDigital LibraryDigital Library
  17. Diao, Y., Rizvi, S., and Franklin, M. J. 2004. Towards an internet-scale XML dissemination service. In Proceedings of the 20th International Conference on Very Large Data Bases (VLDB'04). VLDB Endowment, 612--623. Google ScholarGoogle ScholarDigital LibraryDigital Library
  18. Felber, P., Chan, C., Garofalakis, M., and Rastogi, R. 2003. Scalable filtering of XML data for Web services. IEEE Internet Comput 7, 1, 49--57. Google ScholarGoogle ScholarDigital LibraryDigital Library
  19. Fenner, W., Rabinovich, M., Ramakrishnan, K. K., Srivastava, D., and Zhang, Y. 2005. XTreeNet: Scalable overlay networks for XML content dissemination and querying (synopsis). In Proceedings of the 10th International Workshop on Web Content Caching and Distribution (WCW'05). IEEE, Los Alamitos, CA, 41--46. Google ScholarGoogle ScholarDigital LibraryDigital Library
  20. FreePastry release 2009. FreePastry 2.1 release. http://www.freepastry.org/FreePastry/.Google ScholarGoogle Scholar
  21. Galanis, L., Wang, Y., Jeffery, S., and DeWitt, D. J. 2003. Locating data sources in large distributed systems. In Proceedings of the 29th International Conference on Very Large Data Bases (VLDB'03). VLDB Endowment,874--885. Google ScholarGoogle ScholarDigital LibraryDigital Library
  22. Gong, X., Qian, W., Yan, Y., and Zhou, A. 2005. Bloom filter-based XML packets filtering for millions of path queries. In Proceedings of the 21st International Conference on Data Engineering (ICDE'05). IEEE, Los Alamitos, CA, 890--901. Google ScholarGoogle ScholarDigital LibraryDigital Library
  23. Gupta, A. K. and Suciu, D. 2003. Stream processing of XPath queries with predicates. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'03). ACM, New York, 419--430. Google ScholarGoogle ScholarDigital LibraryDigital Library
  24. Hopcroft, J. E., Motwani, R., Rotwani, and Ullman, J. D. 2000. Introduction to Automata Theory, Languages and Computability. Addison-Wesley, Boston, MA. Google ScholarGoogle ScholarDigital LibraryDigital Library
  25. Hou, S. and Jacobsen, H. A. 2006. Predicate-based filtering of XPath expressions. In Proceedings of the 22nd International Conference on Data Engineering (ICDE'06). IEEE, Los Alamitos, CA, 53--. Google ScholarGoogle ScholarDigital LibraryDigital Library
  26. IBM XML.1999. Generator 1999. IBM XML Generator. http://www.alphaworks.ibm.com/xmlgenerator.Google ScholarGoogle Scholar
  27. Jagadish, H. V., Ooi, B. C., Tan, K., and Vu, Q. H. 2005. BATON:A balanced tree structure for peer-to-peer networks. In Proceedings of the 31st International Conference on Very Large Data Bases (VLDB'05). VLDB Endowment, 661--672. Google ScholarGoogle ScholarDigital LibraryDigital Library
  28. Jagadish, H. V., Ooi, B. C., Tan, K., Vu, Q. H., and Zhang, R. 2006. Speeding up search in peer-to-peer networks with a multi-way tree structure. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'06). ACM, New York, 1--12. Google ScholarGoogle ScholarDigital LibraryDigital Library
  29. Kannan, J., Yang, B., Shenker, S., Sharma, P., Banerjee, S., Basu, S., and Ju Lee, S. 2006. Smartseer: Using a dht to process continuous queries over peer-to-peer networks. In Proceedings of the IEEE INFOCOM.Google ScholarGoogle ScholarCross RefCross Ref
  30. Koloniari, G. and Pitoura, E. 2004. Content-based routing of path queries in peer-to-peer systems. In Proceedings of the Advances in Database Technology (EDBT'04). Springer, 29--47.Google ScholarGoogle Scholar
  31. Liarou, E., Idreos, S., and Koubarakis, M. 2006. Evaluating conjunctive triple pattern queries over large structured overlay networks. In Proceedings of the International Semantic Web Conference. 399--413. Google ScholarGoogle ScholarDigital LibraryDigital Library
  32. Lua, E. K., Crowcroft, J., Pias, M., Sharma, R., and Lim, S. 2005. A survey and comparison of peer-to-peer overlay network schemes. IEEE Comm. Surv. Tutorials, 7, 2, 72--93. Google ScholarGoogle ScholarDigital LibraryDigital Library
  33. Manola, F. and Miller, E. 2004. RDF primer: W3c recommendation. Decision Support Systems.Google ScholarGoogle Scholar
  34. Miliaraki, I. 2011. Distributed filtering and dissemination of XML data in peer-to-peer systems. Ph.D. dissertations, Department of Informatics and Telecommunications, National and Kapodistrian, University of Athens.Google ScholarGoogle Scholar
  35. Miliaraki, I., Kaoudi, Z., and Koubarakis, M. 2008. XML data dissemination using automata on top of structured overlay networks. In Proceedings of the 17th International World Wide Web Conference (WWW'08). ACM, New York, 865--874. Google ScholarGoogle ScholarDigital LibraryDigital Library
  36. Miliaraki, I. and Koubarakis, M. 2010. Distributed structural and value XML filtering. In Proceedings of the 4th ACM International Conference on Distributed Event-Based Systems (DEBS'10). ACM, New York, 2--13. Google ScholarGoogle ScholarDigital LibraryDigital Library
  37. Moro, M. M., Bakalov, P., and Tsotras, V. J. 2007. Early profile pruning on XML-aware publish/subscribe systems. In Proceedings of the 33rd International Conference on Very large Data Bases (VLDB'07). VLDB Endowment, 866--877. Google ScholarGoogle ScholarDigital LibraryDigital Library
  38. Papaemmanouil, O. and Cetintemel, U. 2005. SemCast: Semantic multicast for content-based data dissemination. In Proceedings of the 21st International Conference on Data Engineering (ICDE'05). IEEE, Los Alamitos, CA, 242--253. Google ScholarGoogle ScholarDigital LibraryDigital Library
  39. Perez, J., Arenas, M., and Gutierrez, C. 2010. nSPARQL: A navigational language for RDF. Web Semant. 8, 255--270. Google ScholarGoogle ScholarDigital LibraryDigital Library
  40. Ramabhadran, S., Ratnasamy, S., Hellerstein, J. M., and Shenker, S. 2004. Brief announcement: Prefix hash tree. In Proceedings of the 23rd Annual ACM Symposium on Principles of Distributed Computing (PODC'04). ACM, New York, 368--368. Google ScholarGoogle ScholarDigital LibraryDigital Library
  41. Ramasubramanian, V., Peterson, R., and Sirer, E. G. 2006. Corona: A high performance publish/subscribe system for the World Wide Web. In Proceedings of the 3rd Conference on Networked Systems Design & Implementation (NSDI'06). Vol. 3. USENIX Association, Berkeley, CA, 2--2. Google ScholarGoogle ScholarDigital LibraryDigital Library
  42. Rao, P. R. and Moon, B. 2009. Locating XML documents in a peer-to-peer network using distributed hash tables. IEEE Trans. Knowl. Data Eng. 21, 12, 1737--1752. Google ScholarGoogle ScholarDigital LibraryDigital Library
  43. Ratnasamy, S., Francis, P., Handley, M., Karp, R., and Shenker, S. 2001. A scalable content addressable network. SIGCOMM Comput. Commun. Rev. 31, 161--172. Google ScholarGoogle ScholarDigital LibraryDigital Library
  44. Rhea, S., Chun, B.-G., Kubiatowicz, J., and Shenker, S. 2005. Fixing the embarrassing slowness of OpenDHT on PlanetLab. In Proceedings of the 2nd Conference on Real, Large Distributed Systems (WORLDS'05). Vol. 2, USENIX Association, Berkeley, CA, 25--30. Google ScholarGoogle ScholarDigital LibraryDigital Library
  45. Rowstron, A. and Druschel, P. 2001. Pastry: Scalable, decentralized object location, and routing for large-scale peer-to-peer systems. In Proceedings of the IFIP/ACM International Conference on Distributed System Platforms (Middleware '01). Springer, Berlin, 329--350. Google ScholarGoogle ScholarDigital LibraryDigital Library
  46. Skobeltsyn, G., Hauswirth, M., and Aberer, K. 2005. Efficient processing of XPath queries with structured overlay networks. In Proceedings of the OTM Conferences. 1243--1260. Google ScholarGoogle ScholarDigital LibraryDigital Library
  47. Snoeren, A. C., Conley, K., and Gifford, D. K. 2001. Mesh-based content routing using XML. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP'01). ACM, New York, 160--173. Google ScholarGoogle ScholarDigital LibraryDigital Library
  48. Stoica, I., Morris, R., Karger, D., Kaashoek, M. F., and Balakrishnan, H. 2001. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications (SIGCOMM'01). ACM, New York, 149--160. Google ScholarGoogle ScholarDigital LibraryDigital Library
  49. Tian, F., Reinwald, B., Pirahesh, H., Mayr, T., and Myllymaki, J. 2004. Implementing a scalable XML publish/subscribe system using relational database systems. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'04). ACM, New York, 479--490. Google ScholarGoogle ScholarDigital LibraryDigital Library
  50. Tryfonopoulos, C., Idreos, S., and Koubarakis, M. 2005. Publish/subscribe functionality in IR environments using structured overlay networks. In Proceedings of the SIGIR. 322--329. Google ScholarGoogle ScholarDigital LibraryDigital Library
  51. Uchiyama, H., Onizuka, M., and Honishi, T. 2005. Distributed XML stream filtering system with high scalability. In Proceedings of the 21st International Conference on Data Engineering (ICDE'05). IEEE, Los Alamitos, CA, 968--977. Google ScholarGoogle ScholarDigital LibraryDigital Library
  52. Voulgaris, S., Riviere, E., Kermarrec, A.-M., and van Steen, M. 2006. Sub-2-sub: Self-organizing content-based publish/subscribe for dynamic large scale collaborative networks. In Proceedings of the IPTPS.Google ScholarGoogle Scholar
  53. XMark 2001. XMark: An XML benchmark project. http://www.xml-benchmark.org/.Google ScholarGoogle Scholar
  54. YFilter release. 2004. YFilter 1.0 release. http://yfilter.cs.umass.edu/code_release.htm.Google ScholarGoogle Scholar
  55. Zhang, C., Krishnamurthy, A., and Wang, R. Y. 2005. Brushwood: Distributed trees in peer-to-peer systems. In Peer-to-Peer Systems IV, 4th International Workshop (IPTPS'05). Lecture Notes in Computer Science, Vol. 3640, Springer, Berlin, 47--57. Google ScholarGoogle ScholarDigital LibraryDigital Library
  56. Zhou, A., Qian, W., Gong, X., and Zhou, M. 2007. Sonnet: An efficient distributed content-based dissemination broker (poster paper). In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD'07). ACM, New York, 1094--1096. Google ScholarGoogle ScholarDigital LibraryDigital Library
  57. Zhou, M. and Wu, Y. 2010. XML-based RDF data management for efficient query processing. In Proceedings of the 13th International Workshop on the Web and Databases (WebDB'10). ACM, New York, 3:1--3:6. Google ScholarGoogle ScholarDigital LibraryDigital Library

Index Terms

  1. FoXtrot: Distributed structural and value XML filtering

        Recommendations

        Comments

        Login options

        Check if you have access through your login credentials or your institution to get full access on this article.

        Sign in

        Full Access

        • Published in

          cover image ACM Transactions on the Web
          ACM Transactions on the Web  Volume 6, Issue 3
          September 2012
          133 pages
          ISSN:1559-1131
          EISSN:1559-114X
          DOI:10.1145/2344416
          Issue’s Table of Contents

          Copyright © 2012 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 2 October 2012
          • Accepted: 1 May 2012
          • Revised: 1 March 2012
          • Received: 1 May 2011
          Published in tweb Volume 6, Issue 3

          Permissions

          Request permissions about this article.

          Request Permissions

          Check for updates

          Qualifiers

          • research-article
          • Research
          • Refereed

        PDF Format

        View or Download as a PDF file.

        PDF

        eReader

        View online with eReader.

        eReader
        About Cookies On This Site

        We use cookies to ensure that we give you the best experience on our website.

        Learn more

        Got it!