A study on parallelizing XML path filtering using accelerators

Published: 10 March 2014

Abstract

Publish-subscribe systems represent the state of the art in information dissemination to multiple users. Such systems have evolved from simple topic-based systems to the current XML-based ones. XML-based pub-sub systems give users more flexibility by allowing complex queries on both the content and the structure of the streaming messages. Messages that match a given user query are forwarded to that user. This article examines how to exploit the parallelism found in XPath filtering. Given an incoming XML stream, matching engines parse the stream and match it against thousands of user profiles simultaneously. We show the benefits and trade-offs of mapping the proposed filtering approach onto FPGAs, which process XML streams at wire speed, and onto GPUs, which provide the flexibility of software. This is in contrast to conventional approaches, which are bound by the sequential nature of software computing and carry a large memory footprint. By converting XPath expressions into custom stacks, our solution is the first to provide support for complex XPath structural constructs, such as parent-child and ancestor-descendant relations, while allowing wildcarding and recursion. The measured speedups from GPU and FPGA acceleration over single-core CPUs are up to 6.6X and 2.5 orders of magnitude, respectively. The FPGA approaches are up to 31X faster than software running on 12 CPU cores.
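To make the idea of stack-based path matching concrete, the following is a minimal software sketch. It is not the article's FPGA or GPU design. It assumes profiles are simple paths without predicates and that the XML stream has already been parsed into open and close tag events. The names `Step`, `parse_profile`, and `filter_stream` are hypothetical and chosen only for illustration. Each profile keeps its own stack with one row per open element, and each row records which prefixes of the path end at that element. The profiles are processed independently, which is the kind of parallelism the abstract refers to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    axis: str  # "/" for parent-child, "//" for ancestor-descendant
    name: str  # element name, or "*" for a wildcard


def parse_profile(xpath):
    """Split a simple path such as '/a//b/*' into steps (no predicates)."""
    steps, i = [], 0
    while i < len(xpath):
        if not xpath.startswith("/", i):
            raise ValueError("each step must start with '/' or '//'")
        axis = "//" if xpath.startswith("//", i) else "/"
        i += len(axis)
        j = xpath.find("/", i)
        j = len(xpath) if j == -1 else j
        if j == i:
            raise ValueError("empty step name")
        steps.append(Step(axis, xpath[i:j]))
        i = j
    if not steps:
        raise ValueError("empty profile")
    return steps


def filter_stream(events, profiles):
    """Return the indices of the profiles matched by a stream of tag events.

    events: iterable of ("open", tag) and ("close", tag) pairs.
    """
    parsed = [parse_profile(p) for p in profiles]
    # One stack per profile. Row 0 is a sentinel for the document root:
    # only its column 0 (the empty path prefix) is set.
    stacks = [[[True] + [False] * len(steps)] for steps in parsed]
    matched = set()
    for kind, tag in events:
        # Profiles do not interact, so this loop could run in parallel.
        for q, steps in enumerate(parsed):
            stack = stacks[q]
            if kind == "close":
                stack.pop()
                continue
            parent = stack[-1]
            row = [False] * (len(steps) + 1)
            for i, step in enumerate(steps, start=1):
                name_ok = step.name in ("*", tag)
                if step.axis == "/":
                    # Parent-child: the previous step must end at the parent.
                    prev_ok = parent[i - 1]
                else:
                    # Ancestor-descendant: the previous step may end at any
                    # currently open ancestor.
                    prev_ok = any(r[i - 1] for r in stack)
                row[i] = name_ok and prev_ok
            stack.append(row)
            if row[-1]:
                matched.add(q)
    return matched
```

For example, for the document `<a><x><b><c/></b></x></a>`, the profiles `/a//b/c` and `//x/*/c` match, while `/a/b` does not, because `b` is a grandchild of `a` rather than a child. Scanning the whole stack for the descendant axis keeps the sketch short; a hardware design would more plausibly propagate a single flag down the stack instead, but that is an assumption and not something the abstract states.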
