research-article

Griffin: uniting CPU and GPU in information retrieval systems for intra-query parallelism

Published: 10 February 2018

Abstract

Interactive information retrieval services, such as enterprise search and document search, must provide relevant results with consistent, low response times in the face of rapidly growing data sets and query loads. These growing demands have led researchers to consider a wide range of optimizations to reduce response latency, including parallelizing query processing and accelerating it with co-processors such as GPUs. However, previous work runs queries entirely on either the GPU or the CPU, ignoring the fact that the best processor for a given query depends on the query's characteristics, which may change as processing proceeds.
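The abstract does not spell out the dispatch policy, but the intuition behind per-query processor choice can be illustrated with a toy heuristic: a GPU only pays off when a query touches enough postings to amortize kernel-launch and data-transfer overheads, so small queries are better served on the CPU. The function name, threshold values, and cost model below are illustrative assumptions, not Griffin's actual policy.

```python
# Toy sketch of characteristic-based CPU/GPU dispatch.
# GPU_MIN_POSTINGS and the shortest-list cutoff are assumed values,
# chosen only to make the idea concrete.

GPU_MIN_POSTINGS = 100_000  # below this, GPU launch/transfer overhead dominates

def choose_processor(posting_list_lengths):
    """Pick 'gpu' when total work is large enough to hide GPU overheads."""
    total = sum(posting_list_lengths)
    shortest = min(posting_list_lengths)
    # An intersection's output is bounded by the shortest list; a tiny list
    # finishes on the CPU faster than a round-trip to the GPU would take.
    if total >= GPU_MIN_POSTINGS and shortest >= 1_000:
        return "gpu"
    return "cpu"
```

For example, a query whose shortest posting list has only 50 entries is routed to the CPU even if another of its lists is huge, since the intersection result is capped by the short list.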

We present Griffin, an IR system that dynamically combines GPU- and CPU-based algorithms to process individual queries according to their characteristics. Griffin uses state-of-the-art CPU-based query processing techniques and incorporates a novel approach to GPU-based query evaluation. To the best of our knowledge, our GPU-based approach achieves the best available GPU search performance, by leveraging a new compression scheme and exploiting an advanced merge-based intersection algorithm. We evaluate Griffin with real-world queries and datasets and show that it improves query performance by 10x compared to a highly optimized CPU-only implementation, and by 1.5x compared to our GPU-based approach running alone. We also find that Griffin reduces the 95th-, 99th-, and 99.9th-percentile query response times by 10.4x, 16.1x, and 26.8x, respectively.
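The merge-based intersection the abstract refers to builds on the classic two-pointer merge over sorted posting lists; merge-path style GPU algorithms parallelize this skeleton by splitting the combined traversal across threads. The sequential version can be sketched as follows (a minimal illustration, not Griffin's GPU kernel):

```python
def merge_intersect(a, b):
    """Two-pointer merge intersection of two sorted doc-ID posting lists.

    This is the sequential skeleton that merge-path GPU algorithms
    parallelize by partitioning the (i, j) traversal across threads.
    """
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:          # doc appears in both lists: emit it
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:         # advance the pointer behind
            i += 1
        else:
            j += 1
    return out
```

Each comparison advances at least one pointer, so the intersection of lists of lengths m and n costs O(m + n); on a GPU, merge path assigns each thread an equal-sized slice of that combined work.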



Published in

ACM SIGPLAN Notices, Volume 53, Issue 1 (PPoPP '18)
January 2018, 426 pages
ISSN: 0362-1340, EISSN: 1558-1160
DOI: 10.1145/3200691

PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2018, 442 pages
ISBN: 9781450349826
DOI: 10.1145/3178487

Copyright © 2018 ACM
Publisher: Association for Computing Machinery, New York, NY, United States
