Abstract
Interactive information retrieval services, such as enterprise search and document search, must deliver relevant results with consistently low response times in the face of rapidly growing data sets and query loads. These demands have led researchers to explore a wide range of latency-reducing optimizations, including parallel query processing and acceleration with co-processors such as GPUs. However, previous work runs each query entirely on either the GPU or the CPU, ignoring the fact that the best processor for a given query depends on the query's characteristics, which may change as processing proceeds.
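To make the claim above concrete, the sketch below illustrates the general idea of routing a query to the CPU or the GPU based on its characteristics. The cost proxy (total posting-list length) and the threshold are invented for illustration only; they are not Griffin's actual policy.

```python
# Hypothetical cutoff, in total postings, above which GPU throughput is
# assumed to outweigh kernel-launch and data-transfer overhead.
GPU_THRESHOLD = 100_000

def choose_processor(posting_list_lengths):
    """Toy dispatcher: route short queries to the CPU (fixed GPU overheads
    dominate), long queries to the GPU (raw throughput dominates)."""
    work = sum(posting_list_lengths)
    return "gpu" if work >= GPU_THRESHOLD else "cpu"

print(choose_processor([1_200, 3_400]))     # short query -> "cpu"
print(choose_processor([80_000, 150_000]))  # long query  -> "gpu"
```

A real system would use a richer cost model (term selectivity, cache state, current device load) and, as the abstract notes, may revisit the decision as processing proceeds.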
We present Griffin, an IR system that dynamically combines GPU- and CPU-based algorithms to process each query according to its characteristics. Griffin uses state-of-the-art CPU-based query processing techniques and incorporates a novel approach to GPU-based query evaluation. To the best of our knowledge, our GPU-based approach achieves the best available GPU search performance, by leveraging a new compression scheme and an advanced merge-based intersection algorithm. We evaluate Griffin on real-world queries and datasets and show that it improves query performance by 10x over a highly optimized CPU-only implementation and by 1.5x over our GPU approach running alone. Griffin also reduces the 95th-, 99th-, and 99.9th-percentile query response times by 10.4x, 16.1x, and 26.8x, respectively.
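The merge-based intersection mentioned above operates on sorted posting lists. Griffin's GPU kernel is not reproduced here; the following is a minimal CPU sketch of the classic two-pointer merge intersection that merge-based GPU approaches (e.g. Merge Path) parallelize, with illustrative function names not taken from the paper.

```python
def merge_intersect(a, b):
    """Intersect two sorted posting lists (document-ID lists) by merging.

    Runs in O(len(a) + len(b)). GPU variants partition the merge into
    equal-sized slices along diagonals so threads get balanced work.
    """
    result = []
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:          # doc ID appears in both lists
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:         # advance the pointer at the smaller ID
            i += 1
        else:
            j += 1
    return result

# Documents containing both query terms:
print(merge_intersect([1, 3, 5, 8, 13], [2, 3, 8, 9, 13]))  # -> [3, 8, 13]
```

In a real engine the lists would be stored compressed, so intersection speed also depends on how cheaply blocks can be decoded, which is why the compression scheme matters alongside the intersection algorithm.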
Griffin: uniting CPU and GPU in information retrieval systems for intra-query parallelism
PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.