research-article

FlexSaaS: A Reconfigurable Accelerator for Web Search Selection

Published: 17 February 2019

Abstract

Web search engines deploy large-scale selection services on CPUs to identify a set of web pages that match user queries. An FPGA-based accelerator can exploit various levels of parallelism and provide a lower-latency, higher-throughput, and more energy-efficient solution than commodity CPUs. However, maintaining such a customized accelerator in a commercial search engine is challenging because selection services change often. This article presents our design for FlexSaaS (Flexible Selection as a Service), an FPGA-based accelerator for web search selection. To address efficiency and flexibility challenges, FlexSaaS abstracts computing models and separates memory access from computation. Specifically, FlexSaaS (i) contains a reconfigurable number of matching processors that can handle various possible query plans, (ii) decouples index stream reading from matching computation to fetch and decode index files, and (iii) includes a universal memory accessor that hides the complex memory hierarchy and reduces host data access latency. Evaluated on FPGAs in the selection service of a commercial web search engine (the Bing web search engine), FlexSaaS can be evolved quickly to adapt to new updates. Compared to the software baseline, FlexSaaS on Arria 10 reduces average latency by 30% and increases throughput by 1.5×.
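
To make the separation between index stream reading and matching more concrete, the following is a minimal software sketch and not the FlexSaaS hardware design. It assumes, purely for illustration, that each posting list is stored as delta-encoded gaps between ascending document IDs, and that a query plan is a composition of AND and OR operators. The names `decode_postings`, `match_and`, and `match_or` are hypothetical and do not come from the article.

```python
from typing import Iterable, Iterator, Optional


def decode_postings(gaps: Iterable[int]) -> Iterator[int]:
    """Reader stage (assumed format): turn delta-encoded gaps into ascending doc IDs."""
    doc_id = 0
    for gap in gaps:
        doc_id += gap
        yield doc_id


def match_and(a: Iterator[int], b: Iterator[int]) -> Iterator[int]:
    """Matching stage: emit doc IDs that appear in both ascending streams."""
    x: Optional[int] = next(a, None)
    y: Optional[int] = next(b, None)
    while x is not None and y is not None:
        if x == y:
            yield x
            x, y = next(a, None), next(b, None)
        elif x < y:
            x = next(a, None)
        else:
            y = next(b, None)


def match_or(a: Iterator[int], b: Iterator[int]) -> Iterator[int]:
    """Matching stage: emit doc IDs that appear in either ascending stream, once each."""
    x: Optional[int] = next(a, None)
    y: Optional[int] = next(b, None)
    while x is not None or y is not None:
        if y is None or (x is not None and x < y):
            yield x
            x = next(a, None)
        elif x is None or y < x:
            yield y
            y = next(b, None)
        else:  # x == y: emit once and advance both streams
            yield x
            x, y = next(a, None), next(b, None)


# Hypothetical toy index: term -> delta-encoded posting list.
index = {
    "fpga": [2, 3, 4, 1],    # decodes to 2, 5, 9, 10
    "search": [3, 2, 5],     # decodes to 3, 5, 10
    "engine": [5, 6],        # decodes to 5, 11
}

# Query plan: (fpga AND search) OR engine
plan = match_or(
    match_and(decode_postings(index["fpga"]), decode_postings(index["search"])),
    decode_postings(index["engine"]),
)
result = list(plan)
```

Because the matching operators only consume iterators of document IDs, the decoding stage can be changed (for example, to a different compression scheme) without touching the matching logic. That is the kind of flexibility the decoupling described above is meant to provide, shown here only in a software analogy.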

Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 12, Issue 1
March 2019
115 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3310278
Editor: Deming Chen

Copyright © 2019 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 17 February 2019
• Accepted: 1 December 2018
• Revised: 1 October 2018
• Received: 1 June 2018

Qualifiers

• research-article
• Research
• Refereed