
Exploiting accelerators for efficient high dimensional similarity search

Published: 27 February 2016

Abstract

Similarity search finds the most similar matches in an object collection for a given query, which makes it an important problem across a wide range of disciplines such as web search, image recognition and protein sequencing. Practical implementations of High Dimensional Similarity Search (HDSS) search across billions of possible solutions for multiple queries in real time, so performance and efficiency are a significant challenge. Existing clusters and datacenters perform search on commercial multicore hardware, which may not provide optimal performance or performance per Watt.
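As a minimal illustration of the problem defined above, the sketch below performs an exhaustive (brute-force) top-k search for a single query. It assumes dense vectors and cosine similarity; the abstract names neither a data representation nor a similarity metric, and the function names `cosine` and `top_k` are hypothetical. It is not the authors' optimized implementation.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def top_k(query, collection, k):
    """Score every object in the collection and return the k best (score, index) pairs, best first."""
    scored = ((cosine(query, obj), i) for i, obj in enumerate(collection))
    return heapq.nlargest(k, scored)
```

For example, `top_k([1.0, 0.0, 0.0], [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]], 2)` returns the first object (score 1.0) followed by the third (score about 0.707). A full scan costs time proportional to the collection size times the dimensionality for every query, which is why searching billions of candidates in real time is expensive.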

This work explores the performance, power and cost benefits of using throughput accelerators such as GPUs to perform similarity search for query cohorts, even under tight deadlines. We propose optimized implementations of similarity search for both the host and the accelerator. Augmenting existing Xeon servers with accelerators improves throughput per machine by 3×, which reduces total cost of ownership by more than 2.5×, even for discounted Xeon servers. Replacing a Xeon-based cluster with an accelerator-based cluster for similarity search reduces total cost of ownership by 6× to 16× while consuming significantly less power than an ARM-based cluster.
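Why batching queries into a cohort can raise throughput may be sketched under one assumption about how the work is organized: the collection is streamed once per cohort and each object is scored against every query in the batch, so the cost of reading the collection is amortized across queries. The sketch assumes dot-product scoring over pre-normalized vectors and uses hypothetical names (`dot`, `top_k_cohort`). On a GPU the scoring work would typically be spread across many threads; this sequential version only shows the loop ordering and the per-query top-k bookkeeping, not the paper's kernels.

```python
import heapq


def dot(a, b):
    """Inner product; equals cosine similarity if both vectors are pre-normalized (an assumption)."""
    return sum(x * y for x, y in zip(a, b))


def top_k_cohort(queries, collection, k):
    """Return, for each query, its k highest-scoring (score, index) pairs, best first.

    The collection is streamed once for the whole cohort: each object is read
    once and scored against every query, instead of rescanning per query.
    """
    heaps = [[] for _ in queries]  # one min-heap of at most k entries per query
    for i, obj in enumerate(collection):  # single pass over the collection
        for q, heap in zip(queries, heaps):  # reuse the loaded object for every query
            entry = (dot(q, obj), i)
            if len(heap) < k:
                heapq.heappush(heap, entry)
            elif entry > heap[0]:
                heapq.heapreplace(heap, entry)  # evict the current worst of the k kept
    return [sorted(h, reverse=True) for h in heaps]
```

For instance, with queries `[[1, 0], [0, 1]]`, collection `[[3, 0], [0, 2], [1, 1]]` and `k = 2`, the result is `[[(3, 0), (1, 2)], [(2, 1), (1, 2)]]`: each inner list holds (score, object index) pairs, best first.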



        Published in

          ACM SIGPLAN Notices, Volume 51, Issue 8 (PPoPP '16)
          August 2016
          405 pages
          ISSN: 0362-1340
          EISSN: 1558-1160
          DOI: 10.1145/3016078

          PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
          February 2016
          420 pages
          ISBN: 9781450340922
          DOI: 10.1145/2851141

          Copyright © 2016 ACM

          Publisher

          Association for Computing Machinery

          New York, NY, United States

          Publication History

          • Published: 27 February 2016


          Qualifiers

          • research-article
