Abstract
Similarity search finds the most similar matches in an object collection for a given query, making it an important problem across a wide range of disciplines such as web search, image recognition, and protein sequencing. Practical implementations of High Dimensional Similarity Search (HDSS) search across billions of possible solutions for multiple queries in real time, which makes performance and efficiency a significant challenge. Existing clusters and datacenters use commercial multicore hardware to perform search, which may not provide optimal performance or performance per watt.
This work explores the performance, power, and cost benefits of using throughput accelerators such as GPUs to perform similarity search for query cohorts, even under tight deadlines. We propose optimized implementations of similarity search for both the host and the accelerator. Augmenting existing Xeon servers with accelerators yields a 3× improvement in throughput per machine, which in turn reduces the cost of ownership by more than 2.5×, even for discounted Xeon servers. Replacing a Xeon-based cluster with an accelerator-based cluster for similarity search reduces the total cost of ownership by more than 6× to 16×, while consuming significantly less power than an ARM-based cluster.
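As a rough illustration of what searching on behalf of a query cohort means, the sketch below scans a small collection once and scores every object against a whole batch of queries, keeping the k best matches per query. It is a minimal, CPU-only sketch and not the implementation proposed in this work: the choice of cosine similarity, the dense-vector representation, and the names `cosine` and `search_cohort` are all assumptions made for the example.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length dense vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_cohort(collection, queries, k):
    """Return, for each query, the indices of its k most similar objects.

    The collection is scanned once for the whole cohort, so each object
    is visited a single time and scored against every query in the batch.
    """
    # One bounded min-heap of (score, index) pairs per query.
    heaps = [[] for _ in queries]
    for idx, obj in enumerate(collection):
        for q, query in enumerate(queries):
            item = (cosine(query, obj), idx)
            if len(heaps[q]) < k:
                heapq.heappush(heaps[q], item)
            elif item > heaps[q][0]:
                # Better than the current worst kept match: swap it in.
                heapq.heapreplace(heaps[q], item)
    # Order each result list from best match to worst.
    return [[idx for _, idx in sorted(h, reverse=True)] for h in heaps]


# Hypothetical toy data: four 2-D objects and a cohort of two queries.
collection = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.0]]
queries = [[1.0, 0.1], [0.0, 2.0]]
results = search_cohort(collection, queries, k=2)
print(results)
```

Because every (query, object) pair is scored independently, the inner loop is data parallel, which is the kind of structure a throughput accelerator could exploit.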
Exploiting accelerators for efficient high dimensional similarity search
PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming