Bridging the gap between deep learning and sparse matrix format selection

Published: 10 February 2018

Abstract

This work presents a systematic exploration of the promise and special challenges of deep learning for sparse matrix format selection: the problem of determining the best storage format for a matrix so as to maximize the performance of Sparse Matrix Vector Multiplication (SpMV). It describes how to effectively bridge the gap between deep learning and the special needs of this pillar HPC problem through a set of techniques for matrix representation, deep learning network structure, and cross-architecture model migration. The new solution cuts format-selection errors by two thirds and improves SpMV performance by 1.73X on average over the state of the art.
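To make the problem concrete, the sketch below shows the two pieces the abstract connects: an SpMV kernel whose performance depends on the chosen storage format (CSR is used here as one representative format), and one plausible way to turn a matrix's nonzero pattern into a fixed-size, image-like input for a convolutional network. The `density_image` function is an illustrative assumption, not the paper's exact representation.

```python
import numpy as np

def csr_spmv(row_ptr, col_idx, vals, x):
    """y = A @ x with A stored in CSR (Compressed Sparse Row) format.
    CSR is one of several formats whose relative SpMV speed depends
    on the matrix's sparsity structure."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        # vals[row_ptr[i]:row_ptr[i+1]] holds the nonzeros of row i
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]
    return y

def density_image(rows, cols, shape, size=32):
    """Downsample a matrix's nonzero pattern into a size x size density
    grid -- a fixed-size, CNN-friendly summary of the sparsity structure
    (illustrative; the paper's actual representation may differ)."""
    img = np.zeros((size, size))
    for r, c in zip(rows, cols):
        img[r * size // shape[0], c * size // shape[1]] += 1
    return img / max(img.max(), 1.0)  # normalize to [0, 1]
```

A format-selection model would consume such fixed-size images and emit a predicted best format (e.g. CSR, ELL, COO) for the target architecture; the kernel above is what that prediction ultimately accelerates.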

