research-article
Open Access
Artifacts Evaluated & Functional

Format abstraction for sparse tensor algebra compilers

Published: 24 October 2018

Abstract

This paper shows how to build a sparse tensor algebra compiler that is agnostic to tensor formats (data layouts). We develop an interface that describes formats in terms of their capabilities and properties, and show how to build a modular code generator where new formats can be added as plugins. We then describe six implementations of the interface that compose to form the dense, CSR/CSF, COO, DIA, ELL, and HASH tensor formats and countless variants thereof. With these implementations at hand, our code generator can generate code to compute any tensor algebra expression on any combination of the aforementioned formats.
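As an illustrative sketch of the composition idea (not taco's actual interface), the same sparse matrix can be stored in two of the formats named above: CSR composes a dense row dimension with a compressed column dimension, while COO stores an explicit coordinate per nonzero.

```python
# Sketch only: two composed storage layouts for the same sparse matrix.
# CSR: pos[i]..pos[i+1] brackets row i's entries in crd/vals.
# COO: one (row, col, val) triple per nonzero.

def dense_to_csr(A):
    """Compress each row of a dense matrix into (pos, crd, vals)."""
    pos, crd, vals = [0], [], []
    for row in A:
        for j, v in enumerate(row):
            if v != 0:
                crd.append(j)
                vals.append(v)
        pos.append(len(crd))
    return pos, crd, vals

def dense_to_coo(A):
    """Store one coordinate triple per nonzero: (rows, cols, vals)."""
    rows, cols, vals = [], [], []
    for i, row in enumerate(A):
        for j, v in enumerate(row):
            if v != 0:
                rows.append(i)
                cols.append(j)
                vals.append(v)
    return rows, cols, vals

A = [[0, 2, 0],
     [3, 0, 4]]
print(dense_to_csr(A))  # ([0, 1, 3], [1, 0, 2], [2, 3, 4])
print(dense_to_coo(A))  # ([0, 1, 1], [1, 0, 2], [2, 3, 4])
```

In the paper's abstraction, each dimension's layout (dense, compressed, singleton, etc.) is one level-format plugin, and whole-tensor formats like CSR or COO arise by composing such per-dimension choices.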

To demonstrate our technique, we have implemented it in the taco tensor algebra compiler. Our modular code generator design makes it simple to add support for new tensor formats, and the performance of the generated code is competitive with hand-optimized implementations. Furthermore, by extending taco to support a wider range of formats specialized for different application and data characteristics, we can improve end-user application performance. For example, if input data is provided in the COO format, our technique allows computing a single matrix-vector multiplication directly on the data in COO, which is up to 3.6× faster than first converting the data to CSR.
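The COO result above rests on a simple observation: sparse matrix-vector multiplication can iterate over coordinate triples directly, with no conversion pass. A minimal sketch of that kernel (illustrative only, not taco's generated code):

```python
def spmv_coo(rows, cols, vals, x, nrows):
    """Compute y = A @ x directly on COO triples, skipping any
    conversion to CSR: one accumulate per stored nonzero."""
    y = [0.0] * nrows
    for i, j, v in zip(rows, cols, vals):
        y[i] += v * x[j]
    return y

# A = [[0, 2, 0], [3, 0, 4]] stored as COO, x = [1, 1, 1]
y = spmv_coo([0, 1, 1], [1, 0, 2], [2.0, 3.0, 4.0], [1.0, 1.0, 1.0], 2)
print(y)  # [2.0, 7.0]
```

For a single multiplication this avoids the O(nnz) conversion entirely, which is where the reported speedup over convert-then-compute comes from; repeated multiplications with the same matrix can still amortize a conversion to CSR.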


Supplemental Material

a123-chou.webm

