research-article
Open Access
Artifacts Evaluated & Functional / v1.1

Compilation of dynamic sparse tensor algebra

Published: 31 October 2022

Abstract

Many applications, from social network graph analytics to control flow analysis, compute on sparse data that evolves over the course of program execution. Such data can be represented as dynamic sparse tensors and efficiently stored in formats (data layouts) that utilize pointer-based data structures like block linked lists, binary search trees, B-trees, and C-trees, among others. These specialized formats support fast in-place modification and are thus better suited than traditional, array-based data structures like CSR for storing dynamic sparse tensors. However, different dynamic sparse tensor formats have distinct benefits and drawbacks, and performing different computations on tensors that are stored in different formats can require vastly dissimilar code that is not straightforward to correctly implement and optimize.
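To make the contrast concrete, here is a minimal, hypothetical sketch (not one of the paper's actual formats): the same sparse vector stored as a sorted linked list, where insertion splices in a node in place, and in a CSR-like compressed array pair, where insertion must shift every trailing entry.

```python
import bisect

class Node:
    """One nonzero entry in a pointer-based layout."""
    def __init__(self, idx, val, nxt=None):
        self.idx, self.val, self.next = idx, val, nxt

class LinkedSparseVector:
    """Sorted singly linked list of (index, value) pairs.
    Inserting splices in a node; no existing entries move."""
    def __init__(self):
        self.head = None

    def insert(self, idx, val):
        prev, cur = None, self.head
        while cur is not None and cur.idx < idx:
            prev, cur = cur, cur.next
        node = Node(idx, val, cur)
        if prev is None:
            self.head = node
        else:
            prev.next = node

    def entries(self):
        cur = self.head
        while cur is not None:
            yield cur.idx, cur.val
            cur = cur.next

class CompressedSparseVector:
    """CSR-like array layout: insertion shifts the tail of both arrays,
    which is why static array formats handle updates poorly."""
    def __init__(self):
        self.idxs, self.vals = [], []

    def insert(self, idx, val):
        pos = bisect.bisect_left(self.idxs, idx)
        self.idxs.insert(pos, idx)  # O(nnz) shift on every update
        self.vals.insert(pos, val)
```

Both layouts hold the same logical nonzeros; the pointer-based one trades contiguous storage for cheap in-place updates, which is the trade-off the formats above (B-trees, C-trees, etc.) refine further.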

This paper shows how a compiler can generate efficient code to compute tensor algebra operations on dynamic sparse tensors that may be stored in a wide range of disparate formats. We propose a language for precisely specifying recursive, pointer-based data structures, and we show how this language can express many different dynamic data structures, including all the ones named above as well as many more. We then describe how, given high-level specifications of such dynamic data structures, a compiler can emit code to efficiently access and compute on dynamic sparse tensors that are stored in the aforementioned data structures.
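As an illustration of what such generated code can look like, the following is a hypothetical sketch (not the compiler's actual output) of a kernel for a dot product between a sparse vector stored as a sorted linked list of nonzeros and a dense array: the emitted loop walks the pointer structure directly rather than going through a generic iterator interface.

```python
class Node:
    """One nonzero entry (index, value) in a linked-list-stored sparse vector."""
    def __init__(self, idx, val, nxt=None):
        self.idx, self.val, self.next = idx, val, nxt

def dot_linked_dense(head, dense):
    """Dot product specialized to a linked-list layout: the traversal
    logic is baked into the kernel instead of hidden behind an API."""
    acc = 0.0
    node = head
    while node is not None:
        acc += node.val * dense[node.idx]
        node = node.next
    return acc
```

For a tree-based format such as a B-tree or C-tree, the same principle would apply, but the generated loop would instead descend and traverse the tree's node structure in index order.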

We evaluate our technique and find that it generates efficient dynamic sparse tensor algebra kernels whose performance is comparable to, if not better than, that of state-of-the-art libraries and frameworks such as PAM, Aspen, STINGER, and Terrace. At the same time, our technique supports a wider range of tensor algebra operations (such as those that simultaneously compute with static and dynamic sparse tensors) than Aspen, STINGER, and Terrace, while also achieving significantly better performance than PAM for those same operations.

References

  1. Seher Acer, Oguz Selvitopi, and Cevdet Aykanat. 2016. Improving Performance of Sparse Matrix Dense Matrix Multiplication on Large-Scale Parallel Systems. Parallel Comput. 59, C (Nov. 2016), 71–96. https://doi.org/10.1016/j.parco.2016.10.001
  2. Gilad Arnold, Johannes Hölzl, Ali Sinan Köksal, Rastislav Bodík, and Mooly Sagiv. 2010. Specifying and Verifying Sparse Matrix Codes. In Proceedings of the 15th ACM SIGPLAN International Conference on Functional Programming (ICFP ’10). ACM, New York, NY, USA, 249–260. https://doi.org/10.1145/1863543.1863581
  3. Muhammad A. Awad, Saman Ashkiani, Serban D. Porumbescu, and John D. Owens. 2020. Dynamic Graphs on the GPU. In 2020 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 739–748. https://doi.org/10.1109/IPDPS47924.2020.00081
  4. Ariful Azad and Aydın Buluç. 2017. A Work-Efficient Parallel Sparse Matrix-Sparse Vector Multiplication Algorithm. In 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 688–697. https://doi.org/10.1109/IPDPS.2017.76
  5. M. Baskaran, B. Meister, N. Vasilache, and R. Lethin. 2012. Efficient and scalable computations with sparse tensors. In 2012 IEEE Conference on High Performance Extreme Computing. 1–6. https://doi.org/10.1109/HPEC.2012.6408676
  6. Nathan Bell and Michael Garland. 2008. Efficient Sparse Matrix-Vector Multiplication on CUDA. NVIDIA Corporation.
  7. Aart J.C. Bik. 1996. Compiler Support for Sparse Matrix Computations. Ph.D. Dissertation. Leiden University.
  8. Aart J.C. Bik, Penporn Koanantakool, Tatiana Shpeisman, Nicolas Vasilache, Bixia Zheng, and Fredrik Kjolstad. 2022. Compiler Support for Sparse Tensor Computations in MLIR. ACM Trans. Archit. Code Optim. (June 2022). https://doi.org/10.1145/3544559
  9. Aart J.C. Bik and Harry A.G. Wijshoff. 1993. Compilation techniques for sparse matrix computations. In Proceedings of the 7th International Conference on Supercomputing. 416–424.
  10. Aart J.C. Bik and Harry A.G. Wijshoff. 1994. On automatic data structure selection and code generation for sparse computations. In Languages and Compilers for Parallel Computing. Springer, 57–75.
  11. Federico Busato, Oded Green, Nicola Bombieri, and David A. Bader. 2018. Hornet: An Efficient Data Structure for Dynamic Sparse Graphs and Matrices on GPUs. In 2018 IEEE High Performance Extreme Computing Conference (HPEC). 1–7. https://doi.org/10.1109/HPEC.2018.8547541
  12. Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe. 2018. Format Abstraction for Sparse Tensor Algebra Compilers. Proc. ACM Program. Lang. 2, OOPSLA (Oct. 2018), Article 123, 30 pages.
  13. Stephen Chou, Fredrik Kjolstad, and Saman Amarasinghe. 2020. Automatic Generation of Efficient Sparse Tensor Format Conversion Routines. In Proceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2020). Association for Computing Machinery, New York, NY, USA, 823–838. https://doi.org/10.1145/3385412.3385963
  14. Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms (3rd ed.). The MIT Press.
  15. Timothy A. Davis and Yifan Hu. 2011. The University of Florida Sparse Matrix Collection. ACM Trans. Math. Softw. 38, 1 (Dec. 2011), Article 1.
  16. Laxman Dhulipala, Guy E. Blelloch, and Julian Shun. 2019. Low-Latency Graph Streaming Using Compressed Purely-Functional Trees. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2019). Association for Computing Machinery, New York, NY, USA, 918–934. https://doi.org/10.1145/3314221.3314598
  17. David Ediger, Rob McColl, Jason Riedy, and David A. Bader. 2012. STINGER: High performance data structure for streaming graphs. In 2012 IEEE Conference on High Performance Extreme Computing. 1–5. https://doi.org/10.1109/HPEC.2012.6408680
  18. Gaël Guennebaud and Benoît Jacob. 2010. Eigen v3. http://eigen.tuxfamily.org.
  19. Peter Hawkins, Alex Aiken, Kathleen Fisher, Martin Rinard, and Mooly Sagiv. 2011. Data Representation Synthesis. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’11). Association for Computing Machinery, New York, NY, USA, 38–49. https://doi.org/10.1145/1993498.1993504
  20. Rawn Henry, Olivia Hsu, Rohan Yadav, Stephen Chou, Kunle Olukotun, Saman Amarasinghe, and Fredrik Kjolstad. 2021. Compilation of Sparse Array Programming Models. Proc. ACM Program. Lang. 5, OOPSLA (Oct. 2021), Article 128, 29 pages. https://doi.org/10.1145/3485505
  21. Changwan Hong, Aravind Sukumaran-Rajam, Israt Nisa, Kunal Singh, and P. Sadayappan. 2019. Adaptive Sparse Tiling for Sparse Matrix Multiplication. In Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming (PPoPP ’19). Association for Computing Machinery, New York, NY, USA, 300–314. https://doi.org/10.1145/3293883.3295712
  22. Intel. 2020. Intel oneAPI Math Kernel Library Developer Reference. https://www.intel.com/content/dam/develop/external/us/en/documents/onemkl-developerreference-c.pdf
  23. James King, Thomas Gilray, Robert M. Kirby, and Matthew Might. 2016. Dynamic Sparse-Matrix Allocation on GPUs. In High Performance Computing, Julian M. Kunkel, Pavan Balaji, and Jack Dongarra (Eds.). Springer International Publishing, Cham, 61–80.
  24. Fredrik Kjolstad, Peter Ahrens, Shoaib Kamil, and Saman Amarasinghe. 2019. Tensor Algebra Compilation with Workspaces. 180–192. http://dl.acm.org/citation.cfm?id=3314872.3314894
  25. Fredrik Kjolstad, Shoaib Kamil, Stephen Chou, David Lugato, and Saman Amarasinghe. 2017. The Tensor Algebra Compiler. Proc. ACM Program. Lang. 1, OOPSLA (Oct. 2017), Article 77, 29 pages. https://doi.org/10.1145/3133901
  26. Etienne Kneuss, Ivan Kuraj, Viktor Kuncak, and Philippe Suter. 2013. Synthesis modulo Recursive Functions. In Proceedings of the 2013 ACM SIGPLAN International Conference on Object Oriented Programming Systems Languages and Applications (OOPSLA ’13). Association for Computing Machinery, New York, NY, USA, 407–426. https://doi.org/10.1145/2509136.2509555
  27. Vladimir Kotlyar. 1999. Relational Algebraic Techniques for the Synthesis of Sparse Matrix Programs. Ph.D. Dissertation. Cornell University.
  28. Vladimir Kotlyar, Keshav Pingali, and Paul Stodghill. 1997. A relational approach to the compilation of sparse matrix programs. In Euro-Par’97 Parallel Processing. Springer, 318–327.
  29. Pradeep Kumar and H. Howie Huang. 2019. GraphOne: A Data Store for Real-time Analytics on Evolving Graphs. In 17th USENIX Conference on File and Storage Technologies (FAST 19). USENIX Association, Boston, MA, 249–263. https://www.usenix.org/conference/fast19/presentation/kumar
  30. Darya Kurilova and Derek Rayside. 2013. On the Simplicity of Synthesizing Linked Data Structure Operations. SIGPLAN Not. 49, 3 (Oct. 2013), 155–158. https://doi.org/10.1145/2637365.2517225
  31. Tobin J. Lehman and Michael J. Carey. 1986. A Study of Index Structures for Main Memory Database Management Systems. In Proceedings of the 12th International Conference on Very Large Data Bases (VLDB ’86). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 294–303.
  32. Jiajia Li, Jimeng Sun, and Richard Vuduc. 2018. HiCOO: Hierarchical Storage of Sparse Tensors. In Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC ’18). IEEE Press, Piscataway, NJ, USA, Article 19, 15 pages. https://doi.org/10.1109/SC.2018.00022
  33. Weifeng Liu and Brian Vinter. 2015. CSR5: An Efficient Storage Format for Cross-Platform Sparse Matrix-Vector Multiplication. In Proceedings of the 29th ACM International Conference on Supercomputing (ICS ’15). ACM, New York, NY, USA, 339–350. https://doi.org/10.1145/2751205.2751209
  34. Peter Macko, Virendra J. Marathe, Daniel W. Margo, and Margo I. Seltzer. 2015. LLAMA: Efficient graph analytics using Large Multiversioned Arrays. In 2015 IEEE 31st International Conference on Data Engineering. 363–374. https://doi.org/10.1109/ICDE.2015.7113298
  35. Tim Mattson, David Bader, Jon Berry, Aydın Buluç, Jack Dongarra, Christos Faloutsos, John Feo, John R. Gilbert, Joseph Gonzalez, Bruce Hendrickson, Jeremy Kepner, Charles E. Leiserson, Andrew Lumsdaine, David Padua, Stephen Poole, Steve Reinhardt, Michael Stonebraker, Steve Wallach, and Andrew Yoo. 2013. Standards for Graph Algorithm Primitives. In IEEE High Performance Extreme Computing Conference. IEEE, 1–2. https://doi.org/10.1109/HPEC.2013.6670338
  36. Alexander Monakov, Anton Lokhmotov, and Arutyun Avetisyan. 2010. Automatically Tuning Sparse Matrix-Vector Multiplication for GPU Architectures. In High Performance Embedded Architectures and Compilers, Yale N. Patt, Pierfrancesco Foglia, Evelyn Duesterwald, Paolo Faraboschi, and Xavier Martorell (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 111–125.
  37. L. Page, S. Brin, R. Motwani, and T. Winograd. 1998. The PageRank citation ranking: Bringing order to the Web. In Proceedings of the 7th International World Wide Web Conference. Brisbane, Australia, 161–172. citeseer.nj.nec.com/page98pagerank.html
  38. Prashant Pandey, Brian Wheatman, Helen Xu, and Aydin Buluc. 2021. Terrace: A Hierarchical Graph Container for Skewed Dynamic Graphs. In Proceedings of the 2021 International Conference on Management of Data (SIGMOD/PODS ’21). Association for Computing Machinery, New York, NY, USA, 1372–1385. https://doi.org/10.1145/3448016.3457313
  39. Jongsoo Park, Sheng Li, Wei Wen, Ping Tak Peter Tang, Hai Li, Yiran Chen, and Pradeep Dubey. 2016. Faster CNNs with Direct Sparse Convolutions and Guided Pruning. arXiv:1608.01409.
  40. Nadia Polikarpova, Ivan Kuraj, and Armando Solar-Lezama. 2016. Program Synthesis from Polymorphic Refinement Types. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI ’16). Association for Computing Machinery, New York, NY, USA, 522–538. https://doi.org/10.1145/2908080.2908093
  41. William Pugh and Tatiana Shpeisman. 1999. SIPR: A new framework for generating efficient code for sparse matrix computations. In Languages and Compilers for Parallel Computing. Springer, 213–229.
  42. Xiaokang Qiu and Armando Solar-Lezama. 2017. Natural Synthesis of Provably-Correct Data-Structure Manipulations. Proc. ACM Program. Lang. 1, OOPSLA (Oct. 2017), Article 65, 28 pages. https://doi.org/10.1145/3133889
  43. Samyam Rajbhandari, Yuxiong He, Olatunji Ruwase, Michael Carbin, and Trishul Chilimbi. 2017. Optimizing CNNs on Multicores for Scalability, Performance and Goodput. In Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS ’17). ACM, New York, NY, USA, 267–280. https://doi.org/10.1145/3037697.3037745
  44. Derek Rayside, Vajihollah Montaghami, Francesca Leung, Albert Yuen, Kevin Xu, and Daniel Jackson. 2012. Synthesizing Iterators from Abstraction Functions. SIGPLAN Not. 48, 3 (Sept. 2012), 31–40. https://doi.org/10.1145/2480361.2371407
  45. Yousef Saad. 2003. Iterative methods for sparse linear systems. SIAM.
  46. Dipanjan Sengupta and Shuaiwen Leon Song. 2017. EvoGraph: On-the-Fly Efficient Mining of Evolving Graphs on GPU. In High Performance Computing, Julian M. Kunkel, Rio Yokota, Pavan Balaji, and David Keyes (Eds.). Springer International Publishing, Cham, 97–119.
  47. Mo Sha, Yuchen Li, Bingsheng He, and Kian-Lee Tan. 2017. Accelerating Dynamic Graph Analytics on GPUs. Proc. VLDB Endow. 11, 1 (Sept. 2017), 107–120. https://doi.org/10.14778/3151113.3151122
  48. Julian Shun and Guy E. Blelloch. 2013. Ligra: A Lightweight Graph Processing Framework for Shared Memory. SIGPLAN Not. 48, 8 (Feb. 2013), 135–146. https://doi.org/10.1145/2517327.2442530
  49. Rishabh Singh and Armando Solar-Lezama. 2011. Synthesizing Data Structure Manipulations from Storyboards. In Proceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering (ESEC/FSE ’11). Association for Computing Machinery, New York, NY, USA, 289–299. https://doi.org/10.1145/2025113.2025153
  50. Shaden Smith and George Karypis. 2015. Tensor-matrix products with a compressed sparse tensor. In Proceedings of the 5th Workshop on Irregular Applications: Architectures and Algorithms. 5.
  51. Shaden Smith, Niranjay Ravindran, Nicholas Sidiropoulos, and George Karypis. 2015. SPLATT: Efficient and Parallel Sparse Tensor-Matrix Multiplication. In 2015 IEEE International Parallel and Distributed Processing Symposium (IPDPS). 61–70.
  52. Paul Stodghill. 1997. A Relational Approach to the Automatic Generation of Sequential Sparse Matrix Codes. Ph.D. Dissertation. Cornell University.
  53. Yihan Sun, Daniel Ferizovic, and Guy E. Blelloch. 2018. PAM: Parallel Augmented Maps. SIGPLAN Not. 53, 1 (Feb. 2018), 290–304. https://doi.org/10.1145/3200691.3178509
  54. Ruiqin Tian, Luanzheng Guo, Jiajia Li, Bin Ren, and Gokcen Kestor. 2021. A High-Performance Sparse Tensor Algebra Compiler in Multi-Level IR. https://doi.org/10.48550/ARXIV.2102.05187
  55. Anand Venkat, Mary Hall, and Michelle Strout. 2015. Loop and Data Transformations for Sparse Matrix Code. In Proceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2015). 521–532.
  56. Martin Winter, Rhaleb Zayer, and Markus Steinberger. 2017. Autonomous, independent management of dynamic graphs on GPUs. In 2017 IEEE High Performance Extreme Computing Conference (HPEC). 1–7. https://doi.org/10.1109/HPEC.2017.8091058
  57. Biwei Xie, Jianfeng Zhan, Xu Liu, Wanling Gao, Zhen Jia, Xiwen He, and Lixin Zhang. 2018. CVR: Efficient Vectorization of SpMV on x86 Processors. In Proceedings of the 2018 International Symposium on Code Generation and Optimization (CGO 2018). ACM, New York, NY, USA, 149–162. https://doi.org/10.1145/3168818
  58. Yunming Zhang, Vladimir Kiriansky, Charith Mendis, Saman Amarasinghe, and Matei Zaharia. 2017. Making caches work for graph analytics. In 2017 IEEE International Conference on Big Data (Big Data). 293–302. https://doi.org/10.1109/BigData.2017.8257937
