Abstract
Many applications, from social network graph analytics to control flow analysis, compute on sparse data that evolves over the course of program execution. Such data can be represented as dynamic sparse tensors and efficiently stored in formats (data layouts) that utilize pointer-based data structures such as block linked lists, binary search trees, B-trees, and C-trees, among others. These specialized formats support fast in-place modification and are thus better suited than traditional, array-based data structures like CSR for storing dynamic sparse tensors. However, different dynamic sparse tensor formats have distinct benefits and drawbacks, and performing different computations on tensors that are stored in different formats can require vastly dissimilar code that is not straightforward to implement correctly and optimize.
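To make the contrast with array-based formats concrete, the following is a minimal sketch of one sparse row stored as a block linked list. It is an illustration only and not the layout or code used in this work; the names `Block`, `BlockListRow`, and `BLOCK_CAPACITY` are hypothetical. The point it shows is that an insertion touches a single small block (splitting it when full), whereas inserting into a CSR row generally requires shifting the tail of the coordinate and value arrays.

```python
from bisect import bisect_left

BLOCK_CAPACITY = 4  # hypothetical block size, chosen only for illustration


class Block:
    """One list node: a small sorted run of coordinates and their values."""

    def __init__(self):
        self.coords = []
        self.vals = []
        self.next = None


class BlockListRow:
    """A sparse row stored as a linked list of bounded-size sorted blocks."""

    def __init__(self):
        self.head = Block()

    def insert(self, coord, val):
        # Walk to the block that should hold coord: stop at the last block,
        # or at the block whose successor starts after coord.
        block = self.head
        while block.next is not None and block.next.coords[0] <= coord:
            block = block.next
        pos = bisect_left(block.coords, coord)
        if pos < len(block.coords) and block.coords[pos] == coord:
            block.vals[pos] = val  # overwrite an existing nonzero
            return
        block.coords.insert(pos, coord)
        block.vals.insert(pos, val)
        if len(block.coords) > BLOCK_CAPACITY:
            # Split the overfull block in two; no other block is modified.
            mid = len(block.coords) // 2
            new_block = Block()
            new_block.coords, block.coords = block.coords[mid:], block.coords[:mid]
            new_block.vals, block.vals = block.vals[mid:], block.vals[:mid]
            new_block.next, block.next = block.next, new_block

    def items(self):
        """Yield (coordinate, value) pairs in increasing coordinate order."""
        block = self.head
        while block is not None:
            yield from zip(block.coords, block.vals)
            block = block.next


row = BlockListRow()
for c in [7, 2, 9, 4, 1, 8, 3]:
    row.insert(c, float(c) * 10)
```

Iterating `row.items()` visits the nonzeros in coordinate order regardless of the order in which they were inserted, which is the property a kernel that traverses the row would rely on.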
This paper shows how a compiler can generate efficient code to compute tensor algebra operations on dynamic sparse tensors that may be stored in a wide range of disparate formats. We propose a language for precisely specifying recursive, pointer-based data structures, and we show how this language can express many different dynamic data structures, including all the ones named above as well as many more. We then describe how, given high-level specifications of such dynamic data structures, a compiler can emit code to efficiently access and compute on dynamic sparse tensors that are stored in the aforementioned data structures.
We evaluate our technique and find that it generates efficient dynamic sparse tensor algebra kernels whose performance is comparable to, if not better than, that of state-of-the-art libraries and frameworks such as PAM, Aspen, STINGER, and Terrace. At the same time, our technique supports a wider range of tensor algebra operations than Aspen, STINGER, and Terrace, including operations that simultaneously compute with static and dynamic sparse tensors, while also achieving significantly better performance than PAM for those same operations.
Index Terms
Compilation of dynamic sparse tensor algebra