Abstract
The Sparse Matrix-Vector multiplication (SpMV) kernel scales poorly on shared memory systems with multiple processing units due to the streaming nature of its data access pattern. Previous research has demonstrated that an effective strategy to improve the kernel's performance is to drastically reduce the data volume involved in the computations. Since storage formats for sparse matrices include metadata describing the structure of the non-zero elements within the matrix, we propose a generalized approach to compressing this metadata by exploiting substructures within the matrix. We call the proposed storage format Compressed Sparse eXtended (CSX). In our implementation, we employ runtime code generation to construct specialized SpMV routines for each matrix. Experimental evaluation on two shared memory systems with 15 sparse matrices demonstrates significant performance gains as the number of participating cores increases. Regarding the cost of CSX construction, we propose several strategies that trade performance for preprocessing cost, making CSX applicable to both online and offline preprocessing.
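The role of metadata in this argument can be made concrete with a small sketch. It uses the widely used Compressed Sparse Row (CSR) format as a baseline, not CSX itself: `row_ptr` and `col_ind` are the metadata, and `values` holds the non-zero elements. The byte count assumes 32-bit indices and 64-bit values. The `horizontal_runs` and `row_dot_from_units` helpers are hypothetical and only illustrate the general idea of replacing per-element column indices with a (start column, length) pair for runs of consecutive non-zeros; the actual CSX encoding, its set of substructures, and its generated code are not reproduced here.

```python
"""Illustrative sketch only: CSR SpMV baseline plus a toy run-length
encoding of horizontal substructures. This is NOT the CSX format."""
from typing import List, Sequence, Tuple

Unit = Tuple[int, int]  # hypothetical unit: (start column, run length)


def csr_spmv(row_ptr: Sequence[int], col_ind: Sequence[int],
             values: Sequence[float], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for A stored in CSR.

    row_ptr and col_ind are metadata: they describe where the non-zeros
    are and must be streamed from memory alongside values.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_ind[k]]
        y[i] = acc
    return y


def csr_bytes(n_rows: int, nnz: int,
              index_bytes: int = 4, value_bytes: int = 8) -> Tuple[int, int]:
    """Return (metadata bytes, value bytes) for a CSR matrix.

    Assumes 32-bit indices and 64-bit values by default.
    """
    metadata = index_bytes * (nnz + n_rows + 1)  # col_ind + row_ptr
    data = value_bytes * nnz
    return metadata, data


def horizontal_runs(cols: Sequence[int], min_len: int = 3) -> List[Unit]:
    """Hypothetical encoder: split one row's sorted column indices into
    (start, length) units; runs shorter than min_len become length-1 units."""
    units: List[Unit] = []
    i = 0
    while i < len(cols):
        j = i
        # Extend the run while the next column is exactly one to the right.
        while j + 1 < len(cols) and cols[j + 1] == cols[j] + 1:
            j += 1
        if j - i + 1 >= min_len:
            units.append((cols[i], j - i + 1))
        else:
            units.extend((c, 1) for c in cols[i:j + 1])
        i = j + 1
    return units


def row_dot_from_units(units: Sequence[Unit], row_values: Sequence[float],
                       x: Sequence[float]) -> float:
    """Hypothetical decoder: dot product of one row with x, where each unit
    stores only its start column instead of one index per non-zero."""
    acc = 0.0
    k = 0  # position in row_values
    for start, length in units:
        for c in range(start, start + length):
            acc += row_values[k] * x[c]
            k += 1
    return acc


if __name__ == "__main__":
    # 3x4 matrix: row 0 has columns 0-2, row 1 column 3, row 2 columns 0 and 2.
    row_ptr = [0, 3, 4, 6]
    col_ind = [0, 1, 2, 3, 0, 2]
    values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
    print(csr_spmv(row_ptr, col_ind, values, [1.0, 1.0, 1.0, 1.0]))  # [6.0, 4.0, 11.0]
    print(csr_bytes(3, 6))  # (40, 48)
    print(horizontal_runs([0, 1, 2, 3, 7, 9, 10]))  # [(0, 4), (7, 1), (9, 1), (10, 1)]
```

In the 3-by-4 example, the index arrays account for 40 of the 88 bytes streamed for the matrix, which suggests why shrinking metadata can pay off for a kernel limited by memory traffic.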
CSX: an extended compression format for SpMV on shared memory systems. PPoPP '11: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming.