
CSX: an extended compression format for SpMV on shared memory systems

Published: 12 February 2011

Abstract

The Sparse Matrix-Vector multiplication (SpMV) kernel scales poorly on shared memory systems with multiple processing units due to the streaming nature of its data access pattern. Previous research has demonstrated that an effective strategy for improving the kernel's performance is to drastically reduce the data volume involved in the computation. Since storage formats for sparse matrices include metadata describing the structure of the non-zero elements within the matrix, we propose a generalized approach that compresses this metadata by exploiting substructures within the matrix. We call the proposed storage format Compressed Sparse eXtended (CSX). Our implementation employs runtime code generation to construct specialized SpMV routines for each matrix. Experimental evaluation on two shared memory systems with 15 sparse matrices demonstrates significant performance gains as the number of participating cores increases. Regarding the cost of CSX construction, we propose several strategies that trade performance for preprocessing cost, making CSX applicable to both online and offline preprocessing.
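To make the metadata overhead concrete, the following is a minimal sketch of an SpMV kernel over the standard CSR (Compressed Sparse Row) format, which CSX generalizes. It is illustrative only and not the paper's implementation: the `row_ptr` and `col_ind` arrays are the per-non-zero index metadata whose volume formats like CSX aim to shrink.

```python
def spmv_csr(values, col_ind, row_ptr, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR.

    values  -- non-zero values, row by row
    col_ind -- column index of each non-zero (metadata: one entry per non-zero)
    row_ptr -- start offset of each row in values/col_ind (metadata: one per row + 1)
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        # One column-index load per non-zero: this stream of indices is
        # what dominates memory traffic for very sparse matrices.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_ind[k]]
        y[i] = acc
    return y

# A 3x3 example:
#   [ 1 0 2 ]
#   [ 0 3 0 ]
#   [ 4 0 5 ]
values  = [1.0, 2.0, 3.0, 4.0, 5.0]
col_ind = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
print(spmv_csr(values, col_ind, row_ptr, [1.0, 1.0, 1.0]))  # → [3.0, 3.0, 9.0]
```

Detecting substructures (e.g. horizontal or diagonal runs of non-zeros) lets a format encode many such indices with a single pattern descriptor instead of one explicit index per element, which is the compression opportunity the abstract describes.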



Published in

• ACM SIGPLAN Notices, Volume 46, Issue 8 (PPoPP '11), August 2011, 300 pages. ISSN: 0362-1340; EISSN: 1558-1160; DOI: 10.1145/2038037
• PPoPP '11: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, February 2011, 326 pages. ISBN: 9781450301190; DOI: 10.1145/1941553
• General Chair: Calin Cascaval; Program Chair: Pen-Chung Yew

      Copyright © 2011 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

