
Optimization by runtime specialization for sparse matrix-vector multiplication

Published: 15 September 2014

Abstract

Runtime specialization optimizes programs based on partial information that becomes available only at run time. It is applicable when some input data is used repeatedly while other input data varies, and it has the potential to generate highly efficient code. In this paper, we explore the speedups obtainable for sparse matrix-dense vector multiplication using runtime specialization, in the case where a single matrix is to be multiplied by many vectors. We experiment with five methods involving runtime specialization, comparing them to methods that do not (including Intel's MKL library). Our focus is the evaluation of the speedups obtainable with runtime specialization, without considering the overheads of code generation. Our experiments use 23 matrices from the Matrix Market and Florida collections and run on five different machines. In 94 of those 115 cases, the specialized code runs faster than any version without specialization. Considering only the specialized methods, the average speedup with respect to Intel's MKL library ranges from 1.44x to 1.77x, depending on the machine. We have also found that the best method depends on the matrix and the machine; no single method is best for all matrices and machines.
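The core idea can be illustrated with a minimal sketch. Here, a sparse matrix in CSR form is "baked into" a freshly generated routine: the inner loop over nonzeros is unrolled and the matrix values and column indices become constants in the generated source, which is compiled once and then reused for many input vectors. This is purely illustrative (the paper's methods generate optimized native code, not Python, and the helper names `csr_spmv` and `specialize_spmv` are ours):

```python
def csr_spmv(values, col_idx, row_ptr, x):
    """Generic CSR sparse matrix-vector product (unspecialized baseline)."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    for i in range(n):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y[i] = s
    return y

def specialize_spmv(values, col_idx, row_ptr):
    """Generate, compile, and return an SpMV routine specialized to one matrix.

    The sparsity pattern and the nonzero values are embedded as constants,
    so the returned function has no index arrays and no inner loop.
    """
    n = len(row_ptr) - 1
    lines = ["def spmv(x):", "    y = [0.0] * %d" % n]
    for i in range(n):
        terms = ["%r * x[%d]" % (values[k], col_idx[k])
                 for k in range(row_ptr[i], row_ptr[i + 1])]
        if terms:
            lines.append("    y[%d] = %s" % (i, " + ".join(terms)))
    lines.append("    return y")
    ns = {}
    exec("\n".join(lines), ns)  # run-time code generation + compilation
    return ns["spmv"]

# The 2x2 matrix [[1, 2], [0, 3]] in CSR form:
values, col_idx, row_ptr = [1.0, 2.0, 3.0], [0, 1, 1], [0, 2, 3]
spmv = specialize_spmv(values, col_idx, row_ptr)

# The one-time generation cost is amortized over many vectors.
assert spmv([1.0, 1.0]) == csr_spmv(values, col_idx, row_ptr, [1.0, 1.0])
```

The sketch shows why the setting matters: specialization pays off only when the generated routine is invoked enough times (here, multiplied by enough vectors) to amortize the generation cost, which is exactly the overhead the paper's evaluation sets aside.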



Published in

ACM SIGPLAN Notices, Volume 50, Issue 3 (GPCE '14), March 2015, 141 pages
ISSN: 0362-1340; EISSN: 1558-1160
DOI: 10.1145/2775053
Editor: Andy Gill

GPCE 2014: Proceedings of the 2014 International Conference on Generative Programming: Concepts and Experiences, September 2014, 141 pages
ISBN: 978-1-4503-3161-6
DOI: 10.1145/2658761

Copyright © 2014 ACM
Publisher: Association for Computing Machinery, New York, NY, United States

