Optimization by runtime specialization for sparse matrix-vector multiplication

Abstract
Runtime specialization optimizes programs based on partial information that becomes available only at run time. It applies when some input data is used repeatedly while other input data varies, and it has the potential to generate highly efficient code. In this paper, we explore the speedups obtainable from runtime specialization of sparse matrix-dense vector multiplication, in the case where a single matrix is to be multiplied by many vectors. We experiment with five methods involving runtime specialization, comparing them to methods that do not specialize, including Intel's MKL library. Our focus is the evaluation of the speedups achievable with runtime specialization, without considering the overhead of code generation. Our experiments use 23 matrices from the Matrix Market and Florida collections and run on five different machines. In 94 of the resulting 115 matrix-machine cases, the specialized code runs faster than every version without specialization. Considering only the specialized methods, the average speedup with respect to Intel's MKL library ranges from 1.44x to 1.77x, depending on the machine. We also find that the best method depends on both the matrix and the machine; no single method is best for all matrices and machines.
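To illustrate the core idea for readers unfamiliar with the technique, here is a minimal, hypothetical Python sketch (the paper's five methods generate native code and are not reproduced here): given one fixed matrix in CSR format, a specialized multiply routine is generated with the sparsity pattern and nonzero values baked in as literals, eliminating index loads and inner-loop overhead. All function names below are illustrative assumptions, not the paper's API.

```python
# Illustrative sketch of runtime specialization for SpMV.
# (Hypothetical names; the paper's generators emit native code, not Python.)

def spmv_csr(vals, cols, row_ptr, x):
    """Generic CSR sparse matrix-vector multiply."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(row_ptr) - 1):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[cols[k]]
    return y

def specialize_spmv(vals, cols, row_ptr):
    """Generate a multiply routine specialized to one fixed matrix:
    the loops are fully unrolled and the nonzero values become
    literals, so the specialized code touches only the vector x."""
    lines = ["def spmv_fixed(x):", "    return ["]
    for i in range(len(row_ptr) - 1):
        terms = [f"{vals[k]!r} * x[{cols[k]}]"
                 for k in range(row_ptr[i], row_ptr[i + 1])]
        lines.append("        " + (" + ".join(terms) or "0.0") + ",")
    lines.append("    ]")
    ns = {}
    exec("\n".join(lines), ns)   # compile the generated source at run time
    return ns["spmv_fixed"]

# Example: the 3x3 matrix [[2,0,1],[0,3,0],[4,0,5]] in CSR form.
vals    = [2.0, 1.0, 3.0, 4.0, 5.0]
cols    = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]

spmv_fixed = specialize_spmv(vals, cols, row_ptr)
x = [1.0, 2.0, 3.0]
print(spmv_csr(vals, cols, row_ptr, x))  # [5.0, 6.0, 19.0]
print(spmv_fixed(x))                     # [5.0, 6.0, 19.0]
```

The generation cost is paid once per matrix; it amortizes when the same matrix is multiplied by many vectors, which is exactly the setting the paper studies.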
Published in GPCE 2014: Proceedings of the 2014 International Conference on Generative Programming: Concepts and Experiences.