research-article

Communication avoiding successive band reduction

Published: 25 February 2012

Abstract

The running time of an algorithm depends on both arithmetic and communication (i.e., data movement) costs, and the relative costs of communication are growing over time. In this work, we present both theoretical and practical results for tridiagonalizing a symmetric band matrix: we present an algorithm that asymptotically reduces communication, and we show that it indeed performs well in practice.
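The arithmetic-plus-communication view of running time is often made concrete with the standard alpha-beta-gamma cost model from the communication-avoiding literature. The sketch below is illustrative only, with made-up machine parameters, and is not a formula or measurement taken from this paper:

```python
def modeled_time(flops, words, messages, gamma, beta, alpha):
    """Total time = arithmetic cost + bandwidth cost + latency cost.

    gamma: seconds per floating-point operation
    beta:  seconds per word moved between memory levels (or processors)
    alpha: seconds per message (e.g., per cache line or network message)
    """
    return gamma * flops + beta * words + alpha * messages

# Illustrative (hypothetical) parameters: the communication terms can
# dominate even when far fewer words than flops are involved, which is
# why reducing data movement can pay for extra arithmetic.
t = modeled_time(flops=1e9, words=1e8, messages=1e5,
                 gamma=1e-9, beta=1e-8, alpha=1e-6)
```

Under these numbers the bandwidth term matches the arithmetic term despite an order of magnitude fewer words than flops.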

The tridiagonalization of a symmetric band matrix is a key kernel in solving the symmetric eigenvalue problem for both full and band matrices. In order to preserve sparsity, tridiagonalization routines use annihilate-and-chase procedures that have previously suffered from poor data locality. We improve data locality by reorganizing the computation, asymptotically reducing communication costs compared to existing algorithms. Our sequential implementation demonstrates that avoiding communication improves runtime even at the expense of extra arithmetic: we observe a 2x speedup over Intel MKL while doing 43% more floating-point operations.
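For intuition only, here is a minimal dense-storage sketch of the basic building block: annihilating entries below the first subdiagonal with symmetric Givens rotations. This is the naive elimination order, not the communication-avoiding schedule the paper describes; on a band matrix, each such rotation creates fill-in (a "bulge") outside the band, which annihilate-and-chase procedures must chase off the end of the matrix:

```python
import numpy as np

def tridiagonalize(A):
    """Reduce a symmetric matrix to tridiagonal form with Givens rotations.

    Each rotation annihilates one entry below the first subdiagonal and is
    applied from both sides, preserving symmetry and eigenvalues.
    """
    A = A.astype(float).copy()
    n = A.shape[0]
    for j in range(n - 2):                   # for each column ...
        for i in range(n - 1, j + 1, -1):    # ... zero entries bottom-up
            r = np.hypot(A[i - 1, j], A[i, j])
            if r == 0.0:
                continue
            c, s = A[i - 1, j] / r, A[i, j] / r
            G = np.array([[c, s], [-s, c]])
            A[[i - 1, i], :] = G @ A[[i - 1, i], :]    # rotate rows i-1, i
            A[:, [i - 1, i]] = A[:, [i - 1, i]] @ G.T  # rotate cols i-1, i
    return A

# Example: eigenvalues are unchanged and all entries with |i-j| > 1 vanish.
rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
S = B + B.T
T = tridiagonalize(S)
```

Applying rotations from both sides is what makes this a similarity transform, so the tridiagonal result has the same spectrum as the input.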

Our parallel implementation targets shared-memory multicore platforms. It uses pipelined parallelism and a static scheduler while retaining the locality properties of the sequential algorithm. Due to lightweight synchronization and effective data reuse, we see 9.5x scaling over our serial code and up to 6x speedup over the PLASMA library, comparing parallel performance on a ten-core processor.
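As a rough analogy only (not the paper's static scheduler), pipelined parallelism can be pictured as a chain of worker threads connected by queues: stage k starts on item i+1 while stage k+1 is still finishing item i, so all stages stay busy once the pipeline fills. A minimal sketch, with hypothetical stage functions:

```python
import threading
import queue

def run_pipeline(stages, items):
    """Run `items` through a chain of `stages`, one worker thread per stage.

    FIFO queues connect consecutive stages, so results come out in order;
    a None sentinel flows down the chain to shut the workers down.
    """
    qs = [queue.Queue() for _ in range(len(stages) + 1)]

    def worker(fn, q_in, q_out):
        while True:
            x = q_in.get()
            if x is None:          # sentinel: pass it along and exit
                q_out.put(None)
                return
            q_out.put(fn(x))

    threads = [threading.Thread(target=worker, args=(fn, qs[k], qs[k + 1]))
               for k, fn in enumerate(stages)]
    for t in threads:
        t.start()
    for x in items:
        qs[0].put(x)
    qs[0].put(None)

    out = []
    while True:
        y = qs[-1].get()
        if y is None:
            break
        out.append(y)
    for t in threads:
        t.join()
    return out

# Two pipelined stages over three items.
result = run_pipeline([lambda x: 2 * x, lambda x: x + 1], [1, 2, 3])
```

In the band-reduction setting the "items" would be successive bulge-chasing sweeps and the dependencies are finer-grained, but the overlap principle is the same.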



Published in:

ACM SIGPLAN Notices, Volume 47, Issue 8 (PPoPP '12), August 2012, 334 pages. ISSN 0362-1340, EISSN 1558-1160, DOI 10.1145/2370036.

PPoPP '12: Proceedings of the 17th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, February 2012, 352 pages. ISBN 9781450311601, DOI 10.1145/2145816.

Copyright © 2012 ACM

Publisher: Association for Computing Machinery, New York, NY, United States
