research-article

Sparse Matrix-Vector Multiplication on a Reconfigurable Supercomputer with Application

Published: 01 January 2010

Abstract

Double-precision floating-point Sparse Matrix-Vector Multiplication (SMVM) is a critical computational kernel used in iterative solvers for systems of sparse linear equations. The poor data locality exhibited by sparse matrices, along with the high memory bandwidth requirements of SMVM, results in poor performance on general-purpose processors. Field Programmable Gate Arrays (FPGAs) offer a possible alternative with their customizable, application-targeted memory subsystems and processing elements. In this work we investigate two separate implementations of SMVM on an SRC-6 MAPStation workstation. The first implementation targets peak performance capability, while the second balances the amount of instantiated logic against the available sustained bandwidth of the FPGA subsystem. Both implementations yield the same sustained performance, with the second producing a much more efficient solution. The metrics of processor and application balance are introduced to provide insight into the efficiencies of the FPGA- and CPU-based solutions, explicitly showing the tight coupling of available bandwidth to peak floating-point performance. Because of the FPGA's ability to match the amount of implemented logic to the available memory bandwidth, it can provide a much more efficient solution. Finally, applying the lessons learned implementing SMVM, we present a fully implemented non-preconditioned Conjugate Gradient algorithm built on the second SMVM design.


