Abstract
Double-precision floating-point Sparse Matrix-Vector Multiplication (SMVM) is a critical computational kernel used in iterative solvers for systems of sparse linear equations. The poor data locality exhibited by sparse matrices, along with the high memory bandwidth requirements of SMVM, results in poor performance on general-purpose processors. Field Programmable Gate Arrays (FPGAs) offer a possible alternative with their customizable, application-targeted memory subsystem and processing elements. In this work we investigate two separate implementations of the SMVM on an SRC-6 MAPStation workstation. The first implementation investigates the peak performance capability, while the second balances the amount of instantiated logic against the available sustained bandwidth of the FPGA subsystem. Both implementations yield the same sustained performance, with the second producing a much more efficient solution. The metrics of processor balance and application balance are introduced to provide some insight into the efficiencies of the FPGA- and CPU-based solutions, explicitly showing the tight coupling of the available bandwidth to peak floating-point performance. Because the FPGA can balance the amount of implemented logic against the available memory bandwidth, it can provide a much more efficient solution. Finally, making use of the lessons learned in implementing the SMVM, we present a fully implemented non-preconditioned Conjugate Gradient algorithm that uses the second SMVM design.
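To make the kernel concrete, the following is a minimal software sketch of SMVM, not the FPGA design described in the abstract. It assumes the matrix is held in compressed sparse row (CSR) form, which the abstract does not specify; the names `csr_matvec` and `bytes_per_flop` are hypothetical. The second function is only a rough estimate of memory traffic per floating-point operation, assuming 8-byte values, 4-byte indices, and no reuse of the input vector; it is not the paper's definition of processor or application balance.

```python
def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form (assumed format)."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row occupy values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            # Indirect access x[col_idx[k]] is the source of poor data locality.
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def bytes_per_flop(nnz, n_rows, value_bytes=8, index_bytes=4):
    """Rough traffic estimate (assumption: no cache reuse of x)."""
    matrix_bytes = nnz * (value_bytes + index_bytes)   # values + column indices
    row_ptr_bytes = (n_rows + 1) * index_bytes         # row pointer array
    x_bytes = nnz * value_bytes                        # one x read per nonzero
    y_bytes = n_rows * value_bytes                     # one y write per row
    flops = 2 * nnz                                    # one multiply + one add
    return (matrix_bytes + row_ptr_bytes + x_bytes + y_bytes) / flops


# Small example matrix:
# [[4, 0, 1],
#  [0, 3, 0],
#  [2, 0, 5]]
values = [4.0, 1.0, 3.0, 2.0, 5.0]
col_idx = [0, 2, 1, 0, 2]
row_ptr = [0, 2, 3, 5]
x = [1.0, 2.0, 3.0]

y = csr_matvec(values, col_idx, row_ptr, x)
ratio = bytes_per_flop(nnz=len(values), n_rows=len(row_ptr) - 1)
```

Each nonzero contributes only two floating-point operations while requiring several memory accesses, which illustrates why sustained SMVM performance tends to track memory bandwidth rather than peak arithmetic throughput.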
Index Terms
Sparse Matrix-Vector Multiplication on a Reconfigurable Supercomputer with Application