Optimizing memory bandwidth use and performance for matrix-vector multiplication in iterative methods

Abstract
Computing the solution to a system of linear equations is a fundamental problem in scientific computing, and its acceleration has drawn wide interest in the FPGA community [Morris et al. 2006; Zhang et al. 2008; Zhuo and Prasanna 2006]. One class of algorithms for solving these systems, iterative methods, has drawn particular interest, with recent literature showing large performance improvements over General-Purpose Processors (GPPs) [Lopes and Constantinides 2008]. In several iterative methods, this performance gain stems largely from parallelizing the matrix-vector multiplication, an operation that occurs in many applications and has therefore also been widely studied on FPGAs [Zhuo and Prasanna 2005; El-Kurdi et al. 2006]. However, whilst the performance of matrix-vector multiplication on FPGAs is generally I/O bound [Zhuo and Prasanna 2005], the nature of iterative methods allows the use of on-chip memory buffers to increase the effective bandwidth, providing the potential for significantly more parallelism [deLorimier and DeHon 2005]. Unfortunately, existing approaches have generally either solved large matrices with only limited improvement over GPPs [Zhuo and Prasanna 2005; El-Kurdi et al. 2006; deLorimier and DeHon 2005] or achieved high performance only for relatively small matrices [Lopes and Constantinides 2008; Boland and Constantinides 2008]. This article proposes hardware designs that exploit symmetric and banded matrix structure, together with methods to optimize RAM use, in order both to increase performance and to retain that performance for larger-order matrices.
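The bandwidth saving from symmetric and banded structure can be illustrated in software. The sketch below (the function name and band layout are illustrative, not taken from the article) multiplies a symmetric banded matrix by a vector while storing only the main diagonal and the sub-diagonals; each stored entry A[i+k, i] is reused by symmetry as A[i, i+k], so roughly half the matrix entries need to be read, which is the kind of saving a hardware design can translate into reduced memory-bandwidth demand.

```python
import numpy as np

def banded_symmetric_matvec(bands, x):
    """Compute y = A @ x for a symmetric banded matrix A, given only its
    lower bands.

    bands[0] is the main diagonal (length n); bands[k] is the k-th
    sub-diagonal (length n - k). The matching super-diagonal entries are
    reconstructed by symmetry rather than stored or fetched.
    """
    y = bands[0] * x                 # diagonal contribution
    for k, band in enumerate(bands[1:], start=1):
        y[k:] += band * x[:-k]       # sub-diagonal entries A[i+k, i] * x[i]
        y[:-k] += band * x[k:]       # mirrored entries A[i, i+k] * x[i+k]
    return y

# Example: the symmetric tridiagonal matrix [[2,1,0],[1,2,1],[0,1,2]]
diag = np.array([2.0, 2.0, 2.0])
sub = np.array([1.0, 1.0])
print(banded_symmetric_matvec([diag, sub], np.ones(3)))  # -> [3. 4. 3.]
```

Only n + (n - 1) values of the tridiagonal matrix are stored here, versus 3n - 2 nonzeros in the full matrix; for wider bands the ratio approaches one half, the same factor the symmetric hardware designs aim to exploit.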
References
- Barrett, R., Berry, M., Chan, T. F., Demmel, J., Donato, J., Dongarra, J., Eijkhout, V., Pozo, R., Romine, C., and van der Vorst, H. 1994. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, 2nd Ed. SIAM, Philadelphia, PA.
- Boland, D. and Constantinides, G. 2008. An FPGA-based implementation of the MINRES algorithm. In Proceedings of the International Conference on Field Programmable Logic and Applications. 379--384.
- Boland, D. and Constantinides, G. 2010. Optimising memory bandwidth use for matrix-vector multiplication in iterative methods. In Proceedings of the International Symposium on Applied Reconfigurable Computing. 169--181.
- deLorimier, M. and DeHon, A. 2005. Floating-point sparse matrix-vector multiply for FPGAs. In Proceedings of the ACM/SIGDA 13th International Symposium on Field-Programmable Gate Arrays. ACM, New York, 75--85.
- El-Kurdi, Y., Gross, W. J., and Giannacopoulos, D. 2006. Sparse matrix-vector multiplication for finite element method matrices on FPGAs. In Proceedings of the International Symposium on Field-Programmable Custom Computing Machines. 293--294.
- Golub, G. H. and Van Loan, C. F. 1996. Matrix Computations, 3rd Ed. Johns Hopkins University Press, Baltimore, MD.
- Heath, M. T. 2001. Scientific Computing. McGraw-Hill Higher Education.
- Hoekstra, A. G., Sloot, P., Hoffmann, W., and Hertzberger, L. 1992. Time complexity of a parallel conjugate gradient solver for light scattering simulations: Theory and SPMD implementation. Tech. rep., University of Amsterdam.
- ILOG, Inc. 2009. CPLEX solver. http://www.ilog.fr/products/cplex/.
- Lopes, A., Constantinides, G., and Kerrigan, E. C. 2008. A floating-point solver for band structured linear equations. In Proceedings of the International Conference on Field Programmable Technology. 353--356.
- Lopes, A. R. and Constantinides, G. A. 2008. A high throughput FPGA-based floating point conjugate gradient implementation. In Proceedings of the International Workshop on Applied Reconfigurable Computing. 75--86.
- Morris, G. R., Prasanna, V. K., and Anderson, R. D. 2006. A hybrid approach for mapping conjugate gradient onto an FPGA-augmented reconfigurable supercomputer. In Proceedings of the 14th IEEE Symposium on Field-Programmable Custom Computing Machines. 3--12.
- Sewell, G. 1988. The Numerical Solution of Ordinary and Partial Differential Equations. Academic Press Professional, San Diego, CA.
- Winston, W. L. 2003. Introduction to Mathematical Programming: Applications and Algorithms. Duxbury Resource Center.
- Xilinx. 2010. Virtex-5 FPGA User Guide. http://www.xilinx.com/support/documentation/user-guides/ug190.pdf.
- Zhang, W., Betz, V., and Rose, J. 2008. Portable and scalable FPGA-based acceleration of a direct linear system solver. In Proceedings of the International Conference on Field Programmable Technology. 17--24.
- Zhuo, L., Morris, G. R., and Prasanna, V. K. 2007. High-performance reduction circuits using deeply pipelined operators on FPGAs. IEEE Trans. Parallel Distrib. Syst. 18, 10, 1377--1392.
- Zhuo, L. and Prasanna, V. K. 2005. Sparse matrix-vector multiplication on FPGAs. In Proceedings of the International Symposium on Field-Programmable Gate Arrays. ACM, New York, 63--74.
- Zhuo, L. and Prasanna, V. K. 2006. High-performance and parameterized matrix factorization on FPGAs. In Proceedings of the International Conference on Field Programmable Logic and Applications. 1--6.