Abstract
Scientific computing is at the core of many High-Performance Computing applications, including computational flow dynamics. Because simulating ever larger computational models is of utmost importance, hardware acceleration is receiving increased attention for its potential to maximize the performance of scientific computing. Field-Programmable Gate Arrays (FPGAs) could accelerate scientific computing because their memory hierarchy can be fully customized, which is important in irregular applications such as iterative linear solvers. In this article, we study the potential of FPGAs in High-Performance Computing in light of rapid advances in reconfigurable hardware, such as larger on-chip memories, a growing number of logic cells, and the integration of High-Bandwidth Memory on board. To perform this study, we propose a novel Sparse Matrix-Vector multiplication unit and an ILU0 preconditioner tightly integrated with a BiCGStab solver kernel. We integrate the resulting preconditioned iterative solver into Flow, a state-of-the-art open-source reservoir simulator from the Open Porous Media project. Finally, we thoroughly evaluate the FPGA solver kernel both in stand-alone mode and integrated in the reservoir simulator, using the NORNE field, a real-world reservoir model whose grid has more than 10^5 cells with three unknowns per cell.
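For readers unfamiliar with the solver structure named in the abstract, the following is a minimal software sketch of a preconditioned BiCGStab iteration built on a CSR Sparse Matrix-Vector product. It is not the FPGA design from the article: the function names (`spmv_csr`, `bicgstab`, `jacobi`) and the tiny 3x3 test system are hypothetical, and a simple Jacobi (diagonal) preconditioner is swapped in for ILU0 to keep the example short.

```python
from typing import Callable, List

Vector = List[float]


def spmv_csr(row_ptr: List[int], col_idx: List[int],
             values: Vector, x: Vector) -> Vector:
    """y = A @ x for a matrix stored in Compressed Sparse Row format."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        # Nonzeros of this row live in values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def dot(a: Vector, b: Vector) -> float:
    return sum(p * q for p, q in zip(a, b))


def bicgstab(matvec: Callable[[Vector], Vector],
             precond: Callable[[Vector], Vector],
             b: Vector, tol: float = 1e-10, max_iter: int = 100) -> Vector:
    """Right-preconditioned BiCGStab; precond(v) approximates A^-1 v."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                 # residual for the initial guess x = 0
    r_hat = list(r)             # fixed shadow residual
    rho = alpha = omega = 1.0
    v = [0.0] * n
    p = [0.0] * n
    b_norm = dot(b, b) ** 0.5 or 1.0
    for _ in range(max_iter):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        y = precond(p)
        v = matvec(y)
        alpha = rho_new / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if dot(s, s) ** 0.5 / b_norm < tol:
            return [xi + alpha * yi for xi, yi in zip(x, y)]
        z = precond(s)
        t = matvec(z)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * yi + omega * zi for xi, yi, zi in zip(x, y, z)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if dot(r, r) ** 0.5 / b_norm < tol:
            return x
        rho = rho_new
    return x


# Hypothetical nonsymmetric test system: A = [[4,1,0],[2,5,1],[0,1,3]].
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [4.0, 1.0, 2.0, 5.0, 1.0, 1.0, 3.0]
diag = [4.0, 5.0, 3.0]
x_true = [1.0, 2.0, 3.0]
b = spmv_csr(row_ptr, col_idx, values, x_true)


def jacobi(vec: Vector) -> Vector:
    # Stand-in preconditioner (assumption): divide by the diagonal of A.
    return [vi / di for vi, di in zip(vec, diag)]


x = bicgstab(lambda u: spmv_csr(row_ptr, col_idx, values, u), jacobi, b)
```

Each iteration performs two matrix-vector products and two preconditioner applications, which is why these two operations are natural targets for dedicated hardware units.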
Hardware Acceleration of High-Performance Computational Flow Dynamics Using High-Bandwidth Memory-Enabled Field-Programmable Gate Arrays