ABSTRACT
Field-programmable gate arrays (FPGAs) have garnered significant interest in high-performance computing research because their computational and communication capabilities have improved drastically in recent years, owing to advances in semiconductor integration technologies. Alongside these performance gains, FPGA vendors now offer OpenCL-based development toolchains that reduce the programming effort required. Together, these improvements make it feasible to offload, on the fly, computations that CPUs and GPUs handle poorly relative to FPGAs, while moving data with low latency. We believe this concept is key to improving the performance of heterogeneous supercomputers that use accelerators such as GPUs. In this paper, we propose an approach to GPU--FPGA accelerated computing within the OpenCL programming framework that combines an OpenCL-enabled GPU--FPGA DMA method with an FPGA-to-FPGA communication method. Experimental results demonstrate that the proposed method enables GPUs and FPGAs on different nodes to work together.


Ryohei Kobayashi
Taisuke Boku
