DOI: 10.1145/3373271.3373275
short-paper

OpenCL-enabled GPU-FPGA Accelerated Computing with Inter-FPGA Communication

ABSTRACT

Field-programmable gate arrays (FPGAs) have garnered significant interest in high-performance computing research; their computational and communication capabilities have improved drastically in recent years owing to advances in semiconductor integration technologies. Beyond these performance gains, FPGA vendors now offer OpenCL-based development toolchains that reduce the programming effort required. Together, these improvements make it feasible to offload, on the fly, computational loads that CPUs and GPUs handle poorly compared to FPGAs, while moving data with low latency. We think that this concept is key to improving the performance of heterogeneous supercomputers that use accelerators such as GPUs. In this paper, we propose an approach for GPU-FPGA accelerated computing with the OpenCL programming framework that is based on the OpenCL-enabled GPU-FPGA DMA method and the FPGA-to-FPGA communication method. The experimental results demonstrate that our proposed method enables GPUs and FPGAs to work together across different nodes.

