DOI: 10.1145/3337801.3337806
poster

FPGA-based Implementation of Memory-Intensive Application using OpenCL

ABSTRACT

Systems with heterogeneous architectures that include devices such as field-programmable gate arrays (FPGAs) and graphics processing units (GPUs) are expected to boost throughput in high-performance computing applications. However, users may struggle to write sophisticated programs that run efficiently on such complicated systems. High-level synthesis (HLS) is one solution that can reduce development time. This study focuses on OpenCL programming as a form of HLS design for FPGA boards. The Himeno benchmark, which is well suited to measuring the performance of memory-intensive applications, is chosen as the verification program. We found that the OpenCL-based implementation achieves reasonable performance on FPGAs when temporal blocking is combined with a shift-register implementation. On the Stratix V DE5-Net FPGA, our current implementation achieves 10.62 GFLOPS, or 75% of the theoretical performance. On the Arria 10 A10PL4 FPGA, peak performance reaches 13.95 GFLOPS, or 76% of the theoretical performance.
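
The combination of temporal blocking and shift registers named in the abstract can be illustrated with a small software model. The sketch below is not the authors' OpenCL kernel and does not implement the Himeno benchmark's 3D stencil; it assumes a 1D three-point averaging stencil with pass-through boundaries, and the names `stencil_stage`, `temporal_blocked`, and `reference` are hypothetical. Each stage keeps only a three-element window (the shift register), and chaining several stages applies several time steps while the input is streamed only once (temporal blocking).

```python
from collections import deque


def stencil_stage(stream):
    """One pipeline stage: 3-point average computed from a shift register.

    Assumption for this sketch: the first and last cells are boundary
    cells and pass through unchanged.
    """
    window = deque(maxlen=3)  # models the shift register
    for value in stream:
        window.append(value)  # shift in one new element per step
        if len(window) == 1:
            yield value  # left boundary cell
        elif len(window) == 3:
            # Interior cell: window holds x[i-1], x[i], x[i+1].
            yield (window[0] + window[1] + window[2]) / 3.0
    if len(window) >= 2:
        yield window[-1]  # right boundary cell


def temporal_blocked(data, time_steps):
    """Chain one stage per time step so the input is read only once."""
    stream = iter(data)
    for _ in range(time_steps):
        stream = stencil_stage(stream)
    return list(stream)


def reference(data, time_steps):
    """Plain Jacobi-style sweeps over the full array, for comparison."""
    grid = list(data)
    for _ in range(time_steps):
        new = grid[:]
        for i in range(1, len(grid) - 1):
            new[i] = (grid[i - 1] + grid[i] + grid[i + 1]) / 3.0
        grid = new
    return grid


if __name__ == "__main__":
    data = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
    blocked = temporal_blocked(data, time_steps=4)
    plain = reference(data, time_steps=4)
    print(all(abs(a - b) < 1e-12 for a, b in zip(blocked, plain)))
```

In this model the chained generators stand in for cascaded processing elements: intermediate time steps never return to main memory, which is the property that makes the approach attractive for memory-intensive stencils. How the actual design maps onto the two FPGA boards is not described in the abstract.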
