ABSTRACT
Heterogeneous systems that incorporate accelerators such as field-programmable gate arrays (FPGAs) and graphics processing units (GPUs) are expected to boost throughput in high-performance computing applications. However, users may struggle to write sophisticated programs that run efficiently on such complex systems. High-level synthesis (HLS) is one way to reduce development time. This study focuses on OpenCL programming as a form of HLS design for FPGA boards. The Himeno benchmark, which is well suited to measuring the performance of memory-intensive applications, is chosen as the verification program. We find that an OpenCL-based implementation achieves reasonable performance on FPGAs when temporal blocking is combined with a shift-register implementation. On the Stratix V DE5-Net FPGA, our current implementation achieves 10.62 GFLOPS, or 75% of the theoretical performance. On the Arria 10 A10PL4 FPGA, peak performance reaches 13.95 GFLOPS, or 76% of the theoretical performance.
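To make the two techniques named above concrete, the following is a minimal Python sketch, not the paper's OpenCL kernel. It assumes a 1D three-point average as a stand-in for the Himeno kernel's 3D stencil, with fixed boundary values. The names `stencil_stage`, `temporal_block`, and `reference` are hypothetical. Each stage keeps only a three-element sliding window (the shift register), and temporal blocking is modeled by chaining stages so that several time steps are applied in one pass over the input stream.

```python
from collections import deque


def stencil_stage(stream):
    """Apply one time step of a 3-point average to a stream of values.

    Only the last three inputs are held (a software stand-in for an
    FPGA shift register). Boundary values pass through unchanged.
    """
    window = deque(maxlen=3)  # illustrative shift register
    for x in stream:
        window.append(x)
        if len(window) == 1:
            yield x  # left boundary is kept fixed
        elif len(window) == 3:
            # Output for the middle element of the window.
            yield (window[0] + window[1] + window[2]) / 3.0
    if len(window) >= 2:
        yield window[-1]  # right boundary is kept fixed


def temporal_block(values, steps):
    """Chain `steps` stages so all time steps run in a single pass."""
    stream = iter(values)
    for _ in range(steps):
        stream = stencil_stage(stream)
    return list(stream)


def reference(values, steps):
    """Straightforward version that sweeps the full array per time step."""
    cur = list(values)
    for _ in range(steps):
        nxt = cur[:]
        for i in range(1, len(cur) - 1):
            nxt[i] = (cur[i - 1] + cur[i] + cur[i + 1]) / 3.0
        cur = nxt
    return cur


if __name__ == "__main__":
    data = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0]
    print(temporal_block(data, 3) == reference(data, 3))
```

In the chained version, intermediate time steps never exist as full arrays; each stage consumes the previous stage's output as it is produced. That is the property that lets a hardware pipeline avoid round trips to external memory between time steps.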


Taisuke Boku
