research-article

General-Purpose Computing with Soft GPUs on FPGAs

Published: 24 January 2018

Abstract

Using field-programmable gate arrays (FPGAs) as a substrate for soft graphics processing units (GPUs) would make FPGA compute power available through a flexible GPU-like tool flow. Application-specific adaptations, such as selective hardening of floating-point operations and instruction set subsetting, would mitigate the high area and power demands of soft GPUs. This work explores the capabilities and limitations of soft General-Purpose Computing on GPUs (GPGPU) for both fixed- and floating-point arithmetic. For this purpose, we have developed FGPU: a configurable, scalable, and portable GPU architecture designed especially for FPGAs. FGPU is open-source and implemented entirely in RTL. It can be programmed in OpenCL and controlled through a Python API. This article introduces its hardware architecture as well as its tool flow. We evaluated the proposed GPGPU approach against multiple other solutions. In comparison to homogeneous Multi-Processor System-On-Chips (MPSoCs), we found that using a soft GPU is a Pareto-optimal solution regarding throughput per area and energy consumption. On average, FGPU has a 2.9× better compute density and 11.2× lower energy consumption than a single MicroBlaze processor when computing in IEEE-754 floating-point format. An average speedup of about 4× over the ARM Cortex-A9 with its NEON vector co-processor has been measured for both fixed- and floating-point benchmarks. In addition, the biggest FGPU cores we could implement on a Xilinx Zynq-7000 System-On-Chip (SoC) deliver performance similar to that of equivalent implementations with High-Level Synthesis (HLS).


Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 11, Issue 1
Special Section on FCCM 2016 and Regular Papers
March 2018
183 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/3178391
Editor: Steve Wilton

Copyright © 2018 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 24 January 2018
• Accepted: 1 December 2017
• Revised: 1 November 2017
• Received: 1 June 2017

Qualifiers

• research-article
• Research
• Refereed
