research-article

An OpenGL Compliant Hardware Implementation of a Graphic Processing Unit Using Field Programmable Gate Array–System on Chip Technology

Published: 02 September 2020
Abstract

FPGA-SoC technology provides a heterogeneous platform for advanced, high-performance systems. The System on Chip (SoC) architecture combines traditional single- and multi-core processor topologies with flexible FPGA fabric. Dynamic reconfiguration allows the hardware accelerators to be changed at run-time. This article presents a novel OpenGL compliant GPU design implemented on an FPGA. The design uses an FPGA-SoC environment, allowing the embedded processor to offload graphics operations onto a more suitable architecture. To the authors’ knowledge, this is a first. The graphics processor consists of GLSL compliant shaders, an efficient Barycentric Rasterizer, and a draw mode manager. Performance analysis shows the throughput of the shaders to be hundreds of millions of vertices per second. The design uses both pipelining and resource reuse to optimise throughput and resource use, allowing implementation on a low-cost FPGA device. Pixel processing rates from this implementation are almost 80% higher than those of other FPGA implementations. A power comparison with comparable embedded devices shows the FPGA consuming as little as 2% of the power of a Mali device, and up to an 11.9-fold increase in efficiency compared to an Nvidia RTX 2060 (Turing architecture) device.
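
For readers unfamiliar with the term, the general idea behind barycentric rasterization can be sketched in software as follows. This is a minimal illustration of the textbook technique only; it is not the authors' hardware design, and the names `edge` and `rasterize` are hypothetical. Sampling at pixel centres is also an assumption made for this sketch.

```python
import math
from typing import Iterator, Tuple

Point = Tuple[float, float]
Weights = Tuple[float, float, float]


def edge(a: Point, b: Point, p: Point) -> float:
    """Return twice the signed area of the triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0: Point, v1: Point, v2: Point) -> Iterator[Tuple[int, int, Weights]]:
    """Yield (x, y, (w0, w1, w2)) for every pixel whose centre lies in the triangle.

    The weights are the barycentric coordinates of the pixel centre; they sum
    to 1 and can be used to interpolate per-vertex attributes such as colour.
    """
    area = edge(v0, v1, v2)
    if area == 0:
        return  # degenerate triangle: covers no pixels

    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])

    # Only pixels inside the triangle's bounding box need to be tested.
    for y in range(math.floor(min(ys)), math.ceil(max(ys)) + 1):
        for x in range(math.floor(min(xs)), math.ceil(max(xs)) + 1):
            p = (x + 0.5, y + 0.5)  # assumed sample position: pixel centre
            # Dividing by the signed area makes the test independent of winding.
            w0 = edge(v1, v2, p) / area
            w1 = edge(v2, v0, p) / area
            w2 = edge(v0, v1, p) / area
            if w0 >= 0 and w1 >= 0 and w2 >= 0:
                yield x, y, (w0, w1, w2)
```

A per-vertex attribute `a0, a1, a2` would then be interpolated at each covered pixel as `w0 * a0 + w1 * a1 + w2 * a2`. A hardware implementation would typically evaluate these tests in a pipelined, fixed-point form rather than in nested loops.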

        • Published in

          ACM Transactions on Reconfigurable Technology and Systems, Volume 14, Issue 1
          March 2021
          138 pages
          ISSN: 1936-7406
          EISSN: 1936-7414
          DOI: 10.1145/3418746
          Editor: Deming Chen

          Copyright © 2020 ACM

          Publisher

          Association for Computing Machinery
          New York, NY, United States

          Publication History

          • Published: 2 September 2020
          • Accepted: 1 July 2020
          • Revised: 1 May 2020
          • Received: 1 February 2020

          Qualifiers

          • research-article
          • Research
          • Refereed
