Abstract
FPGA-SoC technology provides a heterogeneous platform for advanced, high-performance systems. The System on Chip (SoC) architecture combines traditional single- and multi-core processor topologies with flexible FPGA fabric, and dynamic reconfiguration allows the hardware accelerators to be changed at run-time. This article presents a novel OpenGL-compliant GPU design implemented on an FPGA. The design uses an FPGA-SoC environment that allows the embedded processor to offload graphics operations onto a more suitable architecture; to the authors' knowledge, this is the first such design. The graphics processor consists of GLSL-compliant shaders, an efficient barycentric rasterizer, and a draw mode manager. Performance analysis shows shader throughput of hundreds of millions of vertices per second. The design uses both pipelining and resource reuse to optimise throughput and resource use, allowing implementation on a low-cost FPGA device. Pixel processing rates from this implementation are almost 80% higher than those of other FPGA implementations. In power comparisons, the FPGA consumes as little as 2% of the power of a Mali device and achieves up to 11.9 times the efficiency of an Nvidia RTX 2060 (Turing architecture) device.
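The abstract names a barycentric rasterizer and a draw mode manager but does not describe their internals. As a rough software model only, the sketch below shows the standard edge-function form of barycentric rasterization, plus the usual OpenGL expansion of triangle, triangle-strip and triangle-fan index lists into individual triangles. The function names (`edge`, `rasterize`, `interpolate`, `assemble`) and the mode strings are hypothetical, and treating the draw mode manager as this kind of primitive assembly is an assumption, not a description of the article's hardware.

```python
from typing import Iterator, List, Sequence, Tuple

Vec2 = Tuple[float, float]
Weights = Tuple[float, float, float]


def edge(a: Vec2, b: Vec2, p: Vec2) -> float:
    """Edge function: twice the signed area of triangle (a, b, p)."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize(v0: Vec2, v1: Vec2, v2: Vec2,
              width: int, height: int) -> Iterator[Tuple[int, int, Weights]]:
    """Yield (x, y, barycentric weights) for each covered pixel centre."""
    area = edge(v0, v1, v2)
    if area == 0:
        return  # degenerate triangle: no pixels covered
    # Visit only the triangle's bounding box, clamped to the framebuffer.
    xmin = max(int(min(v0[0], v1[0], v2[0])), 0)
    xmax = min(int(max(v0[0], v1[0], v2[0])), width - 1)
    ymin = max(int(min(v0[1], v1[1], v2[1])), 0)
    ymax = min(int(max(v0[1], v1[1], v2[1])), height - 1)
    for y in range(ymin, ymax + 1):
        for x in range(xmin, xmax + 1):
            p = (x + 0.5, y + 0.5)  # sample at the pixel centre
            # Each weight is the sub-triangle area opposite one vertex,
            # divided by the signed full area so winding order does not matter.
            l0 = edge(v1, v2, p) / area
            l1 = edge(v2, v0, p) / area
            l2 = edge(v0, v1, p) / area
            if l0 >= 0 and l1 >= 0 and l2 >= 0:  # inside or on an edge
                yield x, y, (l0, l1, l2)


def interpolate(w: Weights, a0: float, a1: float, a2: float) -> float:
    """Blend a per-vertex scalar (screen-space linear, not perspective-correct)."""
    return w[0] * a0 + w[1] * a1 + w[2] * a2


def assemble(indices: Sequence[int], mode: str) -> List[Tuple[int, int, int]]:
    """Split an index list into triangles in the style of OpenGL draw modes."""
    if mode == "triangles":  # like GL_TRIANGLES: independent triples
        return [(indices[i], indices[i + 1], indices[i + 2])
                for i in range(0, len(indices) - 2, 3)]
    if mode == "strip":  # like GL_TRIANGLE_STRIP: odd triangles swap to keep winding
        return [(indices[i], indices[i + 1], indices[i + 2]) if i % 2 == 0
                else (indices[i + 1], indices[i], indices[i + 2])
                for i in range(len(indices) - 2)]
    if mode == "fan":  # like GL_TRIANGLE_FAN: all triangles share the first vertex
        return [(indices[0], indices[i], indices[i + 1])
                for i in range(1, len(indices) - 1)]
    raise ValueError(f"unsupported draw mode: {mode}")
```

For example, the triangle (0, 0), (4, 0), (0, 4) on a 4 x 4 framebuffer covers the 10 pixels with x + y <= 3, whichever way it is wound. Because each edge function is linear in x and y, hardware can update it by adding a constant per pixel step instead of recomputing it, which suits pipelined logic.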
An OpenGL Compliant Hardware Implementation of a Graphic Processing Unit Using Field Programmable Gate Array–System on Chip Technology