Vortex: Extending the RISC-V ISA for GPGPU and 3D-Graphics

Research article · DOI: 10.1145/3466752.3480128

Published: 17 October 2021

Abstract

The importance of open-source hardware and software has been increasing. However, although GPUs are one of the more popular accelerators across a range of applications, very little open-source GPU infrastructure exists in the public domain. We argue that one reason for this gap is the complexity of GPU ISAs and software stacks. In this work, we first propose an ISA extension to RISC-V that supports GPGPU computing and graphics. The main goal of the proposal is to minimize ISA changes, so that the corresponding changes to the open-source ecosystem are also minimal, which makes for a sustainable development ecosystem. To demonstrate the feasibility of the minimally extended RISC-V ISA, we implemented the complete software and hardware stacks of Vortex on an FPGA. Vortex is a PCIe-based soft GPU that supports OpenCL and OpenGL, and it can be used in a variety of applications, including machine learning, graph analytics, and graphics rendering. Vortex scales up to 32 cores on an Altera Stratix 10 FPGA, delivering a peak performance of 25.6 GFLOPS at 200 MHz.
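Peak throughput is commonly estimated as cores × floating-point operations per cycle per core × clock frequency. The abstract gives only the totals (32 cores, 200 MHz, 25.6 GFLOPS), so the per-core rate in the sketch below is derived from them and is not stated in the paper: 25.6 GFLOPS / (32 × 200 MHz) = 4 FLOPs per cycle per core. The helper names are hypothetical. One possible reading of 4 FLOPs/cycle is four SIMT lanes, each completing one floating-point operation per cycle. This is an assumption, because the abstract does not describe the per-core datapath.

```python
"""Back-of-the-envelope peak-throughput arithmetic for the Vortex abstract.

Hypothetical helpers: the per-core rate is derived from the abstract's
totals (32 cores, 200 MHz, 25.6 GFLOPS), not taken from the paper itself.
"""


def peak_gflops(cores: int, flops_per_cycle_per_core: float, freq_mhz: float) -> float:
    """Peak GFLOPS = cores * FLOPs/cycle/core * frequency (MHz converted to GHz)."""
    return cores * flops_per_cycle_per_core * freq_mhz / 1000.0


def implied_flops_per_cycle(total_gflops: float, cores: int, freq_mhz: float) -> float:
    """Invert the formula: FLOPs/cycle/core = GFLOPS / (cores * GHz)."""
    return total_gflops * 1000.0 / (cores * freq_mhz)


if __name__ == "__main__":
    # Totals quoted in the abstract.
    per_core = implied_flops_per_cycle(25.6, 32, 200)
    print(f"implied FLOPs/cycle/core: {per_core:.1f}")
    # Assumption: 4 FLOPs/cycle/core, e.g. four lanes x one FP op per cycle.
    print(f"peak at 32 cores: {peak_gflops(32, 4, 200):.1f} GFLOPS")
```

At the same assumed per-core rate, a single core gives 0.8 GFLOPS at 200 MHz. All of these are theoretical peaks, not measured throughput on real kernels.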



Information

Published In

MICRO '21: MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture
October 2021
1322 pages
ISBN: 9781450385572
DOI: 10.1145/3466752
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. computer graphics
  2. memory systems
  3. reconfigurable computing

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Funding Sources

  • NSF CCRI
  • NSF CNS

Conference

MICRO '21

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 503
  • Downloads (last 6 weeks): 49
Reflects downloads up to 28 Jan 2025.

Cited By

  • (2024) Breaking Barriers: Expanding GPU Memory with Sub-Two Digit Nanosecond Latency CXL Controller. Proceedings of the 16th ACM Workshop on Hot Topics in Storage and File Systems, 108-115. DOI: 10.1145/3655038.3665953. Online publication date: 8-Jul-2024.
  • (2024) RISC-V Custom Instructions of Elementary Functions for IoT Endpoint Devices. IEEE Transactions on Computers 73, 2, 523-535. DOI: 10.1109/TC.2023.3336174. Online publication date: Mar-2024.
  • (2024) Towards a Qualifiable Space Cloud Approach. 2024 IEEE 10th International Conference on Space Mission Challenges for Information Technology (SMC-IT), 109-114. DOI: 10.1109/SMC-IT61443.2024.00019. Online publication date: 15-Jul-2024.
  • (2024) Atomic Cache: Enabling Efficient Fine-Grained Synchronization with Relaxed Memory Consistency on GPGPUs Through In-Cache Atomic Operations. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), 671-685. DOI: 10.1109/MICRO61859.2024.00056. Online publication date: 2-Nov-2024.
  • (2024) HyFiSS: A Hybrid Fidelity Stall-Aware Simulator for GPGPUs. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), 168-185. DOI: 10.1109/MICRO61859.2024.00022. Online publication date: 2-Nov-2024.
  • (2024) DAW-DMR: Divergence-Aware Warped DMR with Full Error Detection for GPGPUs. 2024 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 161-166. DOI: 10.1109/ISVLSI61997.2024.00039. Online publication date: 1-Jul-2024.
  • (2024) Comparative Analysis of Executing GPU Applications on FPGA: HLS vs. Soft GPU Approaches. 2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 634-641. DOI: 10.1109/IPDPSW63119.2024.00123. Online publication date: 27-May-2024.
  • (2024) HeroSDK: Streamlining Heterogeneous RISC-V Accelerated Computing from Embedded to High-Performance Systems. 2024 IEEE 42nd International Conference on Computer Design (ICCD), 280-287. DOI: 10.1109/ICCD63220.2024.00050. Online publication date: 18-Nov-2024.
  • (2024) Ventus: A High-performance Open-source GPGPU Based on RISC-V and Its Vector Extension. 2024 IEEE 42nd International Conference on Computer Design (ICCD), 276-279. DOI: 10.1109/ICCD63220.2024.00049. Online publication date: 18-Nov-2024.
  • (2024) Advanced Dynamic Scalarisation for RISC-V GPGPUs. 2024 IEEE 42nd International Conference on Computer Design (ICCD), 260-267. DOI: 10.1109/ICCD63220.2024.00047. Online publication date: 18-Nov-2024.
