
VecRA: A Vector-Aware Register Allocator for GPU Shader Processors

Published: 01 September 2016

Abstract

Graphics processing units (GPUs) are now widely used in embedded systems for manipulating computer graphics and even for general-purpose computation. However, many embedded systems must work within highly restricted hardware resources in order to achieve high performance or energy efficiency. The number of registers is a common limiting factor in embedded GPU design. Programs that run with few registers may suffer from high register pressure if register allocation is not properly designed. This is especially true on a GPU in which each register is divided into four elements that can be accessed separately, because allocating a whole register to a vector-type variable that does not hold values in all elements wastes register space. In this article, we present a vector-aware register allocation framework to improve register utilization on shader architectures. The framework involves two major components: (1) element-based register allocation, which allocates registers based on the element requirements of variables, and (2) register packing, which rearranges the elements of registers in order to increase the number of contiguous free elements, thereby keeping more live variables in registers. Experimental results on a cycle-approximate simulator showed that the proposed framework reduced the total number of register spills by 92% and made 91.7% of 14 common shader programs spill free. These results indicate an opportunity for energy management of the space that is used for storing spilled variables. The framework also improved performance by a geometric mean of 8.3%, 16.3%, and 29.2% for general shader processors in which variables are spilled to memory with 5-, 10-, and 20-cycle access latencies, respectively. Furthermore, the reduction in the register requirements of programs enabled another 11 programs with high register pressure to run on a lightweight GPU.
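
To make the two components named in the abstract concrete, the following is a minimal sketch written for illustration only. It is not the VecRA algorithm, whose details the abstract does not give. It assumes a toy register file with four elements per register, a first-fit search for contiguous free elements, and a naive widest-first re-placement as the packing step. The class `RegisterFile` and its methods `allocate`, `free`, and `pack` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

LANES = 4  # elements per register, as described in the abstract


class RegisterFile:
    """Toy register file: each register has LANES separately usable elements."""

    def __init__(self, num_regs: int) -> None:
        # slots[r][e] is the variable occupying element e of register r, or None.
        self.slots: List[List[Optional[str]]] = [
            [None] * LANES for _ in range(num_regs)
        ]

    def _find_run(self, width: int) -> Optional[Tuple[int, int]]:
        """First-fit search for `width` contiguous free elements in one register."""
        for r, reg in enumerate(self.slots):
            for start in range(LANES - width + 1):
                if all(reg[e] is None for e in range(start, start + width)):
                    return r, start
        return None

    def allocate(self, var: str, width: int) -> Optional[Tuple[int, int]]:
        """Element-based allocation: reserve only the elements `var` needs.

        Returns (register, first element), or None if no run is free, in
        which case a real allocator would pack and retry, or spill.
        """
        spot = self._find_run(width)
        if spot is None:
            return None
        r, start = spot
        for e in range(start, start + width):
            self.slots[r][e] = var
        return spot

    def free(self, var: str) -> None:
        """Release every element held by `var` (end of its live range)."""
        for reg in self.slots:
            for e in range(LANES):
                if reg[e] == var:
                    reg[e] = None

    def pack(self) -> None:
        """Naive packing (an assumption of this sketch): re-place all live
        variables, widest first, so free elements merge into longer runs."""
        live: Dict[str, int] = {}
        for reg in self.slots:
            for var in reg:
                if var is not None:
                    live[var] = live.get(var, 0) + 1
        self.slots = [[None] * LANES for _ in self.slots]
        for var, width in sorted(live.items(), key=lambda kv: -kv[1]):
            if self.allocate(var, width) is None:
                raise RuntimeError(f"could not re-place {var}")
```

With two registers, allocating two-element variables `a`, `b`, and `c` and then freeing `b` leaves two free elements in each register, so a four-element variable cannot be placed. After `pack()`, `a` and `c` share the first register and the second is entirely free. A real implementation would also have to emit move instructions for every relocated variable and weigh their cost against the spills avoided, which this sketch ignores.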

