Abstract
Graphics processing units (GPUs) are now widely used in embedded systems for manipulating computer graphics and even for general-purpose computation. However, many embedded systems must operate with highly restricted hardware resources in order to achieve high performance or energy efficiency, and the number of registers is one of the common limiting factors in an embedded GPU design. Programs that run with a small number of registers may suffer from high register pressure if register allocation is not properly designed. This is especially true on a GPU in which each register is divided into four elements that can be accessed separately: allocating a register to a vector-type variable that does not hold values in all of its elements wastes register space. In this article, we present a vector-aware register allocation framework to improve register utilization on shader architectures. The framework has two major components: (1) element-based register allocation, which allocates registers according to the element requirements of variables, and (2) register packing, which rearranges the elements of registers to increase the number of contiguous free elements, thereby keeping more live variables in registers. Experimental results on a cycle-approximate simulator showed that the proposed framework reduced register spills by 92% in total and made 91.7% of 14 common shader programs spill free. These results indicate an opportunity for energy management of the space used to store spilled variables. The framework also improved performance by a geometric mean of 8.3%, 16.3%, and 29.2% on general shader processors in which variables are spilled to memory with 5-, 10-, and 20-cycle access latencies, respectively. Furthermore, the reduction in the register requirements of programs enabled another 11 programs with high register pressure to run on a lightweight GPU.
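The interplay between element-based allocation and register packing can be illustrated with a small model. The sketch below is a minimal illustration under stated assumptions, not the VecRA implementation. It assumes registers of four elements, variables described by a hypothetical `Var` record (element width plus live interval), that a variable must occupy contiguous elements of a single register, a linear-scan visiting order, a widest-first repacking heuristic, and that the incoming variable is simply spilled when neither placement nor packing succeeds. All names (`Var`, `RegisterFile`, `place`, `pack`, `allocate`) are hypothetical, and the move count is only a proxy for the register-to-register moves that repacking would require.

```python
from dataclasses import dataclass

ELEMENTS = 4  # assumption: each shader register holds four elements (x, y, z, w)


@dataclass
class Var:
    name: str
    width: int  # elements the variable actually uses (1-4)
    start: int  # first instruction index where the variable is live
    end: int    # last instruction index where the variable is live


class RegisterFile:
    """Element-granular register file: slots[r][e] names the variable that
    occupies element e of register r, or is None when that element is free."""

    def __init__(self, num_regs):
        self.slots = [[None] * ELEMENTS for _ in range(num_regs)]
        self.where = {}  # variable name -> (register, first element)

    def free_elements(self):
        return sum(row.count(None) for row in self.slots)

    def place(self, var):
        # Element-based allocation (first fit): take the first register that
        # has `width` contiguous free elements instead of a whole register.
        for r, row in enumerate(self.slots):
            for e in range(ELEMENTS - var.width + 1):
                if all(row[e + i] is None for i in range(var.width)):
                    row[e:e + var.width] = [var.name] * var.width
                    self.where[var.name] = (r, e)
                    return True
        return False

    def release(self, var):
        r, e = self.where.pop(var.name)
        self.slots[r][e:e + var.width] = [None] * var.width

    def pack(self, live):
        # Register packing (assumed heuristic): re-place live variables
        # widest-first so each register's free elements become contiguous.
        # The old layout is restored if any variable fails to fit.
        old_slots = [row[:] for row in self.slots]
        old_where = dict(self.where)
        self.slots = [[None] * ELEMENTS for _ in self.slots]
        self.where = {}
        for v in sorted(live, key=lambda v: -v.width):
            if not self.place(v):
                self.slots, self.where = old_slots, old_where
                return 0
        # Every variable whose location changed would need a move instruction.
        return sum(self.where[n] != old_where[n] for n in old_where)


def allocate(variables, num_regs):
    """Linear scan over live intervals; returns (spilled names, moves)."""
    rf = RegisterFile(num_regs)
    live, spilled, moves = [], [], 0
    for var in sorted(variables, key=lambda v: v.start):
        # Free the elements of variables whose live ranges have ended.
        for old in [v for v in live if v.end < var.start]:
            live.remove(old)
            rf.release(old)
        placed = rf.place(var)
        if not placed and rf.free_elements() >= var.width:
            moves += rf.pack(live)  # enough free elements, but fragmented
            placed = rf.place(var)
        if placed:
            live.append(var)
        else:
            spilled.append(var.name)  # simplification: spill the newcomer
    return spilled, moves


if __name__ == "__main__":
    a, b, c = Var("a", 1, 0, 9), Var("b", 1, 1, 2), Var("c", 1, 2, 9)
    d = Var("d", 2, 3, 9)
    # With one register, b's death leaves [a, -, c, -]; d needs two
    # contiguous elements, so c is moved next to a and d fits without a spill.
    print(allocate([a, b, c, d], 1))  # ([], 1)
```

In this toy run, one register is enough for all four variables only because two scalars share a register with a two-element vector; an allocator that reserves a whole register per variable, or one without the packing step, would have to spill `d`.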
VecRA: A Vector-Aware Register Allocator for GPU Shader Processors