ABSTRACT
A stream processor executes an application that has been decomposed into a sequence of kernels that operate on streams of data elements. During the execution of a kernel, all streams accessed must be communicated through the SRF (Stream Register File), a non-bypassing software-managed on-chip memory. Therefore, optimizing utilization of the SRF is crucial for good performance. The key insight is that the interference graphs (IGs) formed by the streams in stream applications tend to be comparability graphs or decomposable into a set of multiple comparability graphs. We present a compiler algorithm that can find optimal or near-optimal colorings in stream IGs, thereby achieving better SRF utilization than the First-Fit bin-packing algorithm, the best in the literature.
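For readers unfamiliar with the baseline, the following is a minimal sketch of a First-Fit style SRF allocation, not the algorithm presented in the paper. It assumes each stream is described by a hypothetical tuple `(size, first_kernel, last_kernel)` and that two streams interfere when those kernel intervals overlap; the names `interferes`, `first_fit`, and `srf_footprint` are illustrative only.

```python
from typing import Dict, Tuple

# Hypothetical stream description (an assumption for illustration):
# (size in SRF words, index of first kernel using it, index of last kernel using it)
Stream = Tuple[int, int, int]


def interferes(a: Stream, b: Stream) -> bool:
    """Two streams interfere when their live ranges (kernel intervals) overlap."""
    return a[1] <= b[2] and b[1] <= a[2]


def first_fit(streams: Dict[str, Stream]) -> Dict[str, int]:
    """Give each stream, in insertion order, the lowest SRF offset that does
    not overlap the address range of any interfering stream placed so far."""
    offsets: Dict[str, int] = {}
    for name, s in streams.items():
        # Address ranges held by already-placed interfering streams.
        busy = sorted(
            (offsets[o], offsets[o] + streams[o][0])
            for o in offsets
            if interferes(s, streams[o])
        )
        candidate = 0
        for start, end in busy:
            if candidate + s[0] <= start:
                break  # the gap before this busy range is large enough
            candidate = max(candidate, end)
        offsets[name] = candidate
    return offsets


def srf_footprint(streams: Dict[str, Stream], offsets: Dict[str, int]) -> int:
    """Highest SRF address used by the allocation (0 for no streams)."""
    return max((offsets[n] + streams[n][0] for n in streams), default=0)
```

The result of First-Fit depends on the order in which streams are placed, which is one reason a coloring that exploits the structure of the interference graph can use the SRF more tightly.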
Index Terms
Comparability graph coloring for optimizing utilization of stream register files in stream processors
PPoPP '09