ABSTRACT
The run-time performance of VLIW (very long instruction word) microprocessors depends heavily on the effectiveness of their optimizing compilers. Typical VLIW compiler phases include instruction scheduling, which maximizes instruction-level parallelism (ILP), and register allocation, which minimizes data spills to external memory. If ILP is maximized without considering register constraints, register pressure may become high, leading to additional spill code and reduced run-time performance. In this paper, we present a new register pressure reduction technique for embedded VLIW processors that controls register pressure prior to instruction scheduling and register allocation. By modifying the relative ordering of operations, the technique restructures code so that fewer spills are needed. Our technique has been implemented in Trimaran, an academic VLIW compiler, and evaluated on a series of VLIW benchmarks. Experimental results show that, compared with previous spill code reduction techniques, our algorithm reduces dynamic spills and improves overall cycle counts by 6% on average for a VLIW architecture with 8 functional units and 32 registers.
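The premise above is that the order of operations, and not only their number, determines how many values must be held in registers at the same time. The following Python sketch is an illustration of that general idea only. It is not the paper's algorithm, which the abstract does not describe in detail. It measures peak register pressure, meaning the maximum number of simultaneously live values, for a straight-line ordering of operations. It then shows that interleaving a small computation lowers the peak from 4 to 3. The operation format, the liveness conventions, the example block, and the names `max_register_pressure`, `is_legal_order`, and `lowest_pressure_order` are all hypothetical choices made for this example. The exhaustive search is only practical for tiny blocks.

```python
from itertools import permutations
from typing import Dict, List, Optional, Sequence, Set, Tuple

# Hypothetical IR for this sketch: each operation defines one value
# from zero or more source values.
Op = Tuple[str, Tuple[str, ...]]  # (dest, sources)


def max_register_pressure(order: Sequence[Op]) -> int:
    """Peak number of simultaneously live values for a straight-line order.

    Assumed conventions (illustrative only):
    - a value is live from its defining op until its last use;
    - a value with no use in the sequence stays live to the end (live-out);
    - sources that die at an op are freed before its dest is allocated.
    """
    last_use: Dict[str, int] = {}
    for i, (_, srcs) in enumerate(order):
        for s in srcs:
            last_use[s] = i

    live: Set[str] = set()
    peak = 0
    for i, (dest, srcs) in enumerate(order):
        for s in srcs:
            if last_use[s] == i:  # this op is the last reader of s
                live.discard(s)
        live.add(dest)
        peak = max(peak, len(live))
    return peak


def is_legal_order(order: Sequence[Op]) -> bool:
    """True if every value defined in the block is defined before it is read."""
    block_defs = {dest for dest, _ in order}
    defined: Set[str] = set()
    for dest, srcs in order:
        if any(s in block_defs and s not in defined for s in srcs):
            return False
        defined.add(dest)
    return True


def lowest_pressure_order(ops: Sequence[Op]) -> Tuple[int, List[Op]]:
    """Exhaustively search legal reorderings for the lowest peak pressure.

    Brute force (O(n!)), so only suitable for tiny illustrative blocks.
    """
    best: Optional[Tuple[int, List[Op]]] = None
    for perm in permutations(ops):
        if not is_legal_order(perm):
            continue
        peak = max_register_pressure(perm)
        if best is None or peak < best[0]:
            best = (peak, list(perm))
    assert best is not None  # assumes the input has at least one legal order
    return best


# Hypothetical block computing g = (a + b) + (c + d), where a..d are loads.
loads_first: List[Op] = [
    ("a", ()), ("b", ()), ("c", ()), ("d", ()),
    ("e", ("a", "b")), ("f", ("c", "d")), ("g", ("e", "f")),
]
interleaved: List[Op] = [
    ("a", ()), ("b", ()), ("e", ("a", "b")),
    ("c", ()), ("d", ()), ("f", ("c", "d")), ("g", ("e", "f")),
]

print(max_register_pressure(loads_first))     # 4: a, b, c, d live together
print(max_register_pressure(interleaved))     # 3: e, c, d live together
print(lowest_pressure_order(loads_first)[0])  # 3
```

Under these conventions, a machine with only 3 free registers would need a spill for the loads-first order but not for the interleaved one. A real compiler must also weigh how such reordering affects ILP, and this sketch ignores that trade-off entirely.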
Tetris: a new register pressure control technique for VLIW processors. Proceedings of the 2007 LCTES conference.