Abstract
As technology has advanced, the application space of Very Long Instruction Word (VLIW) processors has grown to include a variety of embedded platforms. Due to cost and power consumption constraints, many embedded VLIW processors contain limited resources, including registers. As a result, a VLIW compiler that maximizes instruction-level parallelism (ILP) without considering register constraints may generate excessive register spills, reducing overall system performance. To address this issue, this article presents a new spill reduction technique that improves VLIW runtime performance by reordering operations prior to register allocation and instruction scheduling. Unlike earlier algorithms, our approach explicitly considers both register reduction and data dependency when reordering operations. Data dependency control limits unexpected schedule length increases during subsequent instruction scheduling. Our technique has been evaluated using Trimaran, an academic VLIW compiler, on a set of embedded systems benchmarks. Experimental results show that, on average, this technique improves VLIW performance by 10% over previous spill reduction techniques for VLIW processors with 32 registers and 8 functional units. For VLIW processors with 64 registers and 8 functional units, the improvement over prior approaches is limited.
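To make the motivation concrete, the following minimal sketch shows how the order of operations alone can change register pressure. It is not the algorithm presented in the article. The dependency graph `DEPS`, the operation names, the helper `max_live`, and the liveness model (a value is live from the step that defines it through the step of its last use, inclusive) are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical straight-line block: each operation maps to the values it reads.
# "ld_*" operations load a value; "add*" operations combine two values.
DEPS: Dict[str, Tuple[str, ...]] = {
    "ld_a": (), "ld_b": (), "ld_c": (), "ld_d": (),
    "add1": ("ld_a", "ld_b"),
    "add2": ("ld_c", "ld_d"),
    "add3": ("add1", "add2"),
}


def max_live(order: List[str], deps: Dict[str, Tuple[str, ...]] = DEPS) -> int:
    """Return the peak number of simultaneously live values for a sequential order."""
    position = {op: i for i, op in enumerate(order)}

    # The order must respect data dependencies: operands are defined before use.
    for op in order:
        assert all(position[src] < position[op] for src in deps[op])

    # A value stays live until the last operation that reads it.
    # A value that is never read is live only at its defining step.
    last_use = dict(position)
    for op in order:
        for src in deps[op]:
            last_use[src] = max(last_use[src], position[op])

    peak = 0
    for step in range(len(order)):
        live = sum(1 for op in order if position[op] <= step <= last_use[op])
        peak = max(peak, live)
    return peak


# Two dependency-respecting orders of the same operations.
LOADS_FIRST = ["ld_a", "ld_b", "ld_c", "ld_d", "add1", "add2", "add3"]
INTERLEAVED = ["ld_a", "ld_b", "add1", "ld_c", "ld_d", "add2", "add3"]

print(max_live(LOADS_FIRST), max_live(INTERLEAVED))
```

Under this conservative model, issuing all loads first peaks at 5 live values, while interleaving each addition with its loads peaks at 4. On a register-constrained processor, a gap of this kind can decide whether a value must be spilled.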
Index Terms
Tetris-XL: A performance-driven spill reduction technique for embedded VLIW processors