
Tetris-XL: A performance-driven spill reduction technique for embedded VLIW processors

Published: 02 October 2009

Abstract

As technology has advanced, the application space of Very Long Instruction Word (VLIW) processors has grown to include a variety of embedded platforms. Due to cost and power consumption constraints, many embedded VLIW processors contain limited resources, including registers. As a result, a VLIW compiler that maximizes instruction-level parallelism (ILP) without considering register constraints may generate excessive register spills, leading to reduced overall system performance. To address this issue, this article presents a new spill reduction technique that improves VLIW runtime performance by reordering operations prior to register allocation and instruction scheduling. Unlike earlier algorithms, our approach explicitly considers both register reduction and data dependency in performing operation reordering. Data dependency control limits unexpected schedule length increases during subsequent instruction scheduling. Our technique has been evaluated using Trimaran, an academic VLIW compiler, and a set of embedded systems benchmarks. Experimental results show that, on average, this technique improves VLIW performance by 10% for VLIW processors with 32 registers and 8 functional units compared with previous spill reduction techniques. Limited improvement is seen versus prior approaches for VLIW processors with 64 registers and 8 functional units.
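To illustrate why operation order matters for register pressure, the following minimal sketch counts the peak number of simultaneously live values in a single basic block for two dependency-respecting orders. It is not the Tetris-XL algorithm; the function `max_live`, the toy operations `a` through `g`, and the liveness model (a value occupies a register from its defining operation through its last use, inclusive) are assumptions made for illustration only.

```python
def max_live(order, uses):
    """Return the peak number of simultaneously live values for an operation order.

    order: list of operation names; each operation defines one value of the same name.
    uses:  dict mapping an operation to the operations whose values it reads.
    Assumed model: a value is live from its defining operation through its last use.
    """
    position = {op: i for i, op in enumerate(order)}

    # Reject orders that violate a data dependency (a use before its definition).
    for op, sources in uses.items():
        for src in sources:
            if position[src] >= position[op]:
                raise ValueError(f"{op} is ordered before its operand {src}")

    # The last use of a value is the latest position of any operation that reads it.
    last_use = dict(position)
    for op, sources in uses.items():
        for src in sources:
            last_use[src] = max(last_use[src], position[op])

    # Count live values at every step and keep the maximum.
    peak = 0
    for step in range(len(order)):
        live = sum(1 for op in order if position[op] <= step <= last_use[op])
        peak = max(peak, live)
    return peak


# Hypothetical block: four loads (a, b, c, d), then e = a + b, f = c + d, g = e + f.
uses = {"a": [], "b": [], "c": [], "d": [],
        "e": ["a", "b"], "f": ["c", "d"], "g": ["e", "f"]}

wide = ["a", "b", "c", "d", "e", "f", "g"]    # all loads first, exposing the most parallelism
narrow = ["a", "b", "e", "c", "d", "f", "g"]  # each operand pair is consumed as soon as it is ready

print(max_live(wide, uses), max_live(narrow, uses))
```

Under this model the second order needs one fewer register than the first while still honoring every dependency. With a small register file, differences of this kind can decide whether a value must be spilled.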


    • Published in

      ACM Transactions on Architecture and Code Optimization  Volume 6, Issue 3
      September 2009
      114 pages
      ISSN:1544-3566
      EISSN:1544-3973
      DOI:10.1145/1582710

      Copyright © 2009 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 2 October 2009
      • Accepted: 1 March 2009
      • Revised: 1 September 2008
      • Received: 1 September 2007

      Qualifiers

      • research-article
      • Research
      • Refereed
