Allocating architected registers through differential encoding

Published: 01 April 2007

Abstract

Micro-architecture designers are very cautious about expanding the number of architected registers exposed in the instruction set, because widening the register fields increases code size, raises I-cache and memory pressure, and may complicate the processor pipeline. For low-end processors in particular, encoding space can be extremely limited because of area and power constraints. On the other hand, the number of architected registers exposed to the compiler can directly affect the effectiveness of compiler analysis and optimization. On high-performance processors, register pressure in some code regions can exceed the number of available registers, for example as a result of optimizations such as aggressive function inlining and software pipelining. The compiler cannot compile and optimize effectively if the ISA exposes only a small number of registers. It is therefore crucial to make more architected registers available to the compiler without significantly increasing code size.

In this article, we devise a new register encoding scheme, called differential encoding, that allows more registers to be addressed through the operand field of an instruction than the direct encoding in current use. We show that it can be implemented with very low overhead. We then apply differential encoding in several ways so that the extra architected registers improve performance, devising three schemes that integrate it with register allocation. We demonstrate that differential register allocation improves the performance of both high-end and low-end processors. Moreover, it can be combined with software pipelining to provide more registers and reduce spills.
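The operand-field mechanism can be illustrated with a minimal Python sketch. It assumes, as a simplification not spelled out in this abstract, that each register field stores the difference between the register being accessed and the most recently accessed register, modulo the number of architected registers, and that the decoder keeps track of that last register. The parameters (16 registers, 3-bit fields) are assumed: direct encoding of 16 registers would need 4 bits per operand, whereas every field here holds a value in 0..7. The function names `encode` and `decode` and the `"set_last"` reset entry are hypothetical and serve only as illustration.

```python
# Hypothetical sketch of differential register encoding.
# Assumptions (not taken from the article): operands are encoded in the
# order they are accessed, the decoder keeps one "last register" value,
# and a hypothetical "set_last" entry resets that value whenever a
# difference does not fit in the operand field.

REG_N = 16                 # number of architected registers (assumed)
FIELD_BITS = 3             # width of each register operand field (assumed)
DIFF_N = 1 << FIELD_BITS   # differences 0 .. DIFF_N - 1 are encodable


def encode(regs, last=0):
    """Encode register numbers as modular differences from the last access.

    Returns a list of ("diff", d) and ("set_last", r) entries.
    """
    out = []
    for r in regs:
        d = (r - last) % REG_N
        if d >= DIFF_N:
            # Difference too large for the field: emit an explicit reset
            # so that the difference for this access becomes 0.
            out.append(("set_last", r))
            d = 0
        out.append(("diff", d))
        last = r
    return out


def decode(entries, last=0):
    """Recover register numbers by accumulating differences modulo REG_N."""
    regs = []
    for kind, value in entries:
        if kind == "set_last":
            last = value
        else:
            last = (last + value) % REG_N
            regs.append(last)
    return regs
```

For example, `encode([1, 3, 6, 9, 0])` yields differences 1, 2, 3, 3, 7; the last one wraps around because (9 + 7) mod 16 = 0. A backward step such as 5 to 2 needs a difference of 13, which does not fit in 3 bits, so the sketch emits a reset. This suggests why allocation matters under such a scheme: choosing register numbers so that consecutive accesses stay within range avoids the extra reset entries.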

Our results show that differential encoding significantly reduces the number of spills and speeds up program execution. For a low-end configuration, we achieve over 14% speedup while leaving code size almost unchanged. For a high-end in-order VLIW machine, it significantly speeds up loops with high register pressure (about 80% speedup), and the overall speedup is about 15%. Moreover, the scheme can be applied adaptively, which makes its overhead much smaller.
