Abstract
Register files suffer from some of the highest power densities within processors, so designers have investigated several architectural strategies for reducing register file power, including "On Demand RF Read", in which the register file is read only if the operand value is not available from the bypasses. We show in this paper that significant additional reductions in register file power consumption can be obtained by scheduling instructions so that they receive their operands via the bypasses rather than by reading the register file. Such instruction scheduling requires the compiler to be cognizant of the bypasses in the processor pipeline. We develop several bypass-aware instruction scheduling heuristics of varying time complexity and study their effectiveness on the Intel XScale processor pipeline running MiBench benchmarks. Our experimental results show additional power consumption reductions of up to 26%, and 12% on average, over and above the register file power reduction achieved through existing techniques.
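To make the idea concrete, the following is a minimal sketch of bypass-aware scheduling on a toy machine. It is not one of the paper's heuristics and does not model the XScale pipeline: it assumes a single-issue, in-order pipeline in which a result stays on the bypass for a hypothetical `window` of issue slots, ignores latencies and stalls, and assumes each register is written at most once in the block. The names `Instr`, `count_rf_reads`, and `bypass_aware_schedule` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str      # register written by this instruction
    srcs: tuple    # registers read by this instruction


def count_rf_reads(order, window):
    """Count source operands that must come from the register file.

    Assumption: an operand is bypassed only if its producer issued
    at most `window` slots before the consumer.
    """
    last_def = {}  # register -> issue slot of its producer
    reads = 0
    for slot, ins in enumerate(order):
        for src in ins.srcs:
            if src not in last_def or slot - last_def[src] > window:
                reads += 1
        last_def[ins.dest] = slot
    return reads


def bypass_aware_schedule(block, window):
    """Greedy list scheduling that prefers operands still on the bypass.

    Assumption: `block` is in a valid program order and every register
    is written at most once, so only read-after-write dependences matter.
    """
    defined_in_block = {ins.dest for ins in block}
    remaining = list(block)
    issued_at = {}  # register -> slot at which its producer was issued
    order = []
    while remaining:
        slot = len(order)
        # Ready: every in-block producer of a source has already issued.
        ready = [
            ins for ins in remaining
            if all(s in issued_at or s not in defined_in_block for s in ins.srcs)
        ]
        # Prefer the instruction with the most bypassable operands;
        # ties fall back to program order.
        best = max(
            ready,
            key=lambda ins: sum(
                1 for s in ins.srcs
                if s in issued_at and slot - issued_at[s] <= window
            ),
        )
        remaining.remove(best)
        issued_at[best.dest] = slot
        order.append(best)
    return order


# Hypothetical basic block: a, b, c, d are live-in values.
block = [
    Instr("i0", "r1", ("a", "b")),
    Instr("i1", "r2", ("c", "d")),
    Instr("i2", "r3", ("r1", "a")),
    Instr("i3", "r4", ("r2", "c")),
]

scheduled = bypass_aware_schedule(block, window=1)
print([ins.name for ins in scheduled])        # ['i0', 'i2', 'i1', 'i3']
print(count_rf_reads(block, window=1))        # 8
print(count_rf_reads(scheduled, window=1))    # 6
```

In this toy block, moving each consumer directly behind its producer turns two register file reads into bypass transfers. Even with "On Demand RF Read" in hardware, the original order would still perform all eight reads, which is the gap a bypass-aware compiler can close.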
Index Terms
Bypass aware instruction scheduling for register file power reduction