ABSTRACT
While Ultra Deep Submicron (UDSM) CMOS scaling gives embedded processor designers ample silicon budget to increase processor resources and thereby improve performance, the power budget and the practically achievable operating clock frequency act as limiting factors. In this paper we show that simply increasing processor resource size is not effective in improving performance, because of constraints on the achievable operating clock frequency. In response, we propose two adaptive resource resizing techniques, L2RS and L2ML1RS, that resize resources dynamically by exploiting cache misses. Across SPEC2K benchmarks, L2ML1RS yields an average performance improvement of 9.2% (up to 34%) and an average overall energy-delay reduction of 3.8%. Applying L2RS results in a 6.8% performance improvement (up to 24%) and a 4.6% energy-delay reduction. We also present the circuit modifications required to apply these techniques, which are shown to be minimal.
Improving performance and reducing energy-delay with adaptive resource resizing for out-of-order embedded processors
LCTES '08