DOI: 10.1145/1375657.1375668
research-article

Improving performance and reducing energy-delay with adaptive resource resizing for out-of-order embedded processors

Published: 12 June 2008

ABSTRACT

While Ultra Deep Submicron (UDSM) CMOS scaling gives embedded processor designers an ample silicon budget for increasing processor resources to improve performance, restrictions on the power budget and on the practically achievable operating clock frequency act as limiting factors. In this paper we show that simply increasing processor resource size is not effective in improving performance, because of constraints on the achievable operating clock frequency. In response, we propose two adaptive resource resizing techniques, L2RS and L2ML1RS, that adaptively resize resources by exploiting cache misses. For L2ML1RS, our results show a significant average performance improvement of 9.2% (up to 34%) and an average overall energy-delay reduction of 3.8% across the SPEC2K benchmarks. Applying L2RS resulted in a 6.8% performance improvement (up to 24%) and a 4.6% energy-delay reduction. We also present the circuit modifications required to apply these techniques, which are shown to be minimal.
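The claim that simply enlarging resources can fail to improve performance follows from the standard relation execution time = instructions / (IPC × clock frequency). The Python sketch uses made-up numbers (a hypothetical 10% IPC gain paired with a hypothetical 20% frequency loss), not figures from the paper, to show how a higher IPC can be more than cancelled by a slower clock.

```python
def execution_time(instructions: float, ipc: float, freq_hz: float) -> float:
    """Execution time in seconds: instructions / (IPC * clock frequency)."""
    return instructions / (ipc * freq_hz)


# Illustrative values only; these are NOT measurements from the paper.
N = 1e9  # hypothetical dynamic instruction count

# Baseline configuration: IPC 1.0 at a hypothetical 1 GHz clock.
base = execution_time(N, ipc=1.00, freq_hz=1.0e9)

# Hypothetical enlarged ROB/IQ/register file: +10% IPC, but the longer
# access paths force the clock down by 20%.
big = execution_time(N, ipc=1.10, freq_hz=0.8e9)

print(f"baseline: {base:.3f} s, enlarged: {big:.3f} s")
print(f"change in execution time: {(big / base - 1) * 100:+.1f}%")
# prints "baseline: 1.000 s, enlarged: 1.136 s"
# prints "change in execution time: +13.6%"
```

With these illustrative values the enlarged configuration runs about 13.6% slower even though it completes more instructions per cycle; enlarging a resource only pays off when the product IPC × frequency rises.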


      Reviews

      Carlos Juiz

      Homayoun et al. contend that increasing the size of processor resources is not an effective way to improve performance, because of constraints on the achievable operating clock frequency. While increasing the size of processor resources, such as the reorder buffer, instruction queue, and register file, can raise the number of instructions per cycle, the negative effect of larger resources on the achievable operating frequency can result in an overall degradation of execution time. The authors observe that after either one L2 cache miss or several L1 cache misses, one of these resources fills completely and becomes a performance bottleneck. They propose two adaptive resource resizing techniques that exploit cache misses, namely L2RS and L2ML1RS. Experiments run across the SPEC2K benchmarks show significant improvements in performance and reductions in energy-delay for both approaches. The proposed solution upsizes resources only during cache misses: during normal periods, resources are kept at their normal size, and during a cache miss period, they are enlarged. Pipelining the upsized resources allows the frequency targets to be met. One of the most interesting features of the proposal is that the authors also present the circuit modifications needed to realize both approaches, which seem very simple.

      Online Computing Reviews Service
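      The miss-driven policy the review describes (normal size in normal periods, a larger size while a cache miss is outstanding) can be sketched as a small controller. This is a hypothetical model, not the authors' circuit: the class name `MissDrivenResizer`, the sizes of 32 and 64 entries, and the threshold of four outstanding L1 misses standing in for "several" are all assumptions. The `use_l1_trigger` flag loosely mirrors the distinction the technique names suggest between L2RS (triggered by L2 misses) and L2ML1RS (triggered by an L2 miss or several L1 misses). Pipelining of the enlarged structures, which the review credits with meeting frequency targets, is not modeled.

```python
from dataclasses import dataclass


@dataclass
class MissDrivenResizer:
    """Hypothetical model of one resource (e.g. an instruction queue) that is
    upsized only while cache misses are outstanding. Sizes and the L1
    threshold are assumptions, not values from the paper."""
    normal_size: int = 32        # entries enabled in normal periods (assumed)
    large_size: int = 64         # entries enabled during a miss period (assumed)
    l1_threshold: int = 4        # outstanding L1 misses treated as "several" (assumed)
    use_l1_trigger: bool = True  # False: L2 misses only; True: L2 or several L1 misses
    outstanding_l2: int = 0
    outstanding_l1: int = 0

    def l2_miss(self) -> None:
        self.outstanding_l2 += 1

    def l2_fill(self) -> None:
        self.outstanding_l2 = max(0, self.outstanding_l2 - 1)

    def l1_miss(self) -> None:
        self.outstanding_l1 += 1

    def l1_fill(self) -> None:
        self.outstanding_l1 = max(0, self.outstanding_l1 - 1)

    def in_miss_period(self) -> bool:
        # Any outstanding L2 miss opens a miss period.
        if self.outstanding_l2 > 0:
            return True
        # Optionally, enough outstanding L1 misses open one as well.
        return self.use_l1_trigger and self.outstanding_l1 >= self.l1_threshold

    def active_size(self) -> int:
        # Switches size instantly; real hardware must drain entries first.
        return self.large_size if self.in_miss_period() else self.normal_size


# Usage: compare an L2-only trigger with an L2-or-several-L1 trigger.
l2_only = MissDrivenResizer(use_l1_trigger=False)
l2_or_l1 = MissDrivenResizer(use_l1_trigger=True)
for _ in range(4):
    l2_only.l1_miss()
    l2_or_l1.l1_miss()
print(l2_only.active_size(), l2_or_l1.active_size())  # prints "32 64"
l2_only.l2_miss()
print(l2_only.active_size())  # prints "64"
l2_only.l2_fill()
print(l2_only.active_size())  # prints "32"
```

      A real implementation would also have to wait for the disabled entries to drain before shrinking a structure; the sketch switches size instantly so that the triggering policy itself stays visible.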

      • Published in

        ACM Conferences
        LCTES '08: Proceedings of the 2008 ACM SIGPLAN-SIGBED conference on Languages, compilers, and tools for embedded systems
        June 2008
        180 pages
        ISBN:9781605581040
        DOI:10.1145/1375657
        • ACM SIGPLAN Notices
          ACM SIGPLAN Notices  Volume 43, Issue 7
          LCTES '08
          July 2008
          167 pages
          ISSN:0362-1340
          EISSN:1558-1160
          DOI:10.1145/1379023

        Copyright © 2008 ACM

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        • Published: 12 June 2008

        Acceptance Rates

        Overall Acceptance Rate: 116 of 438 submissions, 26%