
Static strands: Safely exposing dependence chains for increasing embedded power efficiency

Published: 01 September 2007

Abstract

Modern embedded processors are designed to maximize execution efficiency: the amount of performance achieved per unit of energy dissipated while still meeting minimum performance levels. To increase this efficiency, we propose using static strands, dependence chains without fan-out that are exposed by a compiler pass. The instructions in each strand are resequenced so that they appear consecutively, and they are annotated to communicate their location to the hardware. Importantly, the modified application is binary compatible with, and functionally identical to, the original, so it executes transparently on a baseline processor. With simple processor modifications, however, these static strands can be collapsed and optimized, significantly reducing workload energy. Results show that over 30% of MediaBench and SPEC2000int dynamic instructions can be collapsed, reducing issue logic energy by 20%, bypass energy by 19%, and register file energy by 14%. In addition, by increasing the effective capacity of pipeline resources by almost a third, average IPC can be improved by up to 15%. This performance gain can then be traded for a lower clock frequency that maintains a baseline level of performance, further reducing energy.
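The strand-finding step of the compiler pass can be illustrated with a minimal sketch. It assumes a hypothetical three-address instruction form (`Instr`), a single basic block, and a known set of live-out registers; the names `Instr`, `sole_consumer`, and `find_strands` are illustrative and do not come from the paper. A value is treated as having no fan-out when exactly one later instruction reads it before it is overwritten and it is not live after the block. The sketch only finds strands; it does not perform the resequencing or annotation steps, and it ignores memory dependences.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical register-to-register instruction: dst <- op(srcs)."""
    op: str
    dst: Optional[str]
    srcs: Tuple[str, ...]


def sole_consumer(block: List[Instr], i: int, live_out: Set[str]) -> Optional[int]:
    """Return the index of the only instruction that reads block[i].dst, or None
    if the value has no reader, several readers (fan-out), or escapes the block."""
    dst = block[i].dst
    if dst is None:
        return None
    readers: List[int] = []
    for j in range(i + 1, len(block)):
        if dst in block[j].srcs:
            readers.append(j)
        if block[j].dst == dst:  # overwritten: later reads see a different value
            break
    else:
        # No redefinition inside the block, so the value may still be needed later.
        if dst in live_out:
            return None
    return readers[0] if len(readers) == 1 else None


def find_strands(block: List[Instr], live_out: Set[str]) -> List[List[int]]:
    """Group instruction indices into linear dependence chains without fan-out."""
    succ: Dict[int, int] = {}
    pred: Dict[int, int] = {}
    for i in range(len(block)):
        j = sole_consumer(block, i, live_out)
        # Simplifying assumption: keep strands linear by letting each consumer
        # join at most one producer (the earliest eligible one).
        if j is not None and j not in pred:
            succ[i] = j
            pred[j] = i
    strands: List[List[int]] = []
    for head in range(len(block)):
        if head in succ and head not in pred:  # start of a chain
            chain = [head]
            while chain[-1] in succ:
                chain.append(succ[chain[-1]])
            strands.append(chain)
    return strands


if __name__ == "__main__":
    block = [
        Instr("add", "r1", ("r2", "r3")),  # 0
        Instr("mul", "r7", ("r2", "r2")),  # 1: r7 is read twice (fan-out)
        Instr("shl", "r4", ("r1",)),       # 2: only reader of r1
        Instr("sub", "r5", ("r4", "r7")),  # 3: only reader of r4
        Instr("or",  "r6", ("r7", "r5")),  # 4: only reader of r5
    ]
    print(find_strands(block, live_out={"r6"}))  # [[0, 2, 3, 4]]
```

In this example the strand `[0, 2, 3, 4]` is interrupted by instruction 1, which is the situation the resequencing step addresses: because instruction 1 depends only on `r2`, a scheduler could hoist it above instruction 0 so that the strand occupies consecutive slots. Adding `r5` to `live_out` would end the strand at instruction 3, since that value would then escape the block.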

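The trade of IPC for clock frequency in the last sentence of the abstract can be quantified with a back-of-the-envelope calculation. It assumes that execution time is the dynamic instruction count $N$ divided by $\mathrm{IPC} \cdot f$, that $N$ is unchanged by the annotation, and that the full 15% upper bound on the IPC gain is realized:

$$
T = \frac{N}{\mathrm{IPC} \cdot f}, \qquad
\frac{N}{1.15\,\mathrm{IPC} \cdot f'} = \frac{N}{\mathrm{IPC} \cdot f}
\;\Longrightarrow\;
f' = \frac{f}{1.15} \approx 0.87\,f
$$

Under these assumptions, the clock frequency could be lowered by up to roughly 13% while matching baseline performance.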
