Coupling compiler-enabled and conventional memory accessing for energy efficiency

Published: 01 May 2004

Abstract

This article presents Cool-Mem, a family of memory system architectures that integrate conventional memory system mechanisms, energy-aware address translation, and compiler-enabled cache disambiguation techniques to reduce energy consumption in general-purpose architectures. The solutions provided in this article leverage interlayer tradeoffs among the architecture, compiler, and operating system layers. Cool-Mem achieves power reduction by statically matching memory operations with energy-efficient cache and virtual memory access mechanisms. It combines statically speculative cache access modes, a dynamic content addressable memory-based (CAM-based) Tag-Cache used as backup for statically mispredicted accesses, different conventional multilevel associative cache organizations, protection checking embedded in all cache access mechanisms, and architectural organizations that reduce the power consumed by address translation in virtual memory. Because it is based on speculative static information, a superset of the predictable program information available at compile time, our approach removes the burden of provable correctness from the compiler analysis passes that extract static information. This makes Cool-Mem highly practical and applicable to large and complex applications, without limitations due to complexity issues in our compiler passes or the presence of precompiled static libraries. Based on extensive evaluation on both SPEC2000 and Mediabench applications, we obtain 6% to 19% total energy savings in the processor, with performance ranging from a 1.5% degradation to a 6% improvement for the applications studied. We have also compared Cool-Mem to several prior approaches and have found that Cool-Mem performs better in almost all cases.
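
The layered access order described in the abstract (a statically speculative access first, the Tag-Cache as backup, then a conventional associative lookup) can be illustrated with a toy model. This is only a sketch under stated assumptions, not the article's design: the abstract does not specify the mechanisms, so the class `ToyCache`, the idea that the compiler hint is a way number, the plain dictionary standing in for the CAM-based Tag-Cache, and the relative energy costs in `ENERGY` are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical relative energy cost per lookup path (illustrative numbers,
# not taken from the article).
ENERGY = {"speculative": 1.0, "tag_cache": 1.5, "conventional": 4.0}


@dataclass
class ToyCache:
    ways: int = 4
    sets: int = 64
    line_bytes: int = 32
    tag_cache_entries: int = 8
    energy: float = 0.0
    # lines[set][way] holds the tag stored in that way, or None if empty.
    lines: list = field(init=False)
    # Small backup structure mapping line address -> way. A dict stands in
    # for the CAM-based Tag-Cache (assumption).
    tag_cache: dict = field(default_factory=dict)

    def __post_init__(self):
        self.lines = [[None] * self.ways for _ in range(self.sets)]

    def _split(self, addr):
        line = addr // self.line_bytes
        return line, line % self.sets, line // self.sets

    def _remember(self, line, way):
        # FIFO replacement in the backup structure (dicts keep insertion order).
        if line not in self.tag_cache and len(self.tag_cache) >= self.tag_cache_entries:
            del self.tag_cache[next(iter(self.tag_cache))]
        self.tag_cache[line] = way

    def _fill(self, idx, tag, line):
        # Use the first empty way, otherwise evict a way chosen from the address.
        ways = self.lines[idx]
        way = ways.index(None) if None in ways else line % self.ways
        ways[way] = tag
        return way

    def access(self, addr, predicted_way=None):
        """Return which path served the access and accumulate its energy."""
        line, idx, tag = self._split(addr)
        # 1. Statically speculative path: probe only the hinted way.
        if predicted_way is not None:
            self.energy += ENERGY["speculative"]
            if self.lines[idx][predicted_way] == tag:
                return "speculative_hit"
        # 2. Backup path for mispredicted (or unhinted) accesses.
        self.energy += ENERGY["tag_cache"]
        way = self.tag_cache.get(line)
        if way is not None and self.lines[idx][way] == tag:
            return "tag_cache_hit"
        # 3. Conventional associative lookup across all ways.
        self.energy += ENERGY["conventional"]
        for w, stored in enumerate(self.lines[idx]):
            if stored == tag:
                self._remember(line, w)
                return "conventional_hit"
        self._remember(line, self._fill(idx, tag, line))
        return "miss"


cache = ToyCache()
first = cache.access(0x1000)                     # cold miss, fills way 0
second = cache.access(0x1000, predicted_way=0)   # correct static hint
third = cache.access(0x1000, predicted_way=1)    # wrong hint, backup catches it
```

In this model a correct hint pays only the cheapest path, while a wrong hint costs the speculative probe plus the backup lookup. A stale backup entry is harmless because the stored tag is always rechecked before a hit is reported.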

Published in

ACM Transactions on Computer Systems, Volume 22, Issue 2
May 2004
144 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/986533

Copyright © 2004 ACM

Publisher

Association for Computing Machinery, New York, NY, United States
