Abstract
This article presents Cool-Mem, a family of memory system architectures that integrate conventional memory system mechanisms, energy-aware address translation, and compiler-enabled cache disambiguation techniques to reduce energy consumption in general-purpose architectures. The solutions provided in this article leverage interlayer tradeoffs between the architecture, compiler, and operating system layers. Cool-Mem achieves power reduction by statically matching memory operations with energy-efficient cache and virtual memory access mechanisms. It combines statically speculative cache access modes, a dynamic content addressable memory-based (CAM-based) Tag-Cache used as backup for statically mispredicted accesses, different conventional multilevel associative cache organizations, embedded protection checking across all cache access mechanisms, and architectural organizations that reduce the power consumed by address translation in virtual memory. Because it is based on speculative static information, a superset of the predictable program information available at compile time, our approach removes the burden of provable correctness from the compiler analysis passes that extract static information. This makes Cool-Mem highly practical and applicable to large and complex applications, with no limitations arising from complexity issues in our compiler passes or from the presence of precompiled static libraries. Based on extensive evaluation of both SPEC2000 and MediaBench applications, we obtain total processor energy savings of 6% to 19%, with performance ranging from a 1.5% degradation to a 6% improvement for the applications studied. We have also compared Cool-Mem with several prior approaches and found that it performs better in almost all cases.
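The access flow described in the abstract can be pictured with a small toy model. This is only a sketch under stated assumptions, not the article's design: the class name `ToyCache`, the relative energy costs, the rule for filling the Tag-Cache, and the order in which the paths are tried are all hypothetical and chosen for illustration. It shows a statically predicted access probing a single way, a Tag-Cache lookup acting as backup when that prediction is wrong, and a conventional associative lookup as the last resort.

```python
from dataclasses import dataclass, field

# Hypothetical relative energy costs (illustrative units, not figures from the article).
COST_STATIC = 1      # statically speculated access that probes one predicted way
COST_TAG_CACHE = 2   # lookup in the small CAM-based Tag-Cache
COST_ASSOC = 4       # conventional lookup that probes every way of the set


@dataclass
class ToyCache:
    ways: int = 4
    lines: dict = field(default_factory=dict)      # (set index, way) -> tag
    tag_cache: dict = field(default_factory=dict)  # (set index, tag) -> way
    energy: int = 0

    def fill(self, set_idx, way, tag):
        """Place a line with the given tag in one way of a set."""
        self.lines[(set_idx, way)] = tag

    def access(self, set_idx, tag, predicted_way=None):
        """Return which path served the access: static, tag_cache, assoc, or miss."""
        if predicted_way is not None:
            # Statically speculative mode: probe only the compiler-predicted way.
            self.energy += COST_STATIC
            if self.lines.get((set_idx, predicted_way)) == tag:
                return "static"
            # Static misprediction: consult the Tag-Cache as backup.
            self.energy += COST_TAG_CACHE
            way = self.tag_cache.get((set_idx, tag))
            if way is not None and self.lines.get((set_idx, way)) == tag:
                return "tag_cache"
        # Conventional associative access: probe all ways of the set.
        self.energy += COST_ASSOC
        for way in range(self.ways):
            if self.lines.get((set_idx, way)) == tag:
                # Assumed policy: remember the way so later mispredictions hit the Tag-Cache.
                self.tag_cache[(set_idx, tag)] = way
                return "assoc"
        return "miss"


cache = ToyCache()
cache.fill(set_idx=3, way=2, tag=0xAB)
print(cache.access(3, 0xAB, predicted_way=2))  # correct static prediction
print(cache.access(3, 0xAB, predicted_way=0))  # misprediction, falls through to associative lookup
print(cache.access(3, 0xAB, predicted_way=0))  # misprediction, now served by the Tag-Cache
print(cache.energy)
```

In this toy model a correct static prediction is the cheapest path, and the Tag-Cache bounds the cost of a misprediction once the line has been seen, which is the intuition behind pairing speculative static information with a dynamic backup.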
Index Terms
Coupling compiler-enabled and conventional memory accessing for energy efficiency