Compiler-managed partitioned data caches for low power

Published: 13 June 2007

Abstract

Set-associative caches are traditionally managed using hardware-based lookup and replacement schemes that have high energy overheads. Ideally, the caching strategy should be tailored to the application's memory needs, thus enabling optimal use of this on-chip storage to maximize performance while minimizing power consumption. However, doing this in hardware alone is difficult due to hardware complexity, high power dissipation, the overhead of dynamically discovering application characteristics, and the increased likelihood of making only locally optimal decisions. The compiler can instead determine the caching strategy by analyzing the application code and providing hints to the hardware. We propose a hardware/software co-managed partitioned cache architecture in which enhanced load/store instructions are used to control fine-grain data placement within a set of cache partitions. In contrast to traditional partitioning techniques, load and store instructions can individually specify the set of partitions for lookup and replacement. This fine-grain control can avoid conflicts, thus providing the performance benefits of highly associative caches, while saving energy by eliminating redundant tag and data array accesses. Using four direct-mapped partitions, we eliminated 25% of the tag checks and recorded an average 15% reduction in the energy-delay product compared to a hardware-managed 4-way set-associative cache.
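To make the lookup and replacement idea concrete, the following is a minimal sketch of a cache built from direct-mapped partitions in which each access carries its own partition hints. It is an illustration under stated assumptions, not the architecture evaluated in the article: the class name `PartitionedCache`, the bitmask encoding of the lookup set, the 32-byte lines, the 64 sets per partition, and the simplification to a single fill partition per access (rather than a set of replacement candidates) are all hypothetical choices made here. The sketch counts one tag check per probed partition, which is the quantity the abstract describes as being reduced.

```python
class PartitionedCache:
    """Toy model of a cache made of direct-mapped partitions.

    Assumptions (not from the article): 32-byte lines, 64 sets per
    partition, a bitmask lookup hint, and one fill partition per access.
    """

    def __init__(self, num_partitions=4, sets=64, line_bytes=32):
        self.num_partitions = num_partitions
        self.sets = sets
        self.line_bytes = line_bytes
        # tags[p][s] is the tag held in set s of partition p (None = empty).
        self.tags = [[None] * sets for _ in range(num_partitions)]
        self.hits = 0
        self.misses = 0
        self.tag_checks = 0

    def access(self, addr, lookup_mask, fill_partition):
        """Probe only the partitions named in lookup_mask; fill on a miss."""
        if not 0 < lookup_mask < (1 << self.num_partitions):
            raise ValueError("lookup_mask must name at least one partition")
        if not lookup_mask & (1 << fill_partition):
            # Sketch-level rule: a line is only placed where it will be probed.
            raise ValueError("fill_partition must be in lookup_mask")

        line = addr // self.line_bytes
        index = line % self.sets
        tag = line // self.sets

        # All probed partitions are checked in parallel, so each costs a tag check.
        self.tag_checks += bin(lookup_mask).count("1")

        for p in range(self.num_partitions):
            if lookup_mask & (1 << p) and self.tags[p][index] == tag:
                self.hits += 1
                return True

        self.misses += 1
        self.tags[fill_partition][index] = tag
        return False


def replay(cache, trace):
    """Run (addr, lookup_mask, fill_partition) tuples; return the counters."""
    for addr, lookup_mask, fill_partition in trace:
        cache.access(addr, lookup_mask, fill_partition)
    return cache.hits, cache.misses, cache.tag_checks


if __name__ == "__main__":
    a, b = 0, 64 * 32  # two addresses that map to the same set index

    # Hinted accesses: each stream is pinned to its own partition.
    hinted = [(a, 0b0001, 0), (b, 0b0010, 1)] * 4
    # Same placement, but every access probes all four partitions.
    probe_all = [(a, 0b1111, 0), (b, 0b1111, 1)] * 4

    print("hinted   ", replay(PartitionedCache(), hinted))
    print("probe all", replay(PartitionedCache(), probe_all))
```

On this toy trace the two conflicting addresses never evict each other in either configuration, but the hinted version probes one partition per access instead of four. The counts come from this synthetic trace only and say nothing about the 25% and 15% figures reported in the abstract.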
