Abstract
Set-associative caches are traditionally managed using hardware-based lookup and replacement schemes that have high energy overheads. Ideally, the caching strategy should be tailored to the application's memory needs, enabling optimal use of this on-chip storage to maximize performance while minimizing power consumption. However, doing this in hardware alone is difficult because of hardware complexity, high power dissipation, the overhead of dynamically discovering application characteristics, and a greater likelihood of making decisions that are only locally optimal. The compiler can instead determine the caching strategy by analyzing the application code and providing hints to the hardware. We propose a hardware/software co-managed partitioned cache architecture in which enhanced load/store instructions control fine-grained data placement within a set of cache partitions. Unlike traditional partitioning techniques, each load and store instruction can individually specify the set of partitions used for lookup and for replacement. This fine-grained control can avoid conflicts, providing the performance benefits of highly associative caches while saving energy by eliminating redundant tag and data array accesses. Using four direct-mapped partitions, we eliminated 25% of the tag checks and recorded an average 15% reduction in the energy-delay product compared to a hardware-managed 4-way set-associative cache.
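The per-instruction partition control described above can be illustrated with a small simulation. This is a minimal sketch, not the paper's implementation: the class name `PartitionedCache`, the bit-mask encoding of partition sets, the line size, the partition size, the round-robin victim choice, and the rule that the replacement set must lie within the lookup set are all assumptions made for illustration. It counts one tag check per partition probed, so restricting the lookup mask directly reduces tag checks.

```python
from dataclasses import dataclass, field

# All sizes below are illustrative assumptions, not values from the paper.
LINE_BYTES = 32           # assumed cache line size
SETS_PER_PARTITION = 64   # assumed lines per direct-mapped partition
NUM_PARTITIONS = 4        # four direct-mapped partitions, as in the abstract
ALL = (1 << NUM_PARTITIONS) - 1


@dataclass
class PartitionedCache:
    """Hypothetical model: each access names its lookup and replacement partitions."""
    # tags[p][s] is the tag held in set s of partition p (None means invalid).
    tags: list = field(default_factory=lambda: [
        [None] * SETS_PER_PARTITION for _ in range(NUM_PARTITIONS)])
    tag_checks: int = 0
    hits: int = 0
    misses: int = 0
    next_victim: int = 0  # round-robin pointer (assumed replacement policy)

    def access(self, addr, lookup_mask=ALL, replace_mask=ALL):
        if not lookup_mask or not replace_mask:
            raise ValueError("masks must name at least one partition")
        # Simplifying assumption: a line is only placed where it will be probed.
        if replace_mask & ~lookup_mask:
            raise ValueError("replace_mask must be a subset of lookup_mask")

        line = addr // LINE_BYTES
        index = line % SETS_PER_PARTITION
        tag = line // SETS_PER_PARTITION

        # Probed partitions are checked in parallel, so every one costs a tag check.
        probed = [p for p in range(NUM_PARTITIONS) if lookup_mask & (1 << p)]
        self.tag_checks += len(probed)
        if any(self.tags[p][index] == tag for p in probed):
            self.hits += 1
            return True

        # Miss: fill one of the partitions the instruction allows for replacement.
        self.misses += 1
        allowed = [p for p in range(NUM_PARTITIONS) if replace_mask & (1 << p)]
        victim = allowed[self.next_victim % len(allowed)]
        self.next_victim += 1
        self.tags[victim][index] = tag
        return False


def run(cache, mask_a, mask_b, rounds=4):
    """Alternate between two addresses that map to the same set index."""
    addr_a, addr_b = 0, SETS_PER_PARTITION * LINE_BYTES
    for _ in range(rounds):
        cache.access(addr_a, lookup_mask=mask_a, replace_mask=mask_a)
        cache.access(addr_b, lookup_mask=mask_b, replace_mask=mask_b)
    return cache


# Compiler-directed: each address is confined to its own partition.
directed = run(PartitionedCache(), 0b0001, 0b0010)
# Baseline: every access probes all four partitions, like a 4-way lookup.
baseline = run(PartitionedCache(), ALL, ALL)
```

In this toy trace both configurations see the same hits and misses, but the directed run probes one partition per access instead of four. The numbers are specific to this sketch and say nothing about the 25% and 15% figures reported above.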
Compiler-managed partitioned data caches for low power
LCTES '07: Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems