Abstract
Caches are known to consume a large part of total microprocessor energy. Traditionally, voltage scaling has been used to reduce both dynamic and leakage power in caches. However, aggressive voltage reduction causes process-variation-induced failures in cache SRAM arrays, which compromises cache reliability. We present MultiCopy Cache (MC2), a new cache architecture that achieves a significant reduction in energy consumption through aggressive voltage scaling while maintaining high error resilience (reliability) by exploiting multiple copies of each data item in the cache. Unlike many previous approaches, MC2 does not require any error map characterization and is therefore responsive to changing operating conditions (e.g., Vdd noise, temperature, and leakage) of the cache. MC2 also incurs significantly lower overheads compared to other ECC-based caches. Our experimental results on embedded benchmarks demonstrate that MC2 achieves up to 60% reduction in energy and energy-delay product (EDP) with only 3.5% reduction in IPC and no appreciable area overhead.
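The abstract does not spell out how the multiple copies are compared on a read, so the following is only a minimal sketch of one common way redundant copies can be used: two copies allow a mismatch to be detected, and three copies allow a bitwise majority vote to mask a faulty bit. The function name `read_redundant` and the status strings are hypothetical and are not taken from the paper.

```python
from typing import Optional, Sequence, Tuple


def read_redundant(copies: Sequence[int]) -> Tuple[Optional[int], str]:
    """Combine 1 to 3 redundant copies of a stored word.

    Assumed policy (illustrative, not the paper's stated mechanism):
      1 copy   -> returned as-is, no protection
      2 copies -> mismatch is detected but cannot be corrected
      3 copies -> bitwise majority vote masks a bad bit in any one copy
    """
    if len(copies) == 1:
        return copies[0], "unprotected"
    if len(copies) == 2:
        if copies[0] == copies[1]:
            return copies[0], "ok"
        return None, "mismatch"
    if len(copies) == 3:
        a, b, c = copies
        # Each output bit is 1 exactly when at least two copies have a 1 there.
        voted = (a & b) | (a & c) | (b & c)
        status = "ok" if a == b == c else "corrected"
        return voted, status
    raise ValueError("expected between 1 and 3 copies")


if __name__ == "__main__":
    stored = 0b10110010
    faulty = stored ^ 0b00000100  # one bit flipped, e.g. by a low-voltage failure
    print(read_redundant([stored, faulty, stored]))
    print(read_redundant([stored, faulty]))
```

Because a scheme of this kind compares copies at read time, it needs no precomputed map of faulty cells, which is consistent with the abstract's point about not requiring error map characterization.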
Multicopy Cache: A Highly Energy-Efficient Cache Architecture