Multicopy Cache: A Highly Energy-Efficient Cache Architecture

Published: 23 July 2014

Abstract

Caches are known to consume a large part of total microprocessor energy. Traditionally, voltage scaling has been used to reduce both dynamic and leakage power in caches. However, aggressive voltage reduction causes process-variation-induced failures in cache SRAM arrays, thus compromising cache reliability. We present MultiCopy Cache (MC2), a new cache architecture that achieves significant reduction in energy consumption through aggressive voltage scaling while maintaining high error resilience (reliability) by exploiting multiple copies of each data item in the cache. Unlike many previous approaches, MC2 does not require any error map characterization and therefore is responsive to changing operating conditions (e.g., Vdd noise, temperature, and leakage) of the cache. MC2 also incurs significantly lower overheads compared to other ECC-based caches. Our experimental results on embedded benchmarks demonstrate that MC2 achieves up to 60% reduction in energy and energy-delay product (EDP) with only 3.5% reduction in IPC and no appreciable area overhead.
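
The abstract does not spell out how the multiple copies are used at read time. As a minimal sketch, assuming three copies are combined by a bitwise majority vote and two copies are only compared for mismatch, a redundant read could look like the following. The function names `majority_vote` and `read_word`, and the choice to return `None` on an uncorrectable mismatch, are illustrative assumptions and not taken from the paper.

```python
from typing import Optional, Sequence


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise majority of three words: each output bit is the value
    held by at least two of the three inputs."""
    return (a & b) | (a & c) | (b & c)


def read_word(copies: Sequence[int]) -> Optional[int]:
    """Read one word from its redundant copies (hypothetical policy).

    - three or more copies: correct errors by majority vote over the first three
    - two copies: detect a mismatch, but cannot tell which copy is right
    - one copy: no protection, return it unchanged
    Returns None when an error is detected but cannot be corrected.
    """
    if not copies:
        raise ValueError("at least one copy is required")
    if len(copies) >= 3:
        return majority_vote(copies[0], copies[1], copies[2])
    if len(copies) == 2:
        return copies[0] if copies[0] == copies[1] else None
    return copies[0]


stored = 0b10110010
# One copy suffers a bit flip at low voltage; the other two outvote it.
corrected = read_word([stored, stored ^ 0b00000100, stored])
# With only two copies, the flip is detected but not corrected.
detected = read_word([stored, stored ^ 0b00000100])
```

Under this assumed policy a read returns the wrong value only when the same bit position fails in two of the three copies, which is why redundancy of this kind can tolerate a higher per-cell failure rate without an error map.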
