Protecting Caches from Soft Errors: A Microarchitect’s Perspective

Published: 11 May 2017

Abstract

Soft errors are among the most important design concerns in modern embedded systems under aggressive technology scaling. Among the microarchitectural components of a processor, the cache is the most susceptible to soft errors. Error detection and correction codes are common protection techniques for cache memory because of their design simplicity. To design effective protection techniques for caches, it is important to estimate cache susceptibility quantitatively, both without and with protection. At the architectural level, vulnerability is the metric that quantifies the susceptibility of data in caches. However, existing tools and techniques calculate the vulnerability of cached data through coarse-grained, block-level estimation. Further, they ignore common cache protection techniques such as error detection and correction codes. In this article, we demonstrate through intensive fault injection campaigns that our word-level vulnerability estimation is more accurate than block-level estimation. Further, our extensive experiments over benchmark suites reveal several counterintuitive and interesting results. Parity checking performed only at reads provides more reliable and power-efficient protection than parity checking performed at both reads and writes. On the other hand, checking error-correcting codes only at reads can leave the cache vulnerable even to single-bit soft errors, while checking them at both reads and writes provides perfect reliability.
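
The difference between block-level and word-level estimation can be illustrated with a toy model. The sketch below is not the article's tool or its exact definition of vulnerability. It assumes a single unprotected cache block filled at cycle 0, ignores eviction and write-back, and counts an interval as vulnerable only when it ends in a read (a write overwrites the old value, so an error in it is masked). The names `word_level`, `block_level`, and `TRACE` are hypothetical.

```python
from typing import List, Tuple

# One access to a single cache block: (cycle, word index, "R" or "W").
Access = Tuple[int, int, str]

def word_level(trace: List[Access], words_per_block: int) -> int:
    """Vulnerable word-cycles, tracking each word separately.

    Assumption: the block is filled at cycle 0, and only an interval
    that ends in a read of the same word counts as vulnerable.
    """
    last = [0] * words_per_block  # last access cycle per word
    total = 0
    for cycle, word, op in trace:
        if op == "R":
            total += cycle - last[word]
        last[word] = cycle
    return total

def block_level(trace: List[Access], words_per_block: int) -> int:
    """Vulnerable word-cycles when every access is charged to the whole block.

    Assumption: a read of any word marks all words of the block as
    vulnerable since the previous access to the block.
    """
    last = 0  # last access cycle for the block as a whole
    total = 0
    for cycle, _word, op in trace:
        if op == "R":
            total += (cycle - last) * words_per_block
        last = cycle
    return total

# Hypothetical trace on a 4-word block: write word 0, read word 1, read word 0.
TRACE: List[Access] = [(10, 0, "W"), (20, 1, "R"), (30, 0, "R")]

if __name__ == "__main__":
    print("word-level :", word_level(TRACE, 4))
    print("block-level:", block_level(TRACE, 4))
```

Under these assumptions the block-level count for this trace is twice the word-level count, because words 2 and 3 are never read yet are still charged, and word 1 is charged again after its only read.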
