Abstract
Soft errors are one of the most important design concerns in modern embedded systems as technology scales aggressively. Among the various microarchitectural components of a processor, the cache is the most susceptible to soft errors. Error detection and correction codes are common protection techniques for cache memory because of their design simplicity. To design effective protection techniques for caches, it is important to quantitatively estimate the susceptibility of caches, both without and with protection. At the architectural level, vulnerability is the metric used to quantify the susceptibility of data in caches. However, existing tools and techniques calculate the vulnerability of data in caches through coarse-grained, block-level estimation. Further, they ignore common cache protection techniques such as error detection and correction codes. In this article, we show through intensive fault injection campaigns that our word-level vulnerability estimation is more accurate than block-level estimation. Moreover, our extensive experiments over benchmark suites reveal several counter-intuitive and interesting results. Parity checking performed only on reads provides more reliable and more power-efficient protection than parity checking performed on both reads and writes. On the other hand, checking error correcting codes only on reads can leave data vulnerable even to single-bit soft errors, while checking them on both reads and writes provides perfect reliability.
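The gap between block-level and word-level vulnerability estimation can be illustrated with a minimal sketch. It is not the authors' framework. It assumes a simplified, hypothetical model: a trace of fill/read/write/evict events on a single write-back cache block. Under this model, the interval since a word's last access counts as vulnerable if it ends in a read or in the eviction of dirty data, and does not count if it ends in a write that overwrites the word. Vulnerability is reported in word-cycles. The trace format and function names are illustrative, and protection (parity or ECC) is deliberately left out.

```python
from typing import List, Tuple

# Hypothetical trace format for ONE cache block: (cycle, op, word).
# op is "fill", "read", "write" or "evict"; word is ignored for fill/evict.
# Assumes a well-formed trace that starts with a fill.
Event = Tuple[int, str, int]


def word_level_vulnerability(events: List[Event], words_per_block: int) -> int:
    """Vulnerable word-cycles, tracking each word's accesses and dirty bit."""
    last = [0] * words_per_block  # cycle of the last access to each word
    dirty = [False] * words_per_block
    total = 0
    for cycle, op, word in events:
        if op == "fill":
            last = [cycle] * words_per_block
            dirty = [False] * words_per_block
        elif op == "read":
            total += cycle - last[word]  # a flip since the last access is consumed
            last[word] = cycle
        elif op == "write":
            last[word] = cycle  # old value overwritten: interval not vulnerable
            dirty[word] = True
        elif op == "evict":
            for w in range(words_per_block):
                if dirty[w]:
                    total += cycle - last[w]  # dirty word is written back to memory
            dirty = [False] * words_per_block
    return total


def block_level_vulnerability(events: List[Event], words_per_block: int) -> int:
    """Vulnerable word-cycles when the whole block is treated as one unit."""
    last = 0
    dirty = False
    total = 0
    for cycle, op, _word in events:
        if op == "fill":
            last, dirty = cycle, False
        elif op == "read":
            # A read of any word charges the interval to every word in the block.
            total += (cycle - last) * words_per_block
            last = cycle
        elif op == "write":
            # A write to any word is treated as overwriting the whole block.
            last, dirty = cycle, True
        elif op == "evict" and dirty:
            total += (cycle - last) * words_per_block
            dirty = False
    return total


trace = [(0, "fill", 0), (10, "read", 0), (20, "write", 1), (30, "read", 1), (50, "evict", 0)]
print(word_level_vulnerability(trace, 4))   # 40
print(block_level_vulnerability(trace, 4))  # 160
```

In this trace, the block-level estimate is four times the word-level one because each single-word read is charged to all four words. In other traces, the assumption that a one-word write overwrites the whole block can instead push the block-level estimate below the word-level one.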
Protecting Caches from Soft Errors: A Microarchitect’s Perspective