Abstract
Broader access to space has increased the use of commodity processors in radiation environments. These processors are vulnerable to transient faults, such as single event upsets, that can flip bits in processor components. Caches are especially vulnerable because of their relatively large area, yet they are often omitted from fault injection testing: many processors do not provide direct access to cache contents, and simulators often do not model caches fully. The performance benefits of caches make disabling them undesirable, and error correcting codes alone cannot correct the increasingly common multiple bit upsets.
This work explores building a program’s cache profile by collecting cache usage information at instruction granularity through commonly available on-chip debugging interfaces. The profile provides a tighter bound on cache vulnerability estimates than cache utilization does (50% for several benchmarks). It can be applied to reduce the number of fault injections required to characterize behavior by at least two-thirds for the benchmarks we examine. The profile also enables future work on hardware fault injection for caches that avoids the biases of existing techniques.
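To make the distinction between cache utilization and a profile-based vulnerability bound concrete, the following is a minimal toy sketch and not the authors' implementation. It assumes a hypothetical trace with one cache event per instruction, where each event is either a `fill` (the line is loaded or overwritten) or a `read` of an already valid line. The function name `profile_bounds` and the trace format are illustrative assumptions. A line is counted as vulnerable at a given instruction only if it is valid and the next event that touches it is a read, so that a bit-flip at that moment could be consumed by the program.

```python
def profile_bounds(trace, num_lines):
    """Toy model (assumption, not the paper's method).

    trace: list of (line_index, op) with one entry per instruction,
           where op is 'fill' or 'read'.
    Returns (mean utilization, mean vulnerable fraction) over the trace.
    """
    steps = len(trace)

    # Backward pass: for the cache state right after step t, record for each
    # line whether the next event touching it (at step t+1 or later) is a read.
    next_op = [None] * num_lines
    read_next = [None] * steps
    for t in range(steps - 1, -1, -1):
        read_next[t] = [op == "read" for op in next_op]
        line, op = trace[t]
        next_op[line] = op

    # Forward pass: track which lines hold valid data and accumulate counts.
    valid = [False] * num_lines
    utilized = 0
    vulnerable = 0
    for t, (line, op) in enumerate(trace):
        if op == "fill":
            valid[line] = True
        utilized += sum(valid)
        vulnerable += sum(v and r for v, r in zip(valid, read_next[t]))

    total = steps * num_lines
    return utilized / total, vulnerable / total


# Hypothetical four-instruction trace on a four-line cache.
trace = [(0, "fill"), (1, "fill"), (0, "read"), (1, "fill")]
utilization, vulnerable = profile_bounds(trace, num_lines=4)
print(utilization, vulnerable)  # prints "0.4375 0.125"
```

In this toy trace, line 1 is valid for most of the run but is overwritten before it is ever read, so it contributes to utilization without contributing to the vulnerable fraction. A fault injection campaign guided by such a profile could skip those line and time combinations, which is the kind of reduction the abstract describes.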
Index Terms
Precise Cache Profiling for Studying Radiation Effects