Precise Cache Profiling for Studying Radiation Effects

Published: 27 March 2021

Abstract

Increased access to space has led to increased use of commodity processors in radiation environments. These processors are vulnerable to transient faults, such as single event upsets, that may cause bit-flips in processor components. Caches are particularly vulnerable because of their relatively large area, yet they are often omitted from fault injection testing: many processors do not provide direct access to cache contents, and simulators often do not model caches fully. The performance benefits of caches make disabling them undesirable, and error-correcting codes alone are insufficient to correct the increasingly common multiple-bit upsets.

This work explores building a program’s cache profile by collecting cache usage information at instruction granularity through commonly available on-chip debugging interfaces. For cache vulnerability estimates, the profile provides a tighter bound than cache utilization (50% for several benchmarks). Applying the profile reduces the number of fault injections required to characterize behavior by at least two-thirds for the benchmarks we examine. The profile also enables future work on hardware fault injection for caches that avoids the biases of existing techniques.
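To illustrate why a per-instruction profile can bound vulnerability more tightly than utilization, the following is a minimal sketch and not the authors' tool. It assumes a toy direct-mapped cache, a read-only access trace with one time step per instruction, and whole-line granularity; the function name `cache_profile` and the trace format are hypothetical. Utilization counts every step in which a line holds valid data, while the profile-based estimate counts only the steps between a fill and the last read of that data, since an upset after the last read is never consumed in this model.

```python
def cache_profile(trace, num_lines, line_size, total_steps):
    """Toy direct-mapped cache model (illustrative assumptions only).

    trace: list of (step, addr) read accesses, sorted by step, where
           step is the index of the instruction issuing the access.
    Returns (utilization, vulnerable_fraction), each a fraction of all
    (line, step) slots in the run.
    """
    tags = [None] * num_lines     # tag held by each line, None if invalid
    fill = [0] * num_lines        # step at which current contents arrived
    last_read = [0] * num_lines   # most recent read of current contents
    valid = 0                     # slots in which a line held valid data
    live = 0                      # slots in which the data was read later

    def retire(idx, end):
        # Close out the contents of line idx, which stayed valid until `end`.
        nonlocal valid, live
        valid += end - fill[idx]
        live += last_read[idx] - fill[idx]

    for step, addr in trace:
        block = addr // line_size
        idx = block % num_lines
        tag = block // num_lines
        if tags[idx] == tag:
            last_read[idx] = step          # hit: data live up to this read
        else:
            if tags[idx] is not None:
                retire(idx, step)          # miss evicts the old contents
            tags[idx] = tag
            fill[idx] = step
            last_read[idx] = step

    for idx in range(num_lines):
        if tags[idx] is not None:
            retire(idx, total_steps)       # contents still valid at exit

    slots = num_lines * total_steps
    return valid / slots, live / slots


# Hypothetical four-access trace on a 2-line cache with 16-byte lines.
trace = [(0, 0x00), (2, 0x04), (5, 0x20), (6, 0x10)]
utilization, vulnerable = cache_profile(trace, num_lines=2, line_size=16,
                                        total_steps=10)
print(f"utilization={utilization:.2f} vulnerable={vulnerable:.2f}")
```

In this toy run the lines are valid for 14 of 20 slots but live for only 2, so a campaign that samples injection points from live slots alone would cover a much smaller space. A real profile would also need to account for writes, dirty-line write-back, associativity, and sub-line granularity, none of which this sketch models.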

