Abstract
Due to ongoing innovations in both sensor technology and spacecraft autonomy, onboard space processing continues to be outpaced by the escalating computational demands of next-generation missions. Commercial-off-the-shelf hybrid systems-on-chip, which combine fixed-logic CPUs with reconfigurable-logic FPGAs, present numerous architectural advantages that address onboard computing challenges. However, commercial devices are highly susceptible to space radiation and require dependable computing strategies to mitigate radiation-induced single-event effects. Depending upon the mission, the dynamics of the near-Earth space-radiation environment expose spacecraft to radiation fluxes that can vary by several orders of magnitude. By adopting an adaptive approach to dependable computing, spacecraft computers can reconfigure system resources to accommodate changing environmental conditions efficiently, maximizing system performance while satisfying availability constraints throughout the mission. In this article, we propose Hybrid, Adaptive, Reconfigurable Fault Tolerance (HARFT), a reconfigurable framework for environmentally adaptive resilience in hybrid space systems. Furthermore, we describe a methodology for modeling adaptive systems subject to the near-Earth space-radiation environment. The methodology represents these systems as phased-mission systems using Markov chains and combines orbital perturbation, geomagnetic field, and single-event effect rate prediction tools. We apply this methodology to evaluate the HARFT architecture using various static and adaptive strategies for several orbital case studies and demonstrate the achievable performability gains.
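To make the phased-mission idea concrete, the following is a minimal sketch and not the HARFT model itself. It assumes a hypothetical two-state Markov chain (system up or down) whose fault rate changes from phase to phase while the repair rate stays fixed, and it carries the state probability at the end of each phase into the next. The function names, the two-state simplification, and all rates and durations are illustrative assumptions; the abstract does not specify the actual state space or parameters.

```python
import math


def propagate_phase(p_up0, fault_rate, repair_rate, duration):
    """Closed-form solution of a two-state (up/down) continuous-time Markov chain.

    Returns the probability of being up at the end of the phase and the
    expected uptime accumulated during the phase. Assumes repair_rate > 0.
    """
    total = fault_rate + repair_rate
    steady = repair_rate / total            # steady-state availability
    decay = math.exp(-total * duration)     # transient term
    p_end = steady + (p_up0 - steady) * decay
    uptime = steady * duration + (p_up0 - steady) * (1.0 - decay) / total
    return p_end, uptime


def mission_availability(phases, repair_rate, p_up0=1.0):
    """Chain phases together: each phase starts from the previous end state.

    phases is a list of (duration, fault_rate) pairs in consistent time units.
    Returns (time-averaged availability, probability of being up at the end).
    """
    p_up = p_up0
    uptime = 0.0
    elapsed = 0.0
    for duration, fault_rate in phases:
        p_up, phase_uptime = propagate_phase(p_up, fault_rate, repair_rate, duration)
        uptime += phase_uptime
        elapsed += duration
    return uptime / elapsed, p_up


if __name__ == "__main__":
    # Hypothetical orbit: a long low-flux phase followed by a short high-flux
    # phase. Durations are in seconds and rates in events per second; the
    # numbers are made up for illustration only.
    orbit = [(5000.0, 1e-5), (600.0, 1e-3)]
    availability, p_final = mission_availability(orbit, repair_rate=0.1)
    print(f"average availability: {availability:.6f}, final P(up): {p_final:.6f}")
```

An adaptive strategy could be compared against a static one in this sketch by giving each phase its own repair rate or fault rate to reflect the fault-tolerance mode selected for that phase. A realistic model would need more states and rates taken from environment and single-event effect predictions.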