Abstract
Commercial SRAM-based field-programmable gate arrays (FPGAs) have the potential to provide space applications with the performance needed to meet next-generation mission requirements. However, mitigating an FPGA’s susceptibility to radiation-induced single-event upsets (SEUs) is challenging. Triple-modular redundancy (TMR) techniques are traditionally used to mitigate radiation effects, but TMR incurs substantial overheads such as increased area and power requirements. To reduce these overheads while still providing sufficient radiation mitigation, we propose a reconfigurable fault tolerance (RFT) framework that enables system designers to dynamically adjust a system’s level of redundancy and fault mitigation based on the varying radiation incurred at different orbital positions. This framework includes an adaptive hardware architecture that leverages FPGA reconfiguration techniques so that significant processing can be performed efficiently and reliably when environmental factors permit. To accurately estimate upset rates, we propose an upset rate modeling tool that captures time-varying radiation effects for arbitrary satellite orbits using a collection of existing, publicly available tools and models. We perform fault-injection testing on a prototype RFT platform to validate the RFT architecture and RFT performability models. We combine our RFT hardware architecture and the modeled upset rates using phased-mission Markov modeling to estimate performability gains achievable using our framework for two case-study orbits.
Reconfigurable Fault Tolerance: A Comprehensive Framework for Reliable and Adaptive FPGA-Based Space Computing