Reconfigurable Fault Tolerance: A Comprehensive Framework for Reliable and Adaptive FPGA-Based Space Computing

Published: 01 December 2012

Abstract

Commercial SRAM-based, field-programmable gate arrays (FPGAs) have the potential to provide space applications with the necessary performance to meet next-generation mission requirements. However, mitigating an FPGA’s susceptibility to single-event upset (SEU) radiation is challenging. Triple-modular redundancy (TMR) techniques are traditionally used to mitigate radiation effects, but TMR incurs substantial overheads such as increased area and power requirements. In order to reduce these overheads while still providing sufficient radiation mitigation, we propose a reconfigurable fault tolerance (RFT) framework that enables system designers to dynamically adjust a system’s level of redundancy and fault mitigation based on the varying radiation incurred at different orbital positions. This framework includes an adaptive hardware architecture that leverages FPGA reconfigurable techniques to enable significant processing to be performed efficiently and reliably when environmental factors permit. To accurately estimate upset rates, we propose an upset rate modeling tool that captures time-varying radiation effects for arbitrary satellite orbits using a collection of existing, publicly available tools and models. We perform fault-injection testing on a prototype RFT platform to validate the RFT architecture and RFT performability models. We combine our RFT hardware architecture and the modeled upset rates using phased-mission Markov modeling to estimate performability gains achievable using our framework for two case-study orbits.
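The phased-mission idea mentioned in the abstract can be illustrated with a deliberately simplified sketch. This is not the authors' model: it assumes a two-state (up/down) continuous-time Markov chain per orbital phase, with a phase-specific upset rate, a repair rate standing in for scrubbing or reconfiguration, and a throughput reward earned only while the system is up. The function names and every number in the usage section are hypothetical placeholders, not values from the article.

```python
import math


def phase_availability(p_up0, upset_rate, repair_rate, duration):
    """Two-state CTMC for one mission phase (assumed model, not the paper's).

    Returns (probability of being up at the end of the phase,
             time-averaged probability of being up during the phase).
    Requires upset_rate + repair_rate > 0 and duration > 0.
    """
    total = upset_rate + repair_rate
    steady = repair_rate / total              # long-run probability of "up"
    decay = math.exp(-total * duration)
    p_end = steady + (p_up0 - steady) * decay
    # Closed-form integral of p_up(t) over the phase, divided by its duration.
    avg = steady + (p_up0 - steady) * (1.0 - decay) / (total * duration)
    return p_end, avg


def mission_performability(phases, p_up0=1.0):
    """Expected useful work per unit time over a sequence of phases.

    phases: list of (duration, upset_rate, repair_rate, throughput_when_up).
    The state probability at the end of one phase seeds the next phase,
    which is what makes this a phased-mission calculation.
    """
    work = 0.0
    elapsed = 0.0
    p_up = p_up0
    for duration, upset_rate, repair_rate, throughput in phases:
        p_up, avg_up = phase_availability(p_up, upset_rate, repair_rate, duration)
        work += throughput * avg_up * duration
        elapsed += duration
    return work / elapsed


if __name__ == "__main__":
    # Hypothetical orbit: a long low-radiation phase run in a high-throughput,
    # low-redundancy mode, then a short high-radiation phase run in a
    # lower-throughput, more protected mode. All rates are placeholders.
    orbit = [
        (80.0, 0.001, 0.5, 3.0),
        (20.0, 0.050, 0.5, 1.0),
    ]
    print(mission_performability(orbit))
```

A fuller treatment would track the redundancy mode and per-module failures as separate Markov states; the two-state chain is used here only because it has a closed-form solution and keeps the phase-to-phase hand-off visible.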

Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 5, Issue 4
December 2012
95 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/2392616

Copyright © 2012 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 1 December 2012
• Accepted: 1 June 2012
• Revised: 1 February 2012
• Received: 1 May 2011

Qualifiers

• research-article
• Research
• Refereed