Abstract
Space processing applications deployed on SRAM-based Field Programmable Gate Arrays (FPGAs) are vulnerable to radiation-induced Single Event Upsets (SEUs). Compared with the well-known SEU mitigation solution of Triple Modular Redundancy (TMR) with configuration memory scrubbing, TMR with module-based error recovery (MER) is notably more energy efficient and more responsive in repairing soft errors in the system. Unfortunately, TMR-MER systems must still resort to scrubbing when errors occur between sub-components, such as in interconnection nets, which MER does not recover. This article addresses this problem by proposing a fine-grained module-based error recovery technique that, without additional system hardware, localizes and corrects errors that classic MER cannot. We evaluate our proposal via fault-injection campaigns on three types of circuits implemented in Xilinx 7-Series devices. Relative to scrubbing, we observed reductions of between 48.5% and 89.4% in the mean time to repair configuration memory errors, while the energy used to recover from configuration memory errors was estimated to fall by between 77.4% and 96.1%. These improvements result in higher reliability for systems employing TMR with fine-grained reconfiguration than for equivalent systems relying on scrubbing for configuration error recovery.
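As a rough illustration of the TMR principle that module-based recovery builds on, the sketch below shows a bitwise 2-of-3 majority vote that also reports which replica disagrees, which is the kind of information a recovery controller could use to decide which module to reconfigure. It is a software model under stated assumptions, not the authors' implementation: the function name `tmr_vote` and the integer encoding of replica outputs are hypothetical, and it does not model the fine-grained localization of interconnect errors that the article proposes.

```python
from typing import Optional, Tuple


def tmr_vote(a: int, b: int, c: int) -> Tuple[int, Optional[int]]:
    """Bitwise 2-of-3 majority vote over three replica outputs.

    Returns (voted_value, faulty_index). faulty_index is the index
    (0, 1, or 2) of the single replica that differs from the voted
    value, or None when no replica or more than one replica differs.
    """
    # Each output bit takes the value held by at least two replicas.
    voted = (a & b) | (a & c) | (b & c)

    # Hypothetical localization step: find replicas that disagree
    # with the voted word.
    replicas = (a, b, c)
    disagreeing = [i for i, r in enumerate(replicas) if r != voted]

    # Only a single disagreeing replica can be blamed unambiguously.
    faulty = disagreeing[0] if len(disagreeing) == 1 else None
    return voted, faulty


if __name__ == "__main__":
    # Replica 2 has one flipped bit; the vote masks it and flags it.
    print(tmr_vote(0b1010, 0b1010, 0b1000))
```

In this model a single upset in one replica is masked by the vote and attributed to that replica. When two replicas disagree with the voted word, no single module can be singled out, so the function returns `None` for the faulty index.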
Fine-Grained Module-Based Error Recovery in FPGA-Based TMR Systems