Fine-Grained Module-Based Error Recovery in FPGA-Based TMR Systems

Published: 24 January 2018

Abstract

Space processing applications deployed on SRAM-based Field Programmable Gate Arrays (FPGAs) are vulnerable to radiation-induced Single Event Upsets (SEUs). Compared with the well-known SEU mitigation solution of Triple Modular Redundancy (TMR) with configuration memory scrubbing, TMR with module-based error recovery (MER) is notably more energy efficient and more responsive in repairing soft errors in the system. However, TMR-MER systems must still resort to scrubbing when errors occur between sub-components, such as in interconnection nets, because MER does not recover these errors. This article addresses the problem by proposing a fine-grained module-based error recovery technique that localizes and corrects errors that classic MER cannot, without requiring additional system hardware. We evaluate our proposal via fault-injection campaigns on three types of circuits implemented in Xilinx 7-Series devices. Relative to scrubbing, we observed reductions in the mean time to repair configuration memory errors of between 48.5% and 89.4%, while reductions in the energy used to recover from configuration memory errors were estimated at between 77.4% and 96.1%. These improvements result in higher reliability for systems employing TMR with fine-grained reconfiguration than for equivalent systems that rely on scrubbing for configuration error recovery.
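As background for readers unfamiliar with TMR, the following is a minimal sketch of bitwise majority voting over three replica outputs, with a simple check for which replica disagrees. It is an illustration only, not the recovery technique proposed in the article: the function name `tmr_vote` and the idea of returning a replica index are assumptions made for this example, and a real TMR-MER system implements voting and error detection in FPGA logic rather than in software.

```python
from typing import Optional, Tuple


def tmr_vote(a: int, b: int, c: int) -> Tuple[int, Optional[int]]:
    """Bitwise majority vote over three replica outputs (hypothetical helper).

    Returns (voted_value, faulty_index). faulty_index is the position
    (0, 1 or 2) of the single replica whose output differs from the
    voted value, or None if all replicas agree or no single replica
    can be blamed.
    """
    # Each output bit takes the value held by at least two replicas.
    voted = (a & b) | (a & c) | (b & c)

    # A replica is suspect if its output differs from the majority.
    mismatches = [i for i, v in enumerate((a, b, c)) if v != voted]
    faulty = mismatches[0] if len(mismatches) == 1 else None
    return voted, faulty


if __name__ == "__main__":
    # Replica 2 has one flipped bit; the vote masks it and identifies it.
    value, faulty = tmr_vote(0b1010, 0b1010, 0b1110)
    print(bin(value), faulty)
```

In a module-based recovery scheme, an index like `faulty` would indicate which module to reconfigure. Errors that arise between sub-components, such as in interconnection nets, are the case the abstract says classic MER does not recover.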
