Fault Recovery Time Analysis for Coarse-Grained Reconfigurable Architectures

Published: 21 November 2017

Abstract

Coarse-grained reconfigurable architectures (CGRAs) have drawn increasing attention due to their performance and flexibility advantages. Typically, CGRAs incorporate many processing elements in the form of an array, which is suitable for implementing spatial redundancy, as used in the design of fault-tolerant systems. This article introduces a recovery time model for transient faults in CGRAs. The proposed fault-tolerant CGRAs are based on triple modular redundancy and coding techniques for error detection and correction. To evaluate the model, several kernels from space computing are mapped onto the suggested architecture. We demonstrate the tradeoff between recovery time, performance, and area. In addition, the average execution time of an application including recovery time is evaluated using area-based error-rate estimates in harsh radiation environments. The results show that task partitioning is important for bounding the recovery time of applications that have long execution times. It is also shown that error-correcting code (ECC) is of limited practical value for tasks with long execution times in high radiation environments, or when the degree of task partitioning is high.
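
The abstract names triple modular redundancy as one basis of the proposed architecture but does not describe the voter itself. As a minimal sketch of the general technique only, and not the authors' design, the following Python models three redundant processing elements whose outputs are combined by a bitwise majority vote. The function names `tmr_vote` and `faulty_replicas` are hypothetical and chosen for illustration.

```python
def tmr_vote(a: int, b: int, c: int) -> int:
    """Return the bitwise majority of three replica outputs.

    Each result bit is 1 when at least two of the three inputs
    have a 1 in that position.
    """
    return (a & b) | (a & c) | (b & c)


def faulty_replicas(a: int, b: int, c: int) -> list:
    """Return the indices (0, 1, 2) of replicas that disagree with the vote.

    Hypothetical helper: a recovery mechanism could use this list to
    decide which replica needs to be restored.
    """
    voted = tmr_vote(a, b, c)
    return [i for i, value in enumerate((a, b, c)) if value != voted]


if __name__ == "__main__":
    # Replica 2 suffers a single bit flip; the vote masks it.
    good = 0b1010
    upset = good ^ 0b0100
    print(bin(tmr_vote(good, good, upset)))
    print(faulty_replicas(good, good, upset))
```

A vote of this kind masks a transient fault in any single replica. The disagreeing replica still has to be brought back to a correct state, and that is the recovery time the article analyzes.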

