Abstract
Coarse-grained reconfigurable architectures (CGRAs) have drawn increasing attention due to their performance and flexibility advantages. Typically, CGRAs incorporate many processing elements in the form of an array, which is suitable for implementing spatial redundancy, as used in the design of fault-tolerant systems. This article introduces a recovery time model for transient faults in CGRAs. The proposed fault-tolerant CGRAs are based on triple modular redundancy and coding techniques for error detection and correction. To evaluate the model, several kernels from space computing are mapped onto the suggested architecture. We demonstrate the tradeoff between recovery time, performance, and area. In addition, the average execution time of an application including recovery time is evaluated using area-based error-rate estimates in harsh radiation environments. The results show that task partitioning is important for bounding the recovery time of applications that have long execution times. It is also shown that error-correcting code (ECC) is of limited practical value for tasks with long execution times in high radiation environments, or when the degree of task partitioning is high.
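The effect of task partitioning on recovery time can be illustrated with a textbook restart model. The sketch below is not the article's recovery time model. It assumes transient faults arrive as a Poisson process at a constant rate, that a fault is detected immediately, and that only the current partition is re-executed from its start with no extra reconfiguration cost. Under those assumptions, a segment of length t has an expected completion time of (e^(λt) − 1)/λ. The function name `expected_time` and its parameters are hypothetical.

```python
import math


def expected_time(total_time, fault_rate, partitions=1):
    """Expected completion time when each equal-length partition restarts on a fault.

    Assumptions (illustrative, not taken from the article): faults follow a
    Poisson process with rate `fault_rate`, detection is immediate, and a
    fault costs only the re-execution of the current partition.
    """
    if partitions < 1:
        raise ValueError("partitions must be at least 1")
    if fault_rate < 0 or total_time < 0:
        raise ValueError("total_time and fault_rate must be non-negative")
    if fault_rate == 0:
        return float(total_time)  # no faults, so no re-execution
    segment = total_time / partitions
    # Expected time per segment is (exp(rate * segment) - 1) / rate.
    return partitions * math.expm1(fault_rate * segment) / fault_rate


if __name__ == "__main__":
    # Hypothetical numbers: a task of 10 time units and 0.5 faults per time unit.
    for n in (1, 2, 5, 10):
        print(n, expected_time(10.0, 0.5, n))
```

In this simplified model, the expected time approaches the fault-free execution time as the number of partitions grows, because each fault discards less completed work. It ignores the per-partition overhead and the area cost that a real design would have to weigh against this gain.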
Fault Recovery Time Analysis for Coarse-Grained Reconfigurable Architectures