Abstract
With an increasing number of processing elements (PEs) integrated on a single chip, fault-tolerant techniques are critical to ensuring the reliability of such complex systems. In current reconfigurable architectures, redundant PEs are used for fault tolerance. In the presence of faulty PEs, different chips may end up with different physical topologies, so the concept of a virtual topology from the network embedding problem has been used to reduce the burden on the operating system. With limited hardware resources, reconfiguring a system into the most effective virtual topology, one that reaches the maximum repair rate, is a significant challenge. In this article, a new approach using a maximum flow (MF) algorithm is proposed for efficient topology reconfiguration in reconfigurable architectures. In this approach, topology reconfiguration is converted into a network flow problem by constructing a directed graph, and the solution is then found with the MF algorithm. This approach optimizes the use of spare PEs with minimal impact on area, throughput, and delay, and thus significantly improves the repair rate of faulty PEs. In addition, it achieves a polynomial reconfiguration time. Experimental results show that, compared to previous methods, the MF approach increases the probability of repairing faulty PEs by up to 50% using the same redundant resources. Compared to a fault-free system, throughput decreases by less than 2.5% and latency increases by less than 4%. To account for different types of PEs in practical applications, a cost factor is introduced into the MF algorithm. An enhanced approach using a minimum-cost MF algorithm is further shown to be efficient for the fault-tolerant reconfiguration of heterogeneous reconfigurable architectures.
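The abstract only summarizes how reconfiguration is cast as a network flow problem, so the following is a hedged illustration rather than the article's actual graph construction, which is not described here. It assumes a simplified model: each faulty PE must be replaced by exactly one spare PE, and a hypothetical `candidates` map lists which spares can physically take over each faulty PE. Edges source → faulty PE → candidate spare → sink all have capacity 1, so the maximum flow equals the number of faulty PEs that can be repaired at once. The optional `cost` function stands in for the cost factor used with heterogeneous PEs; here it is an assumed type-mismatch penalty. The solver is a standard successive-shortest-path minimum-cost max-flow (Bellman-Ford on the residual graph), which with all-zero costs simply returns a maximum flow. The names `FlowNetwork`, `plan_repair`, `candidates`, and the PE labels are all hypothetical.

```python
from collections import deque


class FlowNetwork:
    """Residual graph storing a capacity and a per-unit cost on every edge."""

    def __init__(self, n):
        self.n = n
        self.adj = [[] for _ in range(n)]  # node -> list of edge ids
        self.to, self.cap, self.cost = [], [], []

    def add_edge(self, u, v, cap, cost=0):
        # Edge ids come in pairs: forward edge (even id) and its residual twin (id ^ 1).
        for a, b, c, w in ((u, v, cap, cost), (v, u, 0, -cost)):
            self.adj[a].append(len(self.to))
            self.to.append(b)
            self.cap.append(c)
            self.cost.append(w)

    def min_cost_max_flow(self, s, t):
        """Successive shortest paths; with all-zero costs this is a plain max flow."""
        flow = total_cost = 0
        while True:
            # Queue-based Bellman-Ford over residual edges (reverse edges may cost < 0).
            dist = [float("inf")] * self.n
            prev_edge = [-1] * self.n
            in_queue = [False] * self.n
            dist[s] = 0
            queue = deque([s])
            while queue:
                u = queue.popleft()
                in_queue[u] = False
                for e in self.adj[u]:
                    v = self.to[e]
                    if self.cap[e] > 0 and dist[u] + self.cost[e] < dist[v]:
                        dist[v] = dist[u] + self.cost[e]
                        prev_edge[v] = e
                        if not in_queue[v]:
                            in_queue[v] = True
                            queue.append(v)
            if dist[t] == float("inf"):
                return flow, total_cost  # no augmenting path is left
            # Walk back from t to s, then push the bottleneck amount along the path.
            path, v = [], t
            while v != s:
                path.append(prev_edge[v])
                v = self.to[prev_edge[v] ^ 1]
            push = min(self.cap[e] for e in path)
            for e in path:
                self.cap[e] -= push
                self.cap[e ^ 1] += push
            flow += push
            total_cost += push * dist[t]


def plan_repair(faulty, spares, candidates, cost=None):
    """Hypothetical model: candidates[f] lists the spares able to replace faulty PE f."""
    src, sink = 0, len(faulty) + len(spares) + 1
    f_node = {f: 1 + i for i, f in enumerate(faulty)}
    s_node = {sp: 1 + len(faulty) + j for j, sp in enumerate(spares)}
    spare_of_node = {n: sp for sp, n in s_node.items()}
    net = FlowNetwork(sink + 1)
    for f in faulty:
        net.add_edge(src, f_node[f], 1)  # each faulty PE needs one replacement
        for sp in candidates.get(f, []):
            net.add_edge(f_node[f], s_node[sp], 1, cost(f, sp) if cost else 0)
    for sp in spares:
        net.add_edge(s_node[sp], sink, 1)  # each spare can be used only once
    repaired, total_cost = net.min_cost_max_flow(src, sink)
    # A saturated forward edge faulty -> spare means that spare replaces that PE.
    assignment = {}
    for f in faulty:
        for e in net.adj[f_node[f]]:
            if e % 2 == 0 and net.cap[e] == 0:
                assignment[f] = spare_of_node[net.to[e]]
    return repaired, total_cost, assignment


if __name__ == "__main__":
    # Greedily giving S1 to P1 would strand P2; the flow solution reroutes instead.
    print(plan_repair(["P1", "P2"], ["S1", "S2"], {"P1": ["S1", "S2"], "P2": ["S1"]}))
    # → (2, 0, {'P1': 'S2', 'P2': 'S1'})
    types = {"P1": "mul", "P2": "alu", "S1": "alu", "S2": "mul", "S3": "alu"}

    def mismatch(f, sp):
        return 0 if types[f] == types[sp] else 1  # assumed cost factor

    print(plan_repair(["P1", "P2"], ["S1", "S2", "S3"],
                      {"P1": ["S1", "S2"], "P2": ["S2", "S3"]}, mismatch))
    # → (2, 0, {'P1': 'S2', 'P2': 'S3'})
```

The first call shows why a flow formulation can outperform a greedy assignment: augmenting paths through the residual graph undo an earlier choice (P1 → S1) when a later faulty PE has no other option. Each augmentation repairs one more PE, so at most as many shortest-path searches are needed as there are faulty PEs, which keeps this sketch polynomial.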
Efficient Fault-Tolerant Topology Reconfiguration Using a Maximum Flow Algorithm