research-article

Efficient Fault-Tolerant Topology Reconfiguration Using a Maximum Flow Algorithm

Published: 19 May 2015

Abstract

With an increasing number of processing elements (PEs) integrated on a single chip, fault-tolerant techniques are critical to ensuring the reliability of such complex systems. In current reconfigurable architectures, redundant PEs are used for fault tolerance. In the presence of faulty PEs, the physical topologies of different chips may differ, so the concept of a virtual topology, taken from the network embedding problem, has been used to alleviate the burden on the operating system. With limited hardware resources, reconfiguring a system into the most effective virtual topology so that the maximum repair rate is reached presents a significant challenge. In this article, a new approach using a maximum flow (MF) algorithm is proposed for efficient topology reconfiguration in reconfigurable architectures. In this approach, topology reconfiguration is converted into a network flow problem by constructing a directed graph; the solution is then found using the MF algorithm. This approach optimizes the use of spare PEs with minimal impact on area, throughput, and delay, and thus significantly improves the repair rate of faulty PEs. In addition, it achieves a polynomial reconfiguration time. Experimental results show that, compared to previous methods, the MF approach increases the probability of repairing faulty PEs by up to 50% using the same redundant resources. Compared to a fault-free system, throughput decreases by less than 2.5% and latency increases by less than 4%. To account for the various types of PEs in a practical application, a cost factor is introduced into the MF algorithm. An enhanced approach using a minimum-cost MF algorithm is further shown to be efficient in the fault-tolerant reconfiguration of heterogeneous reconfigurable architectures.
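
The conversion of spare allocation into a network flow problem can be illustrated with a minimal sketch. The graph layout below is an assumption made for illustration only, since the abstract does not give the article's actual construction: a source feeds each faulty PE, each faulty PE connects to the spares assumed able to replace it, and each spare feeds a sink, all with unit capacity. The function names (`max_flow`, `repair_assignment`) and the `reachable` input are hypothetical. The solver is a standard shortest-augmenting-path (Edmonds-Karp style) maximum flow, which runs in polynomial time.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Shortest-augmenting-path max flow; returns (flow value, residual graph)."""
    # Copy capacities and add zero-capacity reverse edges for the residual graph.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    residual.setdefault(source, {})
    residual.setdefault(sink, {})
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)

    flow = 0
    while True:
        # Breadth-first search for a shortest path with spare residual capacity.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow, residual

        # Find the bottleneck capacity along the path, then push flow along it.
        bottleneck = float("inf")
        v = sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, residual[parent[v]][v])
            v = parent[v]
        v = sink
        while parent[v] is not None:
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck


def repair_assignment(faulty, spares, reachable):
    """Assign spares to faulty PEs (illustrative model, not the article's).

    reachable[f] lists the spare PEs assumed able to replace faulty PE f.
    """
    capacity = {"S": {("F", f): 1 for f in faulty}}
    for f in faulty:
        capacity[("F", f)] = {
            ("R", s): 1 for s in reachable.get(f, ()) if s in spares
        }
    for s in spares:
        capacity[("R", s)] = {"T": 1}

    repaired, residual = max_flow(capacity, "S", "T")

    # A saturated faulty-to-spare edge shows up as residual capacity 1
    # on the corresponding reverse edge.
    assignment = {}
    for f in faulty:
        for s in reachable.get(f, ()):
            if s in spares and residual[("R", s)][("F", f)] == 1:
                assignment[f] = s
    return repaired, assignment


# Greedy allocation could give "a" the spare "s1" and leave "b" unrepaired;
# the flow formulation reroutes "a" to "s2" so that both are repaired.
repaired, assignment = repair_assignment(
    faulty=["a", "b"],
    spares=["s1", "s2"],
    reachable={"a": ["s1", "s2"], "b": ["s1"]},
)
print(repaired, assignment)
```

In this toy instance the only allocation that repairs both faulty PEs pairs "a" with "s2" and "b" with "s1", which is what the augmenting-path step recovers. Adding a per-edge cost and solving a minimum-cost flow instead would be one way to model heterogeneous PEs, again as an assumption about how the cost factor mentioned above might enter.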


Published in

ACM Transactions on Reconfigurable Technology and Systems, Volume 8, Issue 3
May 2015
153 pages
ISSN: 1936-7406
EISSN: 1936-7414
DOI: 10.1145/2770880
Editor: Steve Wilton

Copyright © 2015 ACM

Publisher

Association for Computing Machinery
New York, NY, United States

Publication History

• Published: 19 May 2015
• Accepted: 1 October 2014
• Revised: 1 September 2014
• Received: 1 December 2013

Qualifiers

• research-article
• Research
• Refereed
