Fault-Tolerant Dynamic Task Mapping and Scheduling for Network-on-Chip-Based Multicore Platform

Published: 26 May 2017

Abstract

In Network-on-Chip (NoC)-based multicore systems, task allocation and scheduling are important problems because they affect application performance in terms of energy consumption and timing. Advances in deep-submicron technology have scaled transistor feature sizes to the nanometer range, which has enabled multiple processing elements to be integrated onto a single chip. On the flip side, this scaling has made the integrated entities on the chip more susceptible to faults. Although a significant amount of work has been done on fault-tolerant mapping and scheduling, existing algorithms either precompute reconfigured mapping solutions at design time for anticipated fault scenarios or adopt a hybrid approach in which part of the fault mitigation strategy relies on the design-time solution. The complexity of the problem rises further for real-time dynamic systems, where new applications can arrive on the multicore platform at any time. For real-time systems, the validity of a computation depends both on the correctness of its results and on the satisfaction of its temporal constraints. This article presents an improved fault-tolerant dynamic solution to the integrated problem of application mapping and scheduling for NoC-based multicore platforms. The developed algorithm provides a unified mapping and scheduling method for real-time systems that focuses on meeting application deadlines and minimizing communication energy. A predictive model is used to identify the failure-prone cores in the system, for which a fault-tolerant resource allocation with task redundancy is performed. By selectively applying a task replication policy, the algorithm improves the reliability of the application executing on a given NoC platform. The performance of the proposed algorithm has been evaluated in detail for both real and synthetic applications. Compared with other fault-tolerant algorithms reported in the literature, the proposed algorithm shows an average reduction of 56.95% in task re-execution time overhead and an average improvement of 31% in communication energy. Further, for time-constrained tasks, the developed algorithm meets deadlines in most of the test cases, whereas the techniques reported in the literature failed to meet deadlines in about 45% of the test cases.
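The selective replication idea in the abstract can be illustrated with a small Python sketch. This is not the article's algorithm, only a toy under stated assumptions: the `Core` type, its `failure_prob` field (standing in for the output of some predictive model), the threshold value, the nearest-spare rule, and the use of mesh hop count as a proxy for communication energy are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    """A core at mesh position (x, y). failure_prob is a hypothetical
    score that a predictive model might assign; it is an assumption."""
    x: int
    y: int
    failure_prob: float


def hops(a: Core, b: Core) -> int:
    """Manhattan distance on a 2D mesh, used here as a rough stand-in
    for communication energy between two cores (an assumption)."""
    return abs(a.x - b.x) + abs(a.y - b.y)


def select_replicas(mapping: dict, cores: list, threshold: float = 0.5) -> dict:
    """Return {task: replica_core} for tasks whose primary core is
    failure-prone (failure_prob >= threshold).

    Each replica goes to the nearest unused core that is below the
    threshold; tasks on reliable cores are not replicated.
    """
    used = set(mapping.values())
    spare = [c for c in cores if c not in used and c.failure_prob < threshold]
    replicas = {}
    for task, core in mapping.items():
        if core.failure_prob >= threshold and spare:
            nearest = min(spare, key=lambda c: hops(core, c))
            spare.remove(nearest)  # one replica per spare core
            replicas[task] = nearest
    return replicas
```

For example, on a 2x2 mesh where only the core at (0, 0) is flagged as failure-prone, a task mapped there would receive a replica on the closest free reliable core, while tasks on the other cores would run without redundancy. Replicating only where the predicted risk is high is what keeps the redundancy overhead selective rather than uniform.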
