Abstract
In Network-on-Chip (NoC)-based multicore systems, task allocation and scheduling are known to be important problems, as they affect the performance of applications in terms of energy consumption and timing. Advancement of deep submicron technology has made it possible to scale the transistor feature size to the nanometer range, which has enabled multiple processing elements to be integrated onto a single chip. On the flip side, it has made the integrated entities on the chip more susceptible to different faults. Although a significant amount of work has been done in the domain of fault-tolerant mapping and scheduling, existing algorithms either precompute reconfigured mapping solutions at design time in anticipation of fault scenarios or adopt a hybrid approach in which part of the fault mitigation strategy relies on the design-time solution. The complexity of the problem rises further for real-time dynamic systems, where new applications can arrive at the multicore platform at any time. For real-time systems, the validity of a computation depends both on the correctness of results and on the satisfaction of temporal constraints. This article presents an improved fault-tolerant dynamic solution to the integrated problem of application mapping and scheduling for NoC-based multicore platforms. The developed algorithm provides a unified mapping and scheduling method for real-time systems, focusing on meeting application deadlines and minimizing communication energy. A predictive model is used to determine the failure-prone cores in the system, for which a fault-tolerant resource allocation with task redundancy is performed. By selectively applying a task replication policy, the reliability of the application executing on a given NoC platform is improved. A detailed evaluation of the performance of the proposed algorithm has been conducted for both real and synthetic applications.
When compared with other fault-tolerant algorithms reported in the literature, the proposed algorithm shows an average reduction of 56.95% in task re-execution time overhead and an average improvement of 31% in communication energy. Further, for time-constrained tasks, the developed algorithm meets deadlines in most of the test cases, whereas the techniques reported in the literature fail to meet deadlines in about 45% of the test cases.
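The idea of selective task replication on failure-prone cores can be illustrated with a small sketch. This is not the article's algorithm: the `Core` type, the `fail_prob` field standing in for the predictive model's output, the `threshold` value, the greedy placement order, and the hop-weighted traffic cost on a 2D mesh are all assumptions made for illustration, and deadline-aware scheduling is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    x: int
    y: int
    fail_prob: float  # assumed output of some failure predictor (hypothetical)


def hops(a, b):
    """Manhattan distance between two cores on a 2D mesh."""
    return abs(a.x - b.x) + abs(a.y - b.y)


def comm_cost(task, core, edges, placed):
    """Hop-weighted traffic between `task` on `core` and its placed neighbours."""
    total = 0
    for (u, v), volume in edges.items():
        if task not in (u, v):
            continue
        other = v if u == task else u
        if other in placed:
            total += volume * hops(core, placed[other])
    return total


def map_tasks(tasks, edges, cores, threshold=0.2):
    """Greedy mapping with selective replication (illustrative assumption).

    Each task goes to the free core with the lowest communication cost.
    If that core is failure-prone (fail_prob > threshold), a replica is
    placed on the nearest free core that is considered reliable.
    """
    free = list(cores)
    primary, replica = {}, {}
    for task in tasks:
        if not free:
            raise ValueError("not enough free cores")
        best = min(free, key=lambda c: comm_cost(task, c, edges, primary))
        free.remove(best)
        primary[task] = best
        if best.fail_prob > threshold:
            reliable = [c for c in free if c.fail_prob <= threshold]
            if reliable:
                backup = min(reliable, key=lambda c: hops(c, best))
                free.remove(backup)
                replica[task] = backup
    return primary, replica


# 2x2 mesh in which only core (0, 0) is predicted to be failure-prone.
cores = [Core(0, 0, 0.5), Core(1, 0, 0.05), Core(0, 1, 0.05), Core(1, 1, 0.05)]
edges = {("t0", "t1"): 10, ("t1", "t2"): 5}  # communication volumes
primary, replica = map_tasks(["t0", "t1", "t2"], edges, cores)
print(primary)
print(replica)
```

In this toy run only the task that lands on the failure-prone core receives a replica, so the redundancy overhead is paid only where the predictor indicates a risk. A real implementation would also need to account for deadlines and for the energy cost of keeping the replica's communication paths short.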
Index Terms
Fault-Tolerant Dynamic Task Mapping and Scheduling for Network-on-Chip-Based Multicore Platform