Addressing network-on-chip router transient errors with inherent information redundancy

Published: 03 July 2013

Abstract

We exploit the inherent information redundancy in the control path of Network-on-Chip (NoC) routers to manage transient errors, preventing packet loss and misrouting. The outputs of the routing arbitration units in NoC routers can be used to detect arbitration failures, because the valid arbitration outputs form only a subset of all possible output values. This property is exploited to detect and correct logic and register errors in the router arbitration control path. The proposed method is complementary to other error management methods for NoC routers. We provide an analytical reliability model of our method that accounts for parameters such as logic unit size, different error rates for logic gates and registers, and the location of faulty elements. Compared to triple-modular redundancy (TMR), the proposed method improves arbiter reliability by two orders of magnitude while reducing total area and power by 43% and 64%, respectively. In the presented case studies, two traffic traces from the PARSEC benchmark suite are used to evaluate average latency and energy consumption. Simulations on a 4×4 NoC show that our method reduces average latency by up to 50% and average energy consumption by up to 70% compared to other methods.
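To illustrate how "valid arbitration outputs are a subset of all possible values" enables error detection, the following Python sketch checks whether an arbiter grant vector is legal. It assumes, hypothetically, an n-port arbiter with a one-hot grant encoding: at most one grant bit is set, it is set only for a port that is requesting, and no grant is issued when there are no requests. The function names `is_valid_grant` and `legal_grants` are illustrative and not taken from the paper, and the sketch covers detection only, not the authors' correction scheme or their actual arbiter design.

```python
from typing import List


def is_valid_grant(requests: int, grant: int) -> bool:
    """Return True if `grant` is a legal output for the given `requests`.

    Hypothetical encoding: bit i of `requests` / `grant` corresponds to
    input port i, and a correct arbiter produces a one-hot grant.
    """
    if requests == 0:
        # Nobody is requesting: a correct arbiter grants nobody.
        return grant == 0
    # A nonzero value with exactly one bit set is one-hot.
    one_hot = grant != 0 and (grant & (grant - 1)) == 0
    # The granted port must also be one of the requesting ports.
    return one_hot and (grant & requests) == grant


def legal_grants(requests: int, n_ports: int) -> List[int]:
    """Enumerate every grant value the check accepts for `requests`."""
    return [g for g in range(1 << n_ports) if is_valid_grant(requests, g)]


if __name__ == "__main__":
    n_ports = 4
    req = 0b1011   # ports 0, 1 and 3 are requesting
    good = 0b0010  # a legal grant to port 1
    assert is_valid_grant(req, good)

    # Only a small subset of the 2**n_ports codes is legal.
    print(legal_grants(req, n_ports))  # prints [1, 2, 8]

    # Flip each grant bit in turn, as a transient error might.
    flipped = [good ^ (1 << i) for i in range(n_ports)]
    print([is_valid_grant(req, g) for g in flipped])  # prints [False, False, False, False]
```

Under this assumed encoding, only 3 of the 16 possible 4-bit grant values are legal for the example request pattern, and every single-bit flip of a legal grant lands outside that set. An error that turns one legal grant into a different legal grant (for example, granting port 3 instead of port 1) would not be caught by this check alone.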
