Abstract
We exploit the inherent information redundancy in the control path of Network-on-Chip (NoC) routers to manage transient errors, preventing packet loss and misrouting. The outputs of the routing arbitration units in NoC routers can be used to detect arbitration failures, because the valid arbitration outputs are only a subset of all possible output values. We exploit this property to detect and correct logic and register errors in the router arbitration control path. The proposed method is complementary to other error management methods for NoC routers. We provide an analytical reliability model of our method that includes parameters such as logic unit size, different error rates for logic gates and registers, and the location of faulty elements. Compared to triple-modular redundancy (TMR), the proposed method improves the arbiter reliability by two orders of magnitude while reducing the total area and power by 43% and 64%, respectively. In the presented case studies, two traffic traces from the PARSEC benchmark suite are used to evaluate the average latency and energy consumption. Simulations performed on a 4×4 NoC show that our method reduces the average latency by up to 50% and the average energy by up to 70% compared to other methods.
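The idea that valid arbitration outputs form only a subset of all possible values can be illustrated with a minimal sketch. It assumes a one-hot grant encoding (at most one output port granted per cycle), which the abstract does not specify; the function names and the 5-port width are hypothetical and are not taken from the paper.

```python
def is_valid_grant(grant: int, n_ports: int) -> bool:
    """Return True if `grant` is all-zero or one-hot within `n_ports` bits.

    Assumption (not from the paper): the arbiter grants at most one
    requester per cycle, so any word with two or more bits set is invalid.
    """
    if grant < 0 or grant >= (1 << n_ports):
        return False
    # A word with zero or one bit set satisfies x & (x - 1) == 0.
    return grant & (grant - 1) == 0


def valid_fraction(n_ports: int) -> float:
    """Fraction of all n_ports-bit words that are valid grant vectors.

    There are n_ports one-hot words plus the all-zero word.
    """
    return (n_ports + 1) / (1 << n_ports)


def flip_is_detected(grant: int, bit: int, n_ports: int) -> bool:
    """Flip one bit of a grant word and report whether the result is invalid."""
    corrupted = grant ^ (1 << bit)
    return not is_valid_grant(corrupted, n_ports)


if __name__ == "__main__":
    n = 5  # hypothetical port count for a mesh router
    print(valid_fraction(n))                 # share of valid code words
    print(flip_is_detected(0b00100, 0, n))   # flip sets a second grant bit
    print(flip_is_detected(0b00100, 2, n))   # flip clears the only grant bit
```

Under this assumption, a bit flip that sets a second grant bit produces an invalid word and is detectable from the output alone, while a flip that clears the only grant bit yields the valid "no grant" word and would need additional information, such as the request vector, to be caught.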
Index Terms
Addressing network-on-chip router transient errors with inherent information redundancy