research-article

ForEVeR: A complementary formal and runtime verification approach to correct NoC functionality

Published: 28 March 2014

Abstract

As silicon technology scales, modern processor and embedded systems are rapidly shifting towards complex chip multi-processor (CMP) and system-on-chip (SoC) designs. As a side effect of the complexity of these designs, ensuring their correctness has become increasingly problematic. Within these domains, networks-on-chip (NoCs) are the de facto choice for implementing the on-chip interconnect; their design is quickly becoming extremely complex in order to keep up with communication performance demands. As a result, design errors in the NoC may go undetected and escape into the final silicon.

In this work, we propose ForEVeR, a solution that combines formal methods with runtime verification to ensure functional correctness in NoCs. Formal verification, due to its scalability limitations, is used to verify smaller modules, such as individual router components. To deliver correctness guarantees for the complete network, we propose a network-level detection and recovery solution that monitors the traffic in the NoC and protects it against escaped functional bugs. To this end, ForEVeR augments the baseline NoC with a lightweight checker network that alerts destination nodes of incoming packets ahead of time. If a bug is detected, flagged by missed packet arrivals, our recovery mechanism delivers the in-flight data safely to the intended destination via the checker network. ForEVeR's experimental evaluation shows that it can recover from NoC design errors at only 4.9% area cost for an 8x8 mesh interconnect, over a time interval ranging from 0.5K to 30K cycles per recovery event, and that it incurs no performance overhead in the absence of errors. ForEVeR can also protect NoC operations against soft errors, a growing concern as silicon scales. It leverages the same monitoring hardware to detect manifestations of soft errors in addition to design errors. Recovery of packets affected by soft errors is guaranteed by building resiliency features into the checker network. ForEVeR incurs minimal performance penalty up to a flit error rate of 0.01% in lightly loaded networks.
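The detection idea in the abstract (destinations are told about packets ahead of time, and a missed arrival signals a bug) can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware described in the paper: the class `DestinationMonitor`, its methods, the per-packet bookkeeping, and the 100-cycle timeout are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DestinationMonitor:
    """Toy model of one destination node's arrival monitor (hypothetical)."""

    timeout: int  # assumed detection window in cycles; not a value from the paper
    pending: dict = field(default_factory=dict)  # packet id -> cycle it was announced

    def notify(self, packet_id: int, cycle: int) -> None:
        # The checker network announces a packet ahead of its arrival.
        self.pending[packet_id] = cycle

    def arrive(self, packet_id: int) -> None:
        # The packet reached the destination over the main NoC.
        self.pending.pop(packet_id, None)

    def overdue(self, cycle: int) -> list:
        # Announced packets that have not arrived within the timeout.
        return sorted(p for p, t in self.pending.items() if cycle - t > self.timeout)


monitor = DestinationMonitor(timeout=100)
monitor.notify(packet_id=1, cycle=0)
monitor.notify(packet_id=2, cycle=10)
monitor.arrive(packet_id=1)
missed = monitor.overdue(cycle=200)  # packet 2 was announced but never arrived
```

In this model, a non-empty result from `overdue` plays the role of the "missed packet arrival" flag; in the scheme the abstract describes, that event would trigger recovery of the in-flight data through the checker network.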

