Abstract
As silicon technology scales, modern processors and embedded systems are rapidly shifting towards complex chip multi-processor (CMP) and system-on-chip (SoC) designs. As a side effect of this complexity, ensuring the correctness of these designs has become increasingly difficult. Within these domains, networks-on-chip (NoCs) are the de facto choice for implementing the on-chip interconnect, and their design is quickly becoming extremely complex in order to keep up with communication performance demands. As a result, design errors in the NoC may go undetected and escape into the final silicon.
In this work, we propose ForEVeR, a solution that makes complementary use of formal methods and runtime verification to ensure functional correctness in NoCs. Because formal verification scales poorly, it is used to verify smaller modules, such as individual router components. To deliver correctness guarantees for the complete network, we propose a network-level detection and recovery solution that monitors the traffic in the NoC and protects it against escaped functional bugs. To this end, ForEVeR augments the baseline NoC with a lightweight checker network that alerts destination nodes of incoming packets ahead of time. If a bug is detected, as flagged by missed packet arrivals, our recovery mechanism delivers the in-flight data safely to the intended destination via the checker network. Our experimental evaluation shows that ForEVeR can recover from NoC design errors at only 4.9% area cost for an 8x8 mesh interconnect, taking between 0.5K and 30K cycles per recovery event, and that it incurs no performance overhead in the absence of errors. ForEVeR can also protect NoC operations against soft errors, a growing concern as silicon scales. It leverages the same monitoring hardware to detect soft-error manifestations in addition to design errors. Recovery of packets affected by soft errors is guaranteed by building resiliency features into the checker network. ForEVeR incurs minimal performance penalty up to a flit error rate of 0.01% in lightly loaded networks.
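The detection idea described above (advance notifications over the checker network, with a missed arrival signaling a bug) can be illustrated with a minimal sketch. It assumes each destination keeps a counter of announced-but-undelivered packets and flags an error when that counter stays nonzero for a fixed number of cycles. The class name `DestinationMonitor`, the `timeout_cycles` parameter, and the counter scheme itself are hypothetical choices made for illustration; the abstract does not specify how the hardware tracks missed arrivals.

```python
from dataclasses import dataclass


@dataclass
class DestinationMonitor:
    """Hypothetical per-destination checker (illustrative only)."""

    timeout_cycles: int          # assumed detection window, in cycles
    pending: int = 0             # notifications not yet matched by an arrival
    cycles_outstanding: int = 0  # consecutive cycles with a nonzero counter

    def notify(self) -> None:
        """A notification arrived over the (assumed reliable) checker network."""
        self.pending += 1

    def arrive(self) -> None:
        """A packet arrived over the main NoC.

        An arrival with no matching notification drives the counter
        negative, which is also treated as an anomaly by tick().
        """
        self.pending -= 1

    def tick(self) -> bool:
        """Advance one cycle; return True if an error should be flagged."""
        if self.pending == 0:
            self.cycles_outstanding = 0
            return False
        self.cycles_outstanding += 1
        return self.cycles_outstanding >= self.timeout_cycles


if __name__ == "__main__":
    monitor = DestinationMonitor(timeout_cycles=3)
    monitor.notify()                       # packet announced, never delivered
    flags = [monitor.tick() for _ in range(3)]
    print(flags)                           # the final tick raises the flag
```

In this sketch a raised flag would be the point at which a recovery phase takes over; the recovery mechanism itself is not modeled here.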
Index Terms
ForEVeR: A complementary formal and runtime verification approach to correct NoC functionality