Analytical Performance Models for NoCs with Multiple Priority Traffic Classes

Published: 07 October 2019
Abstract

Networks-on-chip (NoCs) have become the standard for interconnect solutions in industrial designs ranging from client CPUs to many-core chip-multiprocessors. Since NoCs play a vital role in system performance and power consumption, pre-silicon evaluation environments include cycle-accurate NoC simulators. Long simulations increase the execution time of evaluation frameworks, which are already notoriously slow, and prohibit design-space exploration. Existing analytical NoC models, which assume fair arbitration, cannot replace these simulations since industrial NoCs typically employ priority schedulers and multiple priority classes. To address this limitation, we propose a systematic approach to construct priority-aware analytical performance models using micro-architecture specifications and input traffic. Our approach decomposes the given NoC into individual queues with modified service time to enable accurate and scalable latency computations. Specifically, we introduce novel transformations along with an algorithm that iteratively applies these transformations to decompose the queuing system. Experimental evaluations using real architectures and applications show high accuracy of 97% and up to 2.5× speedup in full-system simulation.
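To make the idea of a priority-aware queueing model concrete, the sketch below computes the mean waiting time per traffic class for a single non-preemptive priority M/G/1 queue, using the classical textbook formula. It is only an illustration of why priority classes see different latencies under the same load; it is not the decomposition algorithm or the transformations proposed in this article, and it assumes continuous-time Poisson arrivals, which the article does not state. The function name `priority_waiting_times` and its parameters are hypothetical.

```python
from typing import List, Sequence


def priority_waiting_times(
    arrival_rates: Sequence[float],
    mean_service: Sequence[float],
    second_moment_service: Sequence[float],
) -> List[float]:
    """Mean queueing delay per class in a non-preemptive priority M/G/1 queue.

    Classes are ordered from highest priority (index 0) to lowest.
    Assumption: Poisson arrivals and a single shared server (textbook model).
    """
    # Mean residual service time seen by an arriving packet.
    # Every class contributes, because service is not preempted.
    residual = 0.5 * sum(
        lam * s2 for lam, s2 in zip(arrival_rates, second_moment_service)
    )

    waits = []
    sigma_prev = 0.0  # utilization of strictly higher-priority classes
    for lam, s in zip(arrival_rates, mean_service):
        sigma_curr = sigma_prev + lam * s  # adds this class's utilization
        if sigma_curr >= 1.0:
            raise ValueError("queue is unstable at this priority level")
        waits.append(residual / ((1.0 - sigma_prev) * (1.0 - sigma_curr)))
        sigma_prev = sigma_curr
    return waits


if __name__ == "__main__":
    # Two classes with deterministic 1-cycle service (E[S] = 1, E[S^2] = 1).
    w = priority_waiting_times([0.2, 0.3], [1.0, 1.0], [1.0, 1.0])
    print(w)  # the high-priority class waits less than the low-priority class
```

With a single class, the expression reduces to the Pollaczek-Khinchine mean waiting time, which is the fair-arbitration case the abstract contrasts against.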
