
A Hierarchical Distributed Runtime Resource Management Scheme for NoC-Based Many-Cores

Published: 23 April 2018

Abstract

As technology strengthens its presence in all aspects of human life, computing systems integrate ever more processing cores, while applications grow more complex and more demanding of computational resources. This sharp increase in processing elements, combined with application resource requirements that cannot be predicted at design time, imposes new design constraints on the resource management of many-core systems and makes distributed functionality a necessity. In this work, we present a distributed runtime resource management framework for many-core systems that use a network-on-chip (NoC) infrastructure. Specifically, we couple the concept of distributed management with parallel applications by assigning different roles to the available computing resources. The design is based on local controllers and managers, and an on-chip intercommunication scheme ensures that decisions are distributed. We evaluated the proposed framework on an Intel Single-Chip Cloud Computer, an actual NoC-based many-core system. Experimental results show that the proposed scheme allocates resources efficiently at runtime, yielding gains of up to 30% in application execution latency compared with relevant state-of-the-art distributed resource management frameworks.
