
Defragmentation of Tasks in Many-Core Architecture

Published: 13 March 2017

Abstract

A many-core can execute multiple multithreaded tasks in parallel. A task performs most efficiently when it executes on a spatially connected, compact subset of cores, which minimizes the performance lost to communication between the task’s threads spread across the allocated cores. Over time, unallocated cores can become scattered across the many-core, creating fragments in the task mapping. These fragments can prevent efficient contiguous mapping of incoming tasks, leading to a loss of performance. A task defragmenter alleviates this problem by consolidating smaller fragments into larger ones in which incoming tasks can execute efficiently. Optimal defragmentation of a many-core is NP-hard in the general case. We therefore simplify the original problem to one that can be solved optimally in polynomial time. In this work, we introduce the concept of exponentially separable mapping (ESM), which defines a set of task-mapping constraints on a many-core. We prove that a many-core that enforces ESM can be defragmented optimally in polynomial time.
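
To make the notion of fragmentation concrete, the following is a minimal illustrative sketch, not the method of the paper. It assumes a 2D mesh of cores with 4-neighbor connectivity, which the abstract does not specify, and uses a hypothetical helper, free_fragments, to measure the connected regions of unallocated cores. It does not implement ESM or any defragmentation algorithm; it only shows how the same number of free cores can form several small fragments or a single large one.

```python
from collections import deque


def free_fragments(grid):
    """Return the sizes of connected free-core regions, largest first.

    grid is a list of equal-length strings (an assumed 2D mesh):
    '.' marks an unallocated core, any other character is a task id.
    Two free cores are connected if they are 4-neighbors (assumption).
    """
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != "." or seen[r][c]:
                continue
            # Breadth-first search over one fragment of free cores.
            seen[r][c] = True
            queue = deque([(r, c)])
            size = 0
            while queue:
                y, x = queue.popleft()
                size += 1
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and grid[ny][nx] == "." and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sizes.append(size)
    return sorted(sizes, reverse=True)


# Hypothetical 4x4 many-core running tasks A, B, and C; six cores are free.
fragmented = [
    "AA.B",
    "A.BB",
    ".CC.",
    "CC..",
]
# The same tasks with the same core counts after a hypothetical consolidation.
compacted = [
    "AAAB",
    "CCBB",
    "CC..",
    "....",
]

print(free_fragments(fragmented))  # [3, 1, 1, 1]
print(free_fragments(compacted))   # [6]
```

In the fragmented layout, an incoming task that needs four contiguous cores cannot be mapped compactly even though six cores are free; in the compacted layout it can.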


    • Published in

      ACM Transactions on Architecture and Code Optimization, Volume 14, Issue 1
      March 2017
      258 pages
      ISSN: 1544-3566
      EISSN: 1544-3973
      DOI: 10.1145/3058793

      Copyright © 2017 ACM

      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      • Published: 13 March 2017
      • Accepted: 1 November 2016
      • Revised: 1 October 2016
      • Received: 1 June 2016

      Qualifiers

      • research-article
      • Research
      • Refereed
