Abstract
Many-cores can execute multiple multithreaded tasks in parallel. A task runs most efficiently on a spatially connected, compact subset of cores, because this minimizes the communication overhead among its threads spread across the allocated cores. Over time, unallocated cores can become scattered across the many-core, creating fragments in the task mapping. These fragments can prevent efficient contiguous mapping of incoming tasks, leading to a loss of performance. A task defragmenter alleviates this problem by consolidating smaller fragments into larger ones in which incoming tasks can execute efficiently. Optimal defragmentation of a many-core is NP-hard in the general case. We therefore simplify the original problem to one that can be solved optimally in polynomial time. In this work, we introduce the concept of exponentially separable mapping (ESM), which defines a set of task mapping constraints on a many-core, and we prove that a many-core enforcing ESM can be defragmented optimally in polynomial time.
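The fragmentation effect described above can be illustrated with a minimal sketch. It assumes a simplified one-dimensional row of cores rather than a real many-core topology, and the helper names `largest_free_run` and `compact` are hypothetical. The naive compaction shown here is not the ESM-based defragmentation the paper introduces; it only shows why consolidating free cores helps an incoming task.

```python
from typing import List, Optional

# Each entry is the name of the task occupying that core, or None if the core is free.
# Assumption: cores sit in a 1-D row, so "contiguous" means adjacent indices.
Mapping = List[Optional[str]]


def largest_free_run(cores: Mapping) -> int:
    """Return the length of the longest run of adjacent free cores."""
    best = run = 0
    for owner in cores:
        run = run + 1 if owner is None else 0
        best = max(best, run)
    return best


def compact(cores: Mapping) -> Mapping:
    """Naive defragmentation: slide all busy cores to the left, keeping their order.

    This ignores migration cost and topology; it is an illustration only.
    """
    busy = [owner for owner in cores if owner is not None]
    return busy + [None] * (len(cores) - len(busy))


# Four cores are free in total, but they are scattered across the row.
cores: Mapping = ["A", None, "B", "B", None, "C", None, None]

free_total = cores.count(None)            # total number of free cores
before = largest_free_run(cores)          # largest contiguous fragment before compaction
after = largest_free_run(compact(cores))  # largest contiguous fragment after compaction

print(free_total, before, after)
```

In this toy mapping, a four-thread task cannot be placed contiguously before compaction even though four cores are free; after compaction the free cores form a single fragment large enough to hold it.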