research-article
Open Access

Resource-Aware Task Scheduling

Published: 21 January 2015

Abstract

Dependency-aware task-based parallel programming models have proven to be successful for developing efficient application software for multicore-based computer architectures. The programming model is amenable to programmers, thereby supporting productivity, whereas hardware performance is achieved through a runtime system that dynamically schedules tasks onto cores in such a way that all dependencies are respected. However, even if the scheduling is completely successful with respect to load balancing, the scaling with the number of cores may be suboptimal due to resource contention. Here we consider the problem of scheduling tasks not only with respect to their interdependencies but also with respect to their usage of resources, such as memory and bandwidth. At the software level, this is achieved by user annotations of the task resource consumption. In the runtime system, the annotations are translated into scheduling constraints. Experimental results for different hardware, demonstrating performance gains both for model examples and real applications, are presented. Furthermore, we provide a set of tools to detect resource sensitivity and predict the performance improvements that can be achieved by resource-aware scheduling. These tools are solely based on parallel execution traces and require no instrumentation or modification of the application code.
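The idea of turning resource annotations into scheduling constraints can be illustrated with a minimal sketch. It is not the runtime system described in the article: the `Task` class, the `usage` field, the `schedule` function, and the round-based execution model with unit-length tasks are all assumptions made for illustration. Each task declares how many units of one shared resource (for example, memory bandwidth) it consumes, and a task is started only if its dependencies are finished, a core is free, and the total declared usage of the running tasks stays within a given capacity.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    usage: int  # hypothetical annotation: units of a shared resource consumed
    deps: set = field(default_factory=set)  # names of tasks that must finish first


def schedule(tasks, cores, capacity):
    """Greedy round-based scheduling (illustrative only).

    Returns a list of rounds; each round lists the names of the tasks
    that run concurrently. All tasks are assumed to take one round.
    """
    done, pending, rounds = set(), list(tasks), []
    while pending:
        running, used = [], 0
        for t in pending:
            if len(running) == cores:
                break  # no free core left in this round
            # Start the task only if its dependencies are satisfied
            # and the resource budget is not exceeded.
            if t.deps <= done and used + t.usage <= capacity:
                running.append(t)
                used += t.usage
        if not running:
            raise ValueError("no runnable task: cyclic dependency or usage above capacity")
        for t in running:
            pending.remove(t)
        done.update(t.name for t in running)
        rounds.append([t.name for t in running])
    return rounds


# Hypothetical workload: two resource-heavy tasks, one light task,
# and one light task that depends on "a".
demo = [
    Task("a", usage=2),
    Task("b", usage=2),
    Task("c", usage=1),
    Task("d", usage=1, deps={"a"}),
]

constrained = schedule(demo, cores=4, capacity=3)     # [["a", "c"], ["b", "d"]]
unconstrained = schedule(demo, cores=4, capacity=10)  # [["a", "b", "c"], ["d"]]
```

With a capacity of 3, the two heavy tasks `a` and `b` are never co-scheduled, whereas a dependency-only scheduler (modelled here by a capacity large enough to never bind) runs them together. Whether avoiding such co-scheduling pays off in practice depends on how strongly the tasks actually contend for the resource, which is what the article's experiments and trace-based tools address.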

