Research article

NUMA-aware scheduling and memory allocation for data-flow task-parallel applications

Published: 27 February 2016

Abstract

Dynamic task parallelism is a popular programming model on shared-memory systems. Compared to data parallel loop-based concurrency, it promises enhanced scalability, load balancing and locality. These promises, however, are undermined by non-uniform memory access (NUMA) systems. We show that it is possible to preserve the uniform hardware abstraction of contemporary task-parallel programming models, for both computing and memory resources, while achieving near-optimal data locality. Our run-time algorithms for NUMA-aware task and data placement are fully automatic, application-independent, performance-portable across NUMA machines, and adapt to dynamic changes. Placement decisions use information about inter-task data dependences and reuse. This information is readily available in the run-time systems of modern task-parallel programming frameworks, and from the operating system regarding the placement of previously allocated memory. Our algorithms take advantage of data-flow style task parallelism, where the privatization of task data enhances scalability through the elimination of false dependences and enables fine-grained dynamic control over the placement of application data. We demonstrate that the benefits of dynamically managing data placement outweigh the privatization cost, even when comparing with target-specific optimizations through static, NUMA-aware data interleaving. Our implementation and the experimental evaluation on a set of high-performance benchmarks executing on a 192-core system with 24 NUMA nodes show that the fraction of local memory accesses can be increased to more than 99%, resulting in a speedup of up to 5× compared to a NUMA-aware hierarchical work-stealing baseline.


  • Published in

    ACM SIGPLAN Notices, Volume 51, Issue 8 (PPoPP '16), August 2016, 405 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/3016078

    • PPoPP '16: Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, February 2016, 420 pages
      ISBN: 9781450340922
      DOI: 10.1145/2851141

    Copyright © 2016 ACM

    Publisher: Association for Computing Machinery, New York, NY, United States

