poster

Performance challenges in modular parallel programs

Published: 10 February 2018

Abstract

Over the past decade, many programming languages and systems for parallel computing have been developed, including Cilk, Fork/Join Java, Habanero Java, Parallel Haskell, Parallel ML, and X10. Although these systems raise the level of abstraction at which parallel code is written, achieving good performance continues to require the programmer to perform extensive optimizations and tuning, often by taking various architectural details into account. One such key optimization is granularity control, which requires the programmer to determine when and how parallel tasks should be sequentialized.
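To make the problem concrete, the following is a minimal sketch of manual granularity control in Fork/Join Java: a parallel Fibonacci in which tasks below a hand-tuned cutoff are sequentialized. The class name `Fib`, the helper `seqFib`, and the cutoff value 20 are all illustrative choices, not taken from the paper.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Parallel Fibonacci with a manual grain-size cutoff: subproblems
// smaller than CUTOFF run sequentially to avoid task-creation overhead.
class Fib extends RecursiveTask<Long> {
    static final int CUTOFF = 20; // hand-tuned threshold (illustrative)
    final int n;
    Fib(int n) { this.n = n; }

    // Plain sequential version used below the cutoff.
    static long seqFib(int n) {
        return n < 2 ? n : seqFib(n - 1) + seqFib(n - 2);
    }

    @Override
    protected Long compute() {
        if (n < CUTOFF) {
            return seqFib(n);              // sequentialize small tasks
        }
        Fib left = new Fib(n - 1);
        left.fork();                       // spawn left subtask
        long right = new Fib(n - 2).compute();
        return right + left.join();
    }

    public static void main(String[] args) {
        System.out.println(new ForkJoinPool().invoke(new Fib(30))); // prints 832040
    }
}
```

The cutoff constant is exactly the kind of architecture- and input-dependent magic number that portable performance requires eliminating: too low and task-creation overhead dominates; too high and parallelism is lost.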

In this paper, we briefly describe some of the challenges associated with automatic granularity control when trying to achieve portable performance for parallel programs with arbitrary nesting of parallel constructs. We consider a result from the functional-programming community, whose starting point is to posit an "oracle" that can predict the work of parallel code and thereby control granularity. We discuss the challenges in implementing such an oracle and in proving that it has the desired theoretical properties under the nested-parallel programming model.
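The oracle-guided approach [1] can be caricatured as follows; this is a hedged sketch under simplifying assumptions, not the paper's implementation. Assume the oracle is given a user-supplied asymptotic cost function and sequentializes any task whose predicted work falls below a threshold; the names `OracleFib`, `costFn`, and `KAPPA`, and the fixed constant factor, are all illustrative (the real oracle also estimates constant factors online from measurements).

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.LongUnaryOperator;

// Sketch of oracle-guided granularity control: instead of a hand-tuned
// cutoff on the input size, predict the work of a task from a cost
// function and sequentialize whenever the prediction is below KAPPA.
class OracleFib extends RecursiveTask<Long> {
    static final double KAPPA = 1_000_000; // work threshold (illustrative)

    // Assumed cost model: naive fib(n) does Theta(phi^n) work.
    static final LongUnaryOperator costFn =
        n -> (long) Math.pow(1.618, n);

    final int n;
    OracleFib(int n) { this.n = n; }

    static long seqFib(int n) {
        return n < 2 ? n : seqFib(n - 1) + seqFib(n - 2);
    }

    @Override
    protected Long compute() {
        if (costFn.applyAsLong(n) < KAPPA) {
            return seqFib(n);              // oracle predicts: too small to fork
        }
        OracleFib left = new OracleFib(n - 1);
        left.fork();
        long right = new OracleFib(n - 2).compute();
        return right + left.join();
    }
}
```

The difficulty the paper points to is visible even in this toy: the prediction must be cheap relative to the task itself, must remain accurate under arbitrary nesting of parallel constructs, and must come with provable bounds rather than a constant guessed per machine.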

References

  1. U. A. Acar, A. Charguéraud, and M. Rainey. 2016. Oracle-guided scheduling for controlling granularity in implicitly parallel languages. JFP 26 (2016).
  2. A. Duran, J. Corbalan, and E. Ayguade. 2008. An adaptive cut-off for task parallelism. In SC. 1--11.
  3. Intel. 2011. Intel Threading Building Blocks. https://www.threadingbuildingblocks.org/.
  4. S. Iwasaki and K. Taura. 2016. A static cut-off for task parallel programs. In PACT. 139--150.
  5. E. Mohr, D. A. Kranz, and R. H. Halstead. 1991. Lazy task creation: a technique for increasing the granularity of parallel programs. IEEE TPDS 2, 3 (1991), 264--280.
  6. J. Pehoushek and J. Weening. 1990. Low-cost process creation and dynamic partitioning in Qlisp. In LNCS, Vol. 441. 182--199.
  7. A. Tzannes, G. C. Caragea, U. Vishkin, and R. Barua. 2014. Lazy Scheduling: A Runtime Adaptive Scheduler for Declarative Parallelism. ACM TOPLAS 36, 3 (2014), 10:1--10:51.
  8. J. S. Weening. 1989. Parallel Execution of Lisp Programs. Ph.D. Dissertation.


  • Published in

    ACM SIGPLAN Notices, Volume 53, Issue 1 (PPoPP '18)
    January 2018, 426 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/3200691
    • PPoPP '18: Proceedings of the 23rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
      February 2018, 442 pages
      ISBN: 9781450349826
      DOI: 10.1145/3178487

    Copyright © 2018 ACM

    Publisher: Association for Computing Machinery, New York, NY, United States
