Steal Tree: low-overhead tracing of work stealing schedulers

Published: 16 June 2013

Abstract

Work stealing is a popular approach to scheduling task-parallel programs. The flexibility inherent in work stealing when dealing with load imbalance results in seemingly irregular computation structures, complicating the study of its runtime behavior. In this paper, we present an approach to efficiently trace async-finish parallel programs scheduled using work stealing. We identify key properties that allow us to trace the execution of tasks with low time and space overheads. We also study the usefulness of the proposed schemes in supporting algorithms for data-race detection and retentive stealing presented in the literature. We demonstrate that the perturbation due to tracing is within the variation in the execution time with 99% confidence and the traces are concise, amounting to a few tens of kilobytes per thread in most cases. We also demonstrate that the traces enable significant reductions in the cost of detecting data races and result in low, stable space overheads in supporting retentive stealing for async-finish programs.

Published in

ACM SIGPLAN Notices, Volume 48, Issue 6 (PLDI '13)
June 2013, 515 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/2499370

PLDI '13: Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation
June 2013, 546 pages
ISBN: 9781450320146
DOI: 10.1145/2491956

Copyright © 2013 ACM

Publisher: Association for Computing Machinery, New York, NY, United States

Qualifiers: research-article
