DOI: 10.1145/1504176.1504186
research-article

Idempotent work stealing

Published: 14 February 2009

ABSTRACT

Load balancing is a technique which allows efficient parallelization of irregular workloads, and a key component of many applications and parallelizing runtimes. Work-stealing is a popular technique for implementing load balancing, where each parallel thread maintains its own work set of items and occasionally steals items from the sets of other threads.

The conventional semantics of work stealing guarantee that each inserted task is eventually extracted exactly once. However, correctness of a wide class of applications allows for relaxed semantics, because either: i) the application already explicitly checks that no work is repeated or ii) the application can tolerate repeated work.
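As a hedged illustration of case (i), the sketch below shows a graph-traversal task that checks a visited set before doing any work, so handing the same task out twice has no additional effect. The names `visit`, `traverse`, and `duplicate_every_task` are hypothetical and are not taken from the paper. The sketch is single-threaded and only demonstrates why repeated extraction can be harmless; in a concurrent setting the check-and-mark step would need to be atomic or itself tolerate races.

```python
from typing import Dict, List, Set


def visit(node: int, graph: Dict[int, List[int]], visited: Set[int]) -> List[int]:
    """Process one task and return the newly discovered tasks.

    The explicit visited check makes the task idempotent: a second
    extraction of the same node does nothing and spawns no new work.
    """
    if node in visited:
        return []
    visited.add(node)
    return [n for n in graph[node] if n not in visited]


def traverse(graph: Dict[int, List[int]], start: int,
             duplicate_every_task: bool = False) -> Set[int]:
    """Traverse the graph from `start` using a LIFO work set.

    With duplicate_every_task=True, every task is executed twice to
    simulate a work set that may hand out a task more than once.
    """
    visited: Set[int] = set()
    pending: List[int] = [start]
    while pending:
        node = pending.pop()  # LIFO extraction
        new_tasks = visit(node, graph, visited)
        if duplicate_every_task:
            # Simulated repeated extraction of the same task.
            new_tasks += visit(node, graph, visited)
        pending.extend(new_tasks)
    return visited
```

Under these assumptions, the set of visited nodes is the same whether or not tasks are repeated, which is the property that lets such an application run on a work set with relaxed semantics.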

In this paper, we introduce idempotent work stealing, and present several new algorithms that exploit the relaxed semantics to deliver better performance. The semantics of the new algorithms guarantee that each inserted task is eventually extracted at least once, instead of exactly once.
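The difference between the two guarantees can be stated as predicates over a completed run. The following minimal sketch is not from the paper: it assumes every inserted task has a unique, hashable identifier, and the function names `exactly_once` and `at_least_once` are hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def exactly_once(inserted: Iterable[Hashable], extracted: Iterable[Hashable]) -> bool:
    """Conventional semantics: the multiset of extracted tasks equals
    the multiset of inserted tasks (no task lost, none repeated)."""
    return Counter(inserted) == Counter(extracted)


def at_least_once(inserted: Iterable[Hashable], extracted: Iterable[Hashable]) -> bool:
    """Relaxed (idempotent) semantics: every inserted task shows up in
    the extraction log one or more times, and nothing else does."""
    return set(inserted) == set(extracted)
```

For example, with inserted tasks `["a", "b"]`, the extraction log `["a", "b", "a"]` violates `exactly_once` but satisfies `at_least_once`, while the log `["a"]` violates both because task `"b"` was lost.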

On mainstream processors, algorithms for conventional work stealing require special atomic instructions or store-load memory ordering fence instructions in the owner's critical path operations. In general, these instructions are substantially slower than regular memory access instructions. By exploiting the relaxed semantics, our algorithms avoid these instructions in the owner's operations.

We evaluated our algorithms using common graph problems and micro-benchmarks and compared them to two well-known conventional work-stealing algorithms: the Cilk THE algorithm and the Chase-Lev algorithm. We found that our best algorithm (with LIFO extraction) outperforms existing algorithms in nearly all cases, and often by significant margins.


Published in

PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
February 2009
322 pages
ISBN: 9781605583976
DOI: 10.1145/1504176

ACM SIGPLAN Notices, Volume 44, Issue 4
PPoPP '09
April 2009
294 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1594835

Copyright © 2009 ACM

Publisher

Association for Computing Machinery
New York, NY, United States


Acceptance Rates

Overall acceptance rate: 230 of 1,014 submissions, 23%
