research-article

Backtracking-based load balancing

Published: 14 February 2009

Abstract

High-productivity languages for parallel computing are becoming more important as parallel environments, including multicores, become more common. Cilk is one such language. It provides good load balancing for many applications, including irregular ones: it keeps all workers busy by creating plenty of "logical" threads and adopting an oldest-first work-stealing strategy. This paper proposes a "logical thread"-free framework called Tascell, which achieves higher performance and supports a wider range of parallel environments, including clusters, without loss of productivity. A Tascell worker spawns a "real" task only when an idle worker requests one. The worker performs the spawning by temporarily "backtracking" and restoring its oldest task-spawnable state. This approach eliminates the cost of spawning and managing logical threads. It also promotes the reuse of workspaces and improves locality of reference, since it does not need to prepare a workspace for each concurrently runnable logical thread. Furthermore, Tascell enables elegant and highly efficient backtrack search algorithms with delayed workspace copying. For instance, our 16-queens problem solver is 1.86 times faster than Cilk on a system with two dual-core processors. Our approach also enables a single program to run in both shared-memory and distributed-memory environments with reasonable efficiency and scalability.
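The request-driven spawning described in the abstract can be illustrated with a small single-threaded simulation. This is only a sketch under stated assumptions, not Tascell's actual implementation or syntax: the names `Worker`, `Frame`, `spawn_from_oldest`, and `count_queens` are hypothetical, and the `request_every` parameter merely simulates requests from idle workers, which in a real system would arrive from other threads or nodes. The sketch shows a worker searching N-queens in one reused workspace; on a request, it temporarily undoes its moves back to the oldest level that still has untried choices, copies the workspace only at that moment, hands off half of the remaining choices as a task, and then redoes the moves to resume.

```python
from collections import deque


def safe(cols, col):
    """True if a queen at (len(cols), col) attacks none of the queens in cols."""
    row = len(cols)
    return all(c != col and abs(c - col) != row - r for r, c in enumerate(cols))


class Frame:
    """One search level: the row it fills and its untried column range."""

    def __init__(self, row, next_col, end_col):
        self.row = row
        self.next_col = next_col  # next column this worker will try
        self.end_col = end_col    # exclusive bound; shrinks when work is given away


class Worker:
    def __init__(self, n, queue, request_every):
        self.n = n
        self.queue = queue                  # stands in for the requesting worker
        self.request_every = request_every  # assumption: simulated request rate
        self.cols = []                      # the single workspace, reused throughout
        self.frames = []                    # oldest frame first
        self.polls = 0
        self.solutions = 0
        self.spawned = 0

    def run(self, prefix, row, lo, hi):
        self.cols = list(prefix)
        self.search(row, lo, hi)

    def search(self, row, lo, hi):
        if row == self.n:
            self.solutions += 1
            return
        frame = Frame(row, lo, hi)
        self.frames.append(frame)
        while frame.next_col < frame.end_col:
            col = frame.next_col
            frame.next_col += 1
            if safe(self.cols, col):
                self.cols.append(col)            # do the move
                self.poll()                      # check for a task request
                self.search(row + 1, 0, self.n)
                self.cols.pop()                  # undo the move
        self.frames.pop()

    def poll(self):
        self.polls += 1
        if self.polls % self.request_every == 0:  # simulated request arrives
            self.spawn_from_oldest()

    def spawn_from_oldest(self):
        for frame in self.frames:                 # scan from the oldest level
            remaining = frame.end_col - frame.next_col
            if remaining < 1:
                continue
            mid = frame.next_col + remaining // 2
            undone = []
            while len(self.cols) > frame.row:     # temporary backtracking
                undone.append(self.cols.pop())
            # The workspace is copied only now, when a task is really needed.
            self.queue.append((list(self.cols), frame.row, mid, frame.end_col))
            frame.end_col = mid                   # this worker keeps the lower part
            while undone:                         # redo the moves and resume
                self.cols.append(undone.pop())
            self.spawned += 1
            return


def count_queens(n, request_every=50):
    """Run every task in turn; return (solution count, tasks spawned)."""
    queue = deque([([], 0, 0, n)])
    total = spawned = 0
    while queue:
        worker = Worker(n, queue, request_every)
        worker.run(*queue.popleft())
        total += worker.solutions
        spawned += worker.spawned
    return total, spawned
```

Calling `count_queens(8)` returns a pair whose first element is the solution count, 92 for the 8-queens problem, regardless of how often tasks are split off. Note that no per-branch workspace is allocated during the ordinary search; a copy is made only inside `spawn_from_oldest`, which mirrors the delayed workspace copying the abstract mentions.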


Published in

ACM SIGPLAN Notices, Volume 44, Issue 4 (PPoPP '09), April 2009, 294 pages
ISSN: 0362-1340
EISSN: 1558-1160
DOI: 10.1145/1594835

PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, February 2009, 322 pages
ISBN: 9781605583976
DOI: 10.1145/1504176

Copyright © 2009 ACM

Publisher: Association for Computing Machinery, New York, NY, United States