Abstract
High-productivity languages for parallel computing are becoming more important as parallel environments, including multicores, become more common. Cilk is such a language. It provides good load balancing for many applications, including irregular ones; that is, it keeps all workers busy by creating plenty of "logical" threads and adopting the oldest-first work-stealing strategy. This paper proposes a "logical thread"-free framework called Tascell, which achieves higher performance and supports a wider range of parallel environments, including clusters, without loss of productivity. A Tascell worker spawns a "real" task only when requested by another idle worker. The worker performs the spawning by temporarily "backtracking" and restoring its oldest task-spawnable state. Our approach eliminates the cost of spawning and managing logical threads. It also promotes the reuse of workspaces and improves locality of reference, since it does not need to prepare a workspace for each concurrently runnable logical thread. Furthermore, Tascell enables elegant and highly efficient backtrack search algorithms with delayed workspace copying. For instance, our 16-queens problem solver is 1.86 times faster than Cilk on a system with two dual-core processors. Our approach also enables a single program to run in both shared- and distributed-memory environments with reasonable efficiency and scalability.
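The spawn-on-request idea can be illustrated with a small, single-threaded Python model. This is only a sketch under stated assumptions, not Tascell's syntax or runtime: the names `Task`, `Worker`, `spawn_on_request`, and `count_solutions` are hypothetical, and the "request from an idle worker" is simulated by a step counter. The worker runs an ordinary N-queens backtrack search in one reused workspace; when a request arrives, it temporarily undoes its newer placements to restore the oldest frame that still has untried columns, copies the workspace only then, hands off half of the remaining columns, and resumes.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    """A 'real' task: a workspace copy plus untried columns [lo, hi) of one row."""
    cols: list
    used_col: list
    d1: list
    d2: list
    row: int
    lo: int
    hi: int


class Worker:
    """Hypothetical model of one worker; not the actual Tascell runtime."""

    def __init__(self, n, task=None):
        self.n = n
        if task is None:
            self.cols, self.used_col = [], [False] * n
            self.d1, self.d2 = [False] * (2 * n), [False] * (2 * n)
        else:  # adopt the workspace copy carried by the task
            self.cols, self.used_col = task.cols, task.used_col
            self.d1, self.d2 = task.d1, task.d2
        self.frames = []   # [row, next_col, end_col] per active call, oldest first
        self.steps = 0
        self.solutions = 0

    def _free(self, row, col):
        return not (self.used_col[col] or self.d1[row + col]
                    or self.d2[row - col + self.n])

    def _set(self, row, col, placed):
        self.used_col[col] = self.d1[row + col] = self.d2[row - col + self.n] = placed
        if placed:
            self.cols.append(col)
        else:
            self.cols.pop()

    def search(self, row, lo, hi, poll):
        if row == self.n:
            self.solutions += 1
            return
        frame = [row, lo, hi]
        self.frames.append(frame)
        while frame[1] < frame[2]:          # end_col may shrink after a hand-off
            col = frame[1]
            frame[1] += 1
            if self._free(row, col):
                self._set(row, col, True)   # reuse the single workspace
                poll(self)                  # check for a (simulated) task request
                self.search(row + 1, 0, self.n, poll)
                self._set(row, col, False)  # undo instead of copying
        self.frames.pop()

    def spawn_on_request(self):
        """Split work at the oldest frame that still has untried columns."""
        for frame in self.frames:
            row, nxt, end = frame
            if nxt < end:
                break
        else:
            return None
        undone = []
        while len(self.cols) > row:         # temporary backtracking
            r, c = len(self.cols) - 1, self.cols[-1]
            self._set(r, c, False)
            undone.append((r, c))
        mid = (nxt + end) // 2
        task = Task(list(self.cols), list(self.used_col), list(self.d1),
                    list(self.d2), row, mid, end)   # delayed workspace copy
        for r, c in reversed(undone):       # restore the current state
            self._set(r, c, True)
        frame[2] = mid                      # keep only the lower half
        return task


def count_solutions(n, request_every):
    """Count N-queens solutions; every `request_every` steps simulates a request."""
    pending, spawned = deque(), 0

    def poll(worker):
        nonlocal spawned
        worker.steps += 1
        if worker.steps % request_every == 0:
            task = worker.spawn_on_request()
            if task is not None:
                pending.append(task)
                spawned += 1

    root = Worker(n)
    root.search(0, 0, n, poll)
    total = root.solutions
    while pending:                          # run handed-off tasks one by one
        task = pending.popleft()
        thief = Worker(n, task)
        thief.search(task.row, task.lo, task.hi, poll)
        total += thief.solutions
    return total, spawned
```

Calling `count_solutions(8, 50)` returns the solution count together with the number of tasks that were actually created. The count should match the well-known 92 solutions for 8 queens regardless of how often requests arrive, while a very large `request_every` creates no tasks at all, which mirrors the point that no per-thread bookkeeping is paid when nobody asks for work.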
Backtracking-based load balancing
PPoPP '09: Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming