ABSTRACT
In this paper we present an automated way of using spare CPU resources within a shared-memory multi-processor or multi-core machine. Our approach is (i) to profile the execution of a program, (ii) to identify from this profile pieces of work which are promising sources of parallelism, (iii) to recompile the program so that this work is performed speculatively via a work-stealing system, and then (iv) to detect at run-time any attempt to perform operations that would reveal the presence of speculation.
We assess the practicality of the approach through an implementation based on GHC 6.6, along with a limit study based on the execution profiles we gathered. We support the full Concurrent Haskell language compiled with traditional optimizations, including I/O operations and synchronization as well as pure computation. We use 20 of the larger programs from the 'nofib' benchmark suite. The limit study shows that programs vary a lot in the parallelism we can identify: some have none, 16 have a potential 2x speed-up, and 4 have a potential 32x speed-up. In practice, on a 4-core processor, we obtain 10-80% speed-ups on 7 programs; these gains come mainly from adding a second core rather than from further cores.
This approach is therefore not a replacement for manual parallelization, but rather a way of squeezing extra performance out of the threads of an already-parallel program or out of a program that has not yet been parallelized.
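To make the shape of step (iii) concrete, the following is a minimal hand-written sketch in ordinary GHC Haskell. It is not the paper's transformation: it uses the existing par/pseq combinators from the parallel package to spark a thunk onto GHC's work-stealing spark pool, whereas the system described here inserts such speculation automatically, guided by profiles, and adds the run-time checks of step (iv) to guard impure work. The functions expensive and cheap are made-up stand-ins for pieces of work that profiling might flag.

    import Control.Parallel (par, pseq)

    -- Hypothetical stand-in for work that profiling (step ii) flags
    -- as a promising source of parallelism.
    expensive :: Int -> Int
    expensive n = sum [1 .. n]

    cheap :: Int -> Int
    cheap n = n * 2

    -- Original, sequential structure.
    combineSeq :: Int -> Int
    combineSeq n =
      let x = expensive n
          y = cheap n
      in x + y

    -- After a transformation in the spirit of step (iii): 'x' is
    -- sparked so that an idle core may steal and evaluate it while
    -- this thread computes 'y'.
    combinePar :: Int -> Int
    combinePar n =
      let x = expensive n
          y = cheap n
      in x `par` (y `pseq` x + y)

    main :: IO ()
    main = print (combinePar 10000000)

Compiled with ghc -threaded -O2 and run with +RTS -N4, an idle capability may steal and evaluate the sparked thunk in parallel; if no core is free, the spark is simply evaluated in place, so for pure work the program's meaning is unchanged. par alone is only safe for pure computation, which is why the general setting of Concurrent Haskell with I/O needs the run-time detection of step (iv).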