ABSTRACT
This paper proposes a new parallel execution model where programmers augment a sequential program with pieces of code called serializers that dynamically map computational operations into serialization sets of dependent operations. A runtime system executes operations in the same serialization set in program order, and may concurrently execute operations in different sets. Because serialization sets establish a logical ordering on all operations, the resulting parallel execution is predictable and deterministic.
We describe the API and design of Prometheus, a C++ library that implements the serialization set abstraction through compile-time template instantiation and a runtime support library. We evaluate a set of parallel programs running on the x86_64 and SPARC-V9 instruction sets and study their performance on multicore, symmetric multiprocessor, and ccNUMA parallel machines. By contrast with conventional parallel execution models, we find that Prometheus programs are significantly easier to write, test, and debug, and their parallel execution achieves comparable performance.
This paper appeared in PPoPP '09 under the title "Serialization sets: a dynamic dependence-based parallel execution model."