Abstract
In writing parallel programs, programmers expose parallelism and optimize it to meet a particular performance goal on a single platform under an assumed set of workload characteristics. In the field, changing workload characteristics, new parallel platforms, and deployments with different performance goals make the programmer's development-time choices suboptimal. To address this problem, this paper presents the Degree of Parallelism Executive (DoPE), an API and run-time system that separates the concern of exposing parallelism from that of optimizing it. Using the DoPE API, the application developer expresses parallelism options. During program execution, DoPE's run-time system uses this information to dynamically optimize the parallelism options in response to the facts on the ground. We easily port several emerging parallel applications to DoPE's API and demonstrate the DoPE run-time system's effectiveness in dynamically optimizing the parallelism for a variety of performance goals.
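The core idea — the developer exposes parallelism options while a run-time executive picks the degree of parallelism (DoP) — can be illustrated with a minimal sketch. Note this is an illustrative analogue, not the actual DoPE API: the names `Executive`, `choose_dop`, and `run_parallel`, and the fixed worker-budget policy, are all assumptions for exposition; the real system adapts the DoP online to workload, platform, and performance goals.

```python
from concurrent.futures import ThreadPoolExecutor

class Executive:
    """Hypothetical stand-in for a DoP executive: picks a degree of
    parallelism for a task. A real adaptive runtime would choose based
    on measured throughput, load, or power; this sketch just applies a
    fixed platform-supplied worker budget."""
    def __init__(self, worker_budget):
        self.worker_budget = worker_budget

    def choose_dop(self, options):
        # Pick the largest advertised DoP that fits within the budget.
        feasible = [d for d in options if d <= self.worker_budget]
        return max(feasible) if feasible else 1

def run_parallel(task, items, options, executive):
    """Separation of concerns: the developer supplies candidate DoPs
    (`options`); the executive, not the developer, decides which one
    to use at run time."""
    dop = executive.choose_dop(options)
    with ThreadPoolExecutor(max_workers=dop) as pool:
        return list(pool.map(task, items)), dop

# The application code never hard-codes a thread count.
results, dop = run_parallel(lambda x: x * x, range(8),
                            options=[1, 2, 4, 8],
                            executive=Executive(worker_budget=4))
```

Redeploying on a different platform or under a different goal then means swapping the executive's policy, with no change to the application code that exposed the parallelism.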
Parallelism orchestration using DoPE: the degree of parallelism executive
PLDI '11: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation