Parallelism orchestration using DoPE: the degree of parallelism executive

Published: 04 June 2011

Abstract

In writing parallel programs, programmers expose parallelism and optimize it to meet a particular performance goal on a single platform under an assumed set of workload characteristics. In the field, changing workload characteristics, new parallel platforms, and deployments with different performance goals make the programmer's development-time choices suboptimal. To address this problem, this paper presents the Degree of Parallelism Executive (DoPE), an API and run-time system that separates the concern of exposing parallelism from that of optimizing it. Using the DoPE API, the application developer expresses parallelism options. During program execution, DoPE's run-time system uses this information to dynamically optimize the parallelism options in response to the facts on the ground. We easily port several emerging parallel applications to DoPE's API and demonstrate the DoPE run-time system's effectiveness in dynamically optimizing the parallelism for a variety of performance goals.
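The separation of concerns the abstract describes — the developer exposes parallelism options, and a run-time system chooses the degree of parallelism (DoP) during execution — might be sketched as below. The actual DoPE API is not reproduced in this excerpt, so every name here (`ParallelismOption`, `Executive`, `parallelSum`) is invented for illustration, and the policy shown (clamping to the hardware thread count) is a trivial stand-in for DoPE's goal-driven dynamic optimization.

```cpp
// Hypothetical sketch, not the real DoPE API: the application states *what*
// parallelism is available; a separate "executive" decides *how much* to use.
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// A parallelism option: a stage that can correctly run at any DoP in
// [minDop, maxDop]. The body receives its worker id and the chosen DoP.
struct ParallelismOption {
    int minDop;
    int maxDop;
    std::function<void(int workerId, int dop)> body;
};

// A toy executive. A real system would pick the DoP from run-time
// measurements against a performance goal; here the "policy" merely clamps
// the platform's thread count into the option's legal range.
class Executive {
public:
    explicit Executive(int hardwareThreads) : hw_(hardwareThreads) {}

    int chooseDop(const ParallelismOption& opt) const {
        return std::clamp(hw_, opt.minDop, opt.maxDop);
    }

    void run(const ParallelismOption& opt) const {
        const int dop = chooseDop(opt);
        std::vector<std::thread> workers;
        for (int w = 0; w < dop; ++w)
            workers.emplace_back(opt.body, w, dop);  // launch dop workers
        for (auto& t : workers) t.join();
    }

private:
    int hw_;
};

// Usage: a reduction exposed as a parallelism option. The application never
// picks a thread count; the executive does, at run time.
long parallelSum(const std::vector<long>& data, const Executive& ex) {
    std::atomic<long> total{0};
    ParallelismOption opt{
        1, 8,
        [&](int w, int dop) {
            long local = 0;  // strided partition by worker id
            for (std::size_t i = static_cast<std::size_t>(w);
                 i < data.size(); i += static_cast<std::size_t>(dop))
                local += data[i];
            total += local;
        }};
    ex.run(opt);
    return total.load();
}
```

The point of the sketch is the interface shape, not the policy: because the DoP choice lives behind `Executive`, a deployment could swap in a throughput-, latency-, or power-oriented policy without touching the application's `ParallelismOption`, which is the portability argument the abstract makes.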

