Abstract
The performance of parallel code depends significantly on the parallel task granularity (PTG). If the PTG is too coarse, performance suffers due to load imbalance; if the PTG is too fine, performance suffers from the overhead induced by parallel task creation and scheduling.
This paper presents a software platform that automatically determines the PTG at run-time. Automatic PTG selection is enabled by concurrent calls, special source language constructs that defer the decision of whether a call executes sequentially or concurrently (as a parallel task) until run-time. Furthermore, the execution semantics of concurrent calls permits the runtime system to merge two (or more) concurrent calls, thereby coarsening the PTG. We present an integration of concurrent calls into the Java programming language and the Java Memory Model, and show how the Java Virtual Machine can adapt the PTG based on dynamic profiling. The performance evaluation shows that our runtime system performs competitively with Java programs for which the PTG is tuned manually. Compared with an unfortunate choice of the PTG, this approach performs up to 3x faster than standard Java code.
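The core mechanism can be approximated in plain Java. The following is a hypothetical sketch, not the paper's actual API or implementation: a concurrent call site consults a flag at call time to run its body either inline (sequentially) or as a spawned task, and two calls can be merged into a single task to coarsen the PTG. In the paper's system the decision would be made by the JVM from dynamic profiles; here `runInParallel` is a plain field for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Supplier;

// Hypothetical sketch of a "concurrent call": the sequential-vs-parallel
// decision is deferred to call time, so a profiler could flip it online.
public class ConcurrentCall {
    static final ExecutorService POOL = Executors.newWorkStealingPool();

    // Stand-in for the runtime's profile-driven decision.
    static volatile boolean runInParallel = true;

    // Issue a concurrent call: either execute the body inline or submit
    // it as a task. Returning a Future keeps both paths uniform.
    static <T> Future<T> call(Supplier<T> body) {
        if (!runInParallel) {
            return CompletableFuture.completedFuture(body.get()); // sequential path
        }
        return POOL.submit(body::get);                            // parallel path
    }

    // Merging two concurrent calls into one task coarsens the PTG:
    // one task creation, two call bodies executed back-to-back.
    static Future<?> callMerged(Runnable a, Runnable b) {
        return POOL.submit(() -> { a.run(); b.run(); });
    }

    // Deterministic demo exercising both paths.
    static int demo() {
        try {
            runInParallel = false;
            int x = call(() -> 20).get(); // executed inline
            runInParallel = true;
            int y = call(() -> 22).get(); // executed as a parallel task
            POOL.shutdown();
            return x + y;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```

Because both paths return a `Future`, caller code is identical regardless of the decision, which is what lets the runtime change the PTG without recompiling call sites.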
Online feedback-directed optimizations for parallel Java code
OOPSLA '13: Proceedings of the 2013 ACM SIGPLAN International Conference on Object-Oriented Programming Systems, Languages & Applications